Add Chapter 13: Device & Offline Deployment documentation
- Add mobile deployment guide for Android & HarmonyOS (BotOS)
- Add hardware guide for SBCs (Raspberry Pi, Orange Pi, etc.)
- Add quick start guide for 5-minute deployment
- Add local LLM guide with llama.cpp for offline AI
- Update SUMMARY.md to place chapter after Security (Part XII)
- Include bloatware removal, Magisk module, GSI instructions
- Cover NPU acceleration on Orange Pi 5 with rkllm
This commit is contained in:
parent ff5d2ac12c
commit d3bc28fac6

10 changed files with 1175 additions and 840 deletions
src/13-devices/README.md (Normal file, 54 lines)
@ -0,0 +1,54 @@
# Chapter 13: Device & Offline Deployment

Deploy General Bots to any device - from smartphones to Raspberry Pi to industrial kiosks - with local LLM inference for fully offline AI capabilities.

## Overview

General Bots can run on any device, from mobile phones to minimal embedded hardware with displays as small as 16x2 character LCDs, enabling AI-powered interactions anywhere:

- **Kiosks** - Self-service terminals in stores, airports, hospitals
- **Industrial IoT** - Factory floor assistants, machine interfaces
- **Smart Home** - Wall panels, kitchen displays, door intercoms
- **Retail** - Point-of-sale systems, product information terminals
- **Education** - Classroom assistants, lab equipment interfaces
- **Healthcare** - Patient check-in, medication reminders

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                          Embedded GB Architecture                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐              │
│   │   Display    │      │  botserver   │      │  llama.cpp   │              │
│   │  LCD/OLED    │────▶ │    (Rust)    │────▶ │   (Local)    │              │
│   │  TFT/HDMI    │      │  Port 8088   │      │  Port 8080   │              │
│   └──────────────┘      └──────────────┘      └──────────────┘              │
│          │                     │                     │                      │
│          │                     │                     │                      │
│   ┌──────▼──────┐       ┌──────▼──────┐       ┌──────▼──────┐               │
│   │  Keyboard   │       │   SQLite    │       │  TinyLlama  │               │
│   │  Buttons    │       │   (Data)    │       │    GGUF     │               │
│   │  Touch      │       │             │       │  (~700MB)   │               │
│   └─────────────┘       └─────────────┘       └─────────────┘               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## What's in This Chapter

### Mobile Deployment
- [Mobile (Android & HarmonyOS)](./mobile.md) - BotOS for smartphones and tablets

### Embedded Deployment
- [Supported Hardware](./hardware.md) - SBCs, displays, and peripherals
- [Quick Start](./quick-start.md) - Deploy in 5 minutes
- [Local LLM](./local-llm.md) - Offline AI with llama.cpp

### Deployment Options

| Platform | Use Case | Requirements |
|----------|----------|--------------|
| **Android/HarmonyOS** | Smartphones, tablets, kiosks | Any Android 8+ device |
| **Raspberry Pi** | IoT, displays, terminals | 1GB+ RAM |
| **Orange Pi** | Full offline AI | 4GB+ RAM for LLM |
| **Industrial** | Factory, retail, healthcare | Any ARM/x86 SBC |
src/13-devices/hardware.md (Normal file, 190 lines)
@ -0,0 +1,190 @@
# Supported Hardware

## Single Board Computers (SBCs)

### Recommended Boards

| Board | CPU | RAM | Best For | Price |
|-------|-----|-----|----------|-------|
| **Orange Pi 5** | RK3588S | 4-16GB | Full LLM, NPU accel | $89-149 |
| **Raspberry Pi 5** | BCM2712 | 4-8GB | General purpose | $60-80 |
| **Orange Pi Zero 3** | H618 | 1-4GB | Minimal deployments | $20-35 |
| **Raspberry Pi 4** | BCM2711 | 2-8GB | Established ecosystem | $45-75 |
| **Raspberry Pi Zero 2W** | RP3A0 | 512MB | Ultra-compact | $15 |
| **Rock Pi 4** | RK3399 | 4GB | NPU available | $75 |
| **NVIDIA Jetson Nano** | Tegra X1 | 4GB | GPU inference | $149 |
| **BeagleBone Black** | AM3358 | 512MB | Industrial | $55 |
| **LattePanda 3 Delta** | N100 | 8GB | x86 compatibility | $269 |
| **ODROID-N2+** | S922X | 4GB | High performance | $79 |

### Minimum Requirements

**For UI only (connect to remote botserver):**
- Any ARM/x86 Linux board
- 256MB RAM
- Network connection
- Display output

**For local botserver:**
- ARM64 or x86_64
- 1GB RAM minimum
- 4GB storage

**For local LLM (llama.cpp):**
- ARM64 or x86_64
- 2GB+ RAM (4GB recommended)
- 2GB+ storage for model

### Orange Pi 5 (Recommended for LLM)

The Orange Pi 5 with RK3588S is ideal for embedded LLM:

```
┌─────────────────────────────────────────────────────────────┐
│              Orange Pi 5 - Best for Offline AI              │
├─────────────────────────────────────────────────────────────┤
│  CPU:     Rockchip RK3588S (4x A76 + 4x A55)                │
│  NPU:     6 TOPS (Neural Processing Unit)                   │
│  GPU:     Mali-G610 MP4                                     │
│  RAM:     4GB / 8GB / 16GB LPDDR4X                          │
│  Storage: M.2 NVMe + eMMC + microSD                         │
│                                                             │
│  LLM Performance:                                           │
│  ├─ TinyLlama 1.1B Q4:  ~8-12 tokens/sec                    │
│  ├─ Phi-2 2.7B Q4:      ~4-6 tokens/sec                     │
│  └─ With NPU (rkllm):   ~20-30 tokens/sec                   │
└─────────────────────────────────────────────────────────────┘
```

## Displays

### Character LCDs (Minimal)

For text-only interfaces:

| Display | Resolution | Interface | Use Case |
|---------|------------|-----------|----------|
| HD44780 16x2 | 16 chars × 2 lines | I2C/GPIO | Status, simple Q&A |
| HD44780 20x4 | 20 chars × 4 lines | I2C/GPIO | More context |
| LCD2004 | 20 chars × 4 lines | I2C | Industrial |

**Example output on 16x2:**
```
┌────────────────┐
│> How can I help│
│< Processing... │
└────────────────┘
```
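A reply rarely fits a 16x2 panel as-is, so the UI layer has to wrap and pad text before writing it to the controller. A minimal sketch in Python (`format_lcd` is a hypothetical helper for illustration, not part of botserver):

```python
import textwrap

def format_lcd(text, cols=16, rows=2):
    """Wrap a reply to fit a character LCD, padding each row to full width."""
    lines = textwrap.wrap(text, width=cols)[:rows]  # truncate overflow
    while len(lines) < rows:
        lines.append("")
    return [line.ljust(cols) for line in lines]

# Render with borders, as in the example above
for row in format_lcd("How can I help you today?"):
    print("|" + row + "|")
```

A real driver would page longer replies across multiple frames instead of truncating.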
### OLED Displays

For graphical monochrome interfaces:

| Display | Resolution | Interface | Size |
|---------|------------|-----------|------|
| SSD1306 | 128×64 | I2C/SPI | 0.96" |
| SSD1309 | 128×64 | I2C/SPI | 2.42" |
| SH1106 | 128×64 | I2C/SPI | 1.3" |
| SSD1322 | 256×64 | SPI | 3.12" |

### TFT/IPS Color Displays

For full graphical interface:

| Display | Resolution | Interface | Notes |
|---------|------------|-----------|-------|
| ILI9341 | 320×240 | SPI | Common, cheap |
| ST7789 | 240×320 | SPI | Fast refresh |
| ILI9488 | 480×320 | SPI | Larger |
| Waveshare 5" | 800×480 | HDMI | Touch optional |
| Waveshare 7" | 1024×600 | HDMI | Touch, IPS |
| Official Pi 7" | 800×480 | DSI | Best for Pi |

### E-Ink/E-Paper

For low-power, readable in sunlight:

| Display | Resolution | Colors | Refresh |
|---------|------------|--------|---------|
| Waveshare 2.13" | 250×122 | B/W | 2s |
| Waveshare 4.2" | 400×300 | B/W | 4s |
| Waveshare 7.5" | 800×480 | B/W | 5s |
| Good Display 9.7" | 1200×825 | B/W | 6s |

**Best for:** Menu displays, signs, low-update applications

### Industrial Displays

| Display | Resolution | Features |
|---------|------------|----------|
| Advantech | Various | Wide temp, sunlight |
| Winstar | Various | Industrial grade |
| Newhaven | Various | Long availability |

## Input Devices

### Keyboards

- **USB Keyboard** - Standard, any USB keyboard works
- **PS/2 Keyboard** - Via adapter, lower latency
- **Matrix Keypad** - 4x4 or 3x4, GPIO connected
- **I2C Keypad** - Fewer GPIO pins needed

### Touch Input

- **Capacitive Touch** - Better response, needs driver
- **Resistive Touch** - Works with gloves, pressure-based
- **IR Touch Frame** - Large displays, vandal-resistant

### Buttons & GPIO

```
┌─────────────────────────────────────────────┐
│         Simple 4-Button Interface           │
├─────────────────────────────────────────────┤
│                                             │
│  [◄ PREV]  [▲ UP]  [▼ DOWN]  [► SELECT]     │
│                                             │
│  GPIO 17   GPIO 27  GPIO 22   GPIO 23       │
│                                             │
└─────────────────────────────────────────────┘
```
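On a Pi-class board these buttons are typically wired as active-low GPIO inputs with internal pull-ups. A hedged sketch of the read path, assuming a `read_pin(bcm) -> bool` callable backed by whatever GPIO library is in use (RPi.GPIO, gpiozero, etc.):

```python
# Pin map from the wiring diagram above (BCM numbering)
BUTTONS = {"prev": 17, "up": 27, "down": 22, "select": 23}

def scan_buttons(read_pin, buttons=BUTTONS):
    """Return names of currently pressed buttons.

    Active low: a pressed button pulls the pin to ground, so read_pin()
    returns False for a press.
    """
    return [name for name, bcm in buttons.items() if not read_pin(bcm)]

# Quick check with a fake reader: prev and select held down
fake = {17: False, 27: True, 22: True, 23: False}
print(scan_buttons(lambda bcm: fake[bcm]))  # ['prev', 'select']
```

A real deployment would debounce (ignore changes shorter than ~50ms) and poll this from the UI loop.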
## Enclosures

### Commercial Options

- **Hammond Manufacturing** - Industrial metal enclosures
- **Polycase** - Plastic, IP65 rated
- **Bud Industries** - Various sizes
- **Pi-specific cases** - Argon, Flirc, etc.

### DIY Options

- **3D Printed** - Custom fit, PLA/PETG
- **Laser Cut** - Acrylic, wood
- **Metal Fabrication** - Professional look

## Power

### Power Requirements

| Configuration | Power | Recommended PSU |
|---------------|-------|-----------------|
| Pi Zero + LCD | 1-2W | 5V 1A |
| Pi 4 + Display | 5-10W | 5V 3A |
| Orange Pi 5 | 8-15W | 5V 4A or 12V 2A |
| With NVMe SSD | +2-3W | Add 1A headroom |
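The headroom rule in the table can be applied mechanically: sum the worst-case wattage of each component, add margin, and convert to amps at the rail voltage. A rough sizing helper (the wattage figures are the table's estimates, and the 25% margin is a conservative assumption, not a specification):

```python
def psu_amps(watts_parts, volts=5.0, margin=1.25):
    """Minimum PSU current: total worst-case watts, plus margin, at the rail voltage."""
    total_w = sum(watts_parts) * margin
    return total_w / volts

# Orange Pi 5 (15W worst case) + NVMe SSD (3W) on a 5V rail
print(round(psu_amps([15, 3]), 1), "A")  # → 4.5 A, so a 5V 5A supply is safe
```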
### Power Options

- **USB-C PD** - Modern, efficient
- **PoE HAT** - Power over Ethernet
- **12V Barrel** - Industrial standard
- **Battery** - UPS, solar applications

### UPS Solutions

- **PiJuice** - Pi-specific UPS HAT
- **UPS PIco** - Small form factor
- **Powerboost** - Adafruit, lithium battery
src/13-devices/local-llm.md (Normal file, 382 lines)
@ -0,0 +1,382 @@
# Local LLM - Offline AI with llama.cpp

Run AI inference completely offline on embedded devices. No internet, no API costs, full privacy.

## Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           Local LLM Architecture                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   User Input ──▶ botserver ──▶ llama.cpp ──▶ Response                       │
│                      │              │                                       │
│                      │         ┌────┴────┐                                  │
│                      │         │  Model  │                                  │
│                      │         │  GGUF   │                                  │
│                      │         │ (Q4_K)  │                                  │
│                      │         └─────────┘                                  │
│                      │                                                      │
│                 SQLite DB                                                   │
│                (sessions)                                                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Recommended Models

### By Device RAM

| RAM | Model | Size | Speed | Quality |
|-----|-------|------|-------|---------|
| **2GB** | TinyLlama 1.1B Q4_K_M | 670MB | ~5 tok/s | Basic |
| **4GB** | Phi-2 2.7B Q4_K_M | 1.6GB | ~3-4 tok/s | Good |
| **4GB** | Gemma 2B Q4_K_M | 1.4GB | ~4 tok/s | Good |
| **8GB** | Llama 3.2 3B Q4_K_M | 2GB | ~3 tok/s | Better |
| **8GB** | Mistral 7B Q4_K_M | 4.1GB | ~2 tok/s | Great |
| **16GB** | Llama 3.1 8B Q4_K_M | 4.7GB | ~2 tok/s | Excellent |
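The table maps naturally onto a small lookup: pick the largest model whose RAM tier the device satisfies. A sketch, keeping one representative model per tier (tiers and names copied from the table above; the helper itself is illustrative, not part of botserver):

```python
# (min_ram_gb, model_name) in ascending order, one entry per tier
MODELS = [
    (2, "TinyLlama 1.1B Q4_K_M"),
    (4, "Phi-2 2.7B Q4_K_M"),
    (8, "Llama 3.2 3B Q4_K_M"),
    (16, "Llama 3.1 8B Q4_K_M"),
]

def recommend(ram_gb):
    """Largest model whose RAM tier fits the device, or None if too small."""
    fitting = [name for min_ram, name in MODELS if min_ram <= ram_gb]
    return fitting[-1] if fitting else None

print(recommend(4))  # → Phi-2 2.7B Q4_K_M
```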
### By Use Case

**Simple Q&A, Commands:**
```
TinyLlama 1.1B - Fast, basic understanding
```

**Customer Service, FAQ:**
```
Phi-2 or Gemma 2B - Good comprehension, reasonable speed
```

**Complex Reasoning:**
```
Llama 3.2 3B or Mistral 7B - Better accuracy, slower
```

## Installation

### Automatic (via deploy script)

```bash
./scripts/deploy-embedded.sh pi@device --with-llama
```

### Manual Installation

```bash
# SSH to device
ssh pi@raspberrypi.local

# Install dependencies
sudo apt update
sudo apt install -y build-essential cmake git wget

# Clone llama.cpp
cd /opt
sudo git clone https://github.com/ggerganov/llama.cpp
sudo chown -R $(whoami):$(whoami) llama.cpp
cd llama.cpp

# Build for ARM (auto-optimizes)
mkdir build && cd build
cmake .. -DLLAMA_NATIVE=ON -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# Download model
mkdir -p /opt/llama.cpp/models
cd /opt/llama.cpp/models
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```

### Start Server

```bash
# Test run
/opt/llama.cpp/build/bin/llama-server \
  -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -c 2048 \
  --threads 4

# Verify
curl http://localhost:8080/v1/models
```

### Systemd Service

Create `/etc/systemd/system/llama-server.service`:

```ini
[Unit]
Description=llama.cpp Server - Local LLM
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/llama.cpp
ExecStart=/opt/llama.cpp/build/bin/llama-server \
  -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -c 2048 \
  -ngl 0 \
  --threads 4
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable llama-server
sudo systemctl start llama-server
```

## Configuration

### botserver .env

```env
# Use local llama.cpp
LLM_PROVIDER=llamacpp
LLM_API_URL=http://127.0.0.1:8080
LLM_MODEL=tinyllama

# Memory limits
MAX_CONTEXT_TOKENS=2048
MAX_RESPONSE_TOKENS=512
STREAMING_ENABLED=true
```

### llama.cpp Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `-c` | 2048 | Context size (tokens) |
| `--threads` | 4 | CPU threads |
| `-ngl` | 0 | GPU layers (0 for CPU only) |
| `--host` | 127.0.0.1 | Bind address |
| `--port` | 8080 | Server port |
| `-b` | 512 | Batch size |
| `--mlock` | off | Lock model in RAM |

### Memory vs Context Size

```
Context  512: ~400MB RAM, fast, limited conversation
Context 1024: ~600MB RAM, moderate
Context 2048: ~900MB RAM, good for most uses
Context 4096: ~1.5GB RAM, long conversations
```
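The growth above is driven by the KV cache, which scales linearly with context: two tensors (keys and values) per layer, per position. A back-of-envelope estimate of just that cache (layer and embedding counts for TinyLlama are taken from its model card; treat the result as an approximation, since llama.cpp adds its own buffers on top of weights and cache):

```python
def kv_cache_mb(n_layers, n_embd, n_ctx, bytes_per_elem=2):
    """Rough f16 KV-cache size: keys + values for every layer and position."""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elem / (1024 ** 2)

# TinyLlama 1.1B: 22 layers, embedding dim 2048
print(round(kv_cache_mb(22, 2048, 2048)))  # → 352 (MB, on top of model weights)
```

Halving `-c` halves this term, which is why shrinking the context is the first lever on 2GB boards.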
## Performance Optimization

### CPU Optimization

```bash
# Check CPU features
cat /proc/cpuinfo | grep -E "(model name|Features)"

# Build with specific optimizations
cmake .. -DLLAMA_NATIVE=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_ARM_FMA=ON \
  -DLLAMA_ARM_DOTPROD=ON
```

### Memory Optimization

```bash
# For 2GB RAM devices

# Use a smaller context
-c 1024

# Memory mapping is the default: weights are paged in from disk,
# keeping resident RAM lower at some speed cost

# Leave --mlock off (the default) so the model is not pinned in RAM
```

### Swap Configuration

For devices with limited RAM:

```bash
# Create 2GB swap
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make permanent
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Optimize swap usage
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
```

## NPU Acceleration (Orange Pi 5)

The Orange Pi 5 has a 6 TOPS NPU that can accelerate inference:

### Using rkllm (Rockchip NPU)

```bash
# Install rkllm runtime
git clone https://github.com/airockchip/rknn-llm
cd rknn-llm
./install.sh

# Convert model to RKNN format
python3 convert_model.py \
  --model tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --output tinyllama.rkllm

# Run with NPU
rkllm-server \
  --model tinyllama.rkllm \
  --port 8080
```

Expected speedup: **3-5x faster** than CPU only.

## Model Download URLs

### TinyLlama 1.1B (Recommended for 2GB)
```bash
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```

### Phi-2 2.7B (Recommended for 4GB)
```bash
wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf
```

### Gemma 2B
```bash
wget https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q4_K_M.gguf
```

### Llama 3.2 3B (Recommended for 8GB)
```bash
wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
```

### Mistral 7B
```bash
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
```

## API Usage

llama.cpp exposes an OpenAI-compatible API:

### Chat Completion

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "messages": [
      {"role": "user", "content": "What is 2+2?"}
    ],
    "max_tokens": 100
  }'
```
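The same call from Python, using only the standard library. The endpoint and payload mirror the curl example above; `chat` and `build_chat_request` are illustrative helpers, not a botserver API:

```python
import json
from urllib import request

def build_chat_request(prompt, model="tinyllama", max_tokens=100):
    """Payload for llama.cpp's OpenAI-compatible /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt, base_url="http://localhost:8080"):
    """Send one user turn and return the assistant's reply text."""
    req = request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage (with the server from the systemd unit running): `print(chat("What is 2+2?"))`.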
### Streaming

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
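With `"stream": true` the server emits Server-Sent Events: one `data:` line per token delta, terminated by `data: [DONE]`. A sketch of reassembling the reply, assuming the OpenAI-style delta format that the endpoint emits:

```python
import json

def parse_sse_chunks(lines):
    """Join content deltas from an OpenAI-style SSE stream into one string."""
    out = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments/keep-alives
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        if "content" in delta:
            out.append(delta["content"])
    return "".join(out)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(parse_sse_chunks(sample))  # → Hello
```

Streaming matters on slow hardware: at ~5 tok/s the user sees text immediately instead of waiting many seconds for the full reply.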
### Health Check

```bash
curl http://localhost:8080/health
curl http://localhost:8080/v1/models
```
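Model loading can take tens of seconds on an SBC, so anything that depends on the server should poll the health endpoint rather than assume it is up. A generic retry helper (a sketch; pair it with an HTTP check of your choice):

```python
import time

def wait_until(check, attempts=30, delay=1.0):
    """Call check() until it returns truthy or attempts run out."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False

# e.g. with urllib:
#   wait_until(lambda: urllib.request.urlopen("http://localhost:8080/health").status == 200)
```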
## Monitoring

### Check Performance

```bash
# Watch resource usage
htop

# Check inference speed in logs
sudo journalctl -u llama-server -f | grep "tokens/s"

# Memory usage
free -h
```

### Benchmarking

```bash
# Run llama.cpp benchmark
/opt/llama.cpp/build/bin/llama-bench \
  -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  -p 512 -n 128 -t 4
```

## Troubleshooting

### Model Loading Fails

```bash
# Check available RAM
free -h

# Try a smaller context
-c 512

# Ensure the model is not pinned in RAM
# (memory mapping is the default; leave --mlock off)
```

### Slow Inference

```bash
# Increase threads (up to CPU cores)
--threads $(nproc)

# Use an optimized build
cmake .. -DLLAMA_NATIVE=ON

# Consider a smaller model
```

### Out of Memory Killer

```bash
# Check if the OOM killer terminated the process
dmesg | grep -i "killed process"

# Then: increase swap, use a smaller model, or reduce context size
```

## Best Practices

1. **Start small** - Begin with TinyLlama, upgrade if needed
2. **Monitor memory** - Use `htop` during initial tests
3. **Set appropriate context** - 1024-2048 for most embedded use
4. **Use quantized models** - Q4_K_M is a good balance
5. **Enable streaming** - Better UX on slow inference
6. **Test offline** - Verify it works without internet before deployment
src/13-devices/mobile.md (Normal file, 323 lines)
@ -0,0 +1,323 @@
# Mobile Deployment - Android & HarmonyOS

Deploy General Bots as the primary interface on Android and HarmonyOS devices, transforming them into dedicated AI assistants.

## Overview

BotOS transforms any Android or HarmonyOS device into a dedicated General Bots system, removing manufacturer bloatware and installing GB as the default launcher.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             BotOS Architecture                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌──────────────────────────────────────────────────────────────────┐      │
│   │                        BotOS App (Tauri)                         │      │
│   ├──────────────────────────────────────────────────────────────────┤      │
│   │  botui/ui/suite   │   Tauri Android   │   src/lib.rs (Rust)      │      │
│   │  (Web Interface)  │  (WebView + NDK)  │  (Backend + Hardware)    │      │
│   └──────────────────────────────────────────────────────────────────┘      │
│                                   │                                         │
│          ┌────────────────────────┴─────────────────────────┐               │
│          │             Android/HarmonyOS System             │               │
│          │  ┌─────────┐ ┌──────────┐ ┌────────┐ ┌─────────┐ │               │
│          │  │ Camera  │ │   GPS    │ │  WiFi  │ │ Storage │ │               │
│          │  └─────────┘ └──────────┘ └────────┘ └─────────┘ │               │
│          └──────────────────────────────────────────────────┘               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Supported Platforms

### Android
- **AOSP** - Pure Android
- **Samsung One UI** - Galaxy devices
- **Xiaomi MIUI** - Mi, Redmi, Poco
- **OPPO ColorOS** - OPPO, OnePlus, Realme
- **Vivo Funtouch/OriginOS**
- **Google Pixel**

### HarmonyOS
- **Huawei** - P series, Mate series, Nova
- **Honor** - Magic series, X series

## Installation Levels

| Level | Requirements | What It Does |
|-------|-------------|--------------|
| **Level 1** | ADB only | Removes bloatware, installs BotOS as app |
| **Level 2** | Root + Magisk | GB boot animation, BotOS as system app |
| **Level 3** | Unlocked bootloader | Full Android replacement with BotOS |

## Quick Installation

### Level 1: Debloat + App (No Root)

```bash
# Clone botos repository
git clone https://github.com/GeneralBots/botos.git
cd botos/rom

# Connect device via USB (enable USB debugging first)
./install.sh
```

The interactive installer will:
1. Detect your device and manufacturer
2. Remove bloatware automatically
3. Install BotOS APK
4. Optionally set as default launcher
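Under the hood, non-root debloating amounts to `adb shell pm uninstall --user 0 <package>` per package: removal for user 0 only, reversible with `pm install-existing`. A sketch of that loop, with the command runner injected so it can be dry-run; the package list is an example, not the installer's actual list:

```python
import subprocess

# Example targets only - the installer selects per-manufacturer lists
EXAMPLE_PACKAGES = ["com.facebook.katana", "com.king.candycrushsaga"]

def debloat(packages, run=subprocess.run):
    """Uninstall each package for user 0 via adb (no root required)."""
    results = {}
    for pkg in packages:
        cmd = ["adb", "shell", "pm", "uninstall", "--user", "0", pkg]
        results[pkg] = run(cmd)
    return results

# Dry run: print the commands instead of executing adb
debloat(EXAMPLE_PACKAGES, run=lambda cmd: print(" ".join(cmd)))
```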
### Level 2: Magisk Module (Root Required)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Generate Magisk module
|
||||||
|
cd botos/rom/scripts
|
||||||
|
./build-magisk-module.sh
|
||||||
|
|
||||||
|
# Copy to device
|
||||||
|
adb push botos-magisk-v1.0.zip /sdcard/
|
||||||
|
|
||||||
|
# Install via Magisk app
|
||||||
|
# Magisk → Modules → + → Select ZIP → Reboot
|
||||||
|
```
|
||||||
|
|
||||||
|
This adds:
|
||||||
|
- Custom boot animation
|
||||||
|
- BotOS as system app (privileged permissions)
|
||||||
|
- Debloat via overlay
|
||||||
|
|
||||||
|
### Level 3: GSI (Full Replacement)
|
||||||
|
|
||||||
|
For advanced users with unlocked bootloader. See `botos/rom/gsi/README.md`.
|
||||||
|
|
||||||
|
## Bloatware Removed
|
||||||
|
|
||||||
|
### Samsung One UI
|
||||||
|
- Bixby, Samsung Pay, Samsung Pass
|
||||||
|
- Duplicate apps (Email, Calendar, Browser)
|
||||||
|
- AR Zone, Game Launcher
|
||||||
|
- Samsung Free, Samsung Global Goals
|
||||||
|
|
||||||
|
### Huawei EMUI/HarmonyOS
|
||||||
|
- AppGallery, HiCloud, HiCar
|
||||||
|
- Huawei Browser, Music, Video
|
||||||
|
- Petal Maps, Petal Search
|
||||||
|
- AI Life, HiSuite
|
||||||
|
|
||||||
|
### Honor MagicOS
|
||||||
|
- Honor Store, MagicRing
|
||||||
|
- Honor Browser, Music
|
||||||
|
|
||||||
|
### Xiaomi MIUI
|
||||||
|
- MSA (analytics), Mi Apps
|
||||||
|
- GetApps, Mi Cloud
|
||||||
|
- Mi Browser, Mi Music
|
||||||
|
|
||||||
|
### Universal (All Devices)
|
||||||
|
- Pre-installed Facebook, Instagram
|
||||||
|
- Pre-installed Netflix, Spotify
|
||||||
|
- Games like Candy Crush
|
||||||
|
- Carrier bloatware
|
||||||
|
|
||||||
|
## Building from Source
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install Rust and Android targets
|
||||||
|
rustup target add aarch64-linux-android armv7-linux-androideabi
|
||||||
|
|
||||||
|
# Set up Android SDK/NDK
|
||||||
|
export ANDROID_HOME=$HOME/Android/Sdk
|
||||||
|
export NDK_HOME=$ANDROID_HOME/ndk/25.2.9519653
|
||||||
|
|
||||||
|
# Install Tauri CLI
|
||||||
|
cargo install tauri-cli
|
||||||
|
|
||||||
|
# For icons/boot animation
|
||||||
|
sudo apt install librsvg2-bin imagemagick
|
||||||
|
```
|
||||||
|
|
||||||
|
### Build APK
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd botos
|
||||||
|
|
||||||
|
# Generate icons from SVG
|
||||||
|
./scripts/generate-icons.sh
|
||||||
|
|
||||||
|
# Initialize Android project
|
||||||
|
cargo tauri android init
|
||||||
|
|
||||||
|
# Build release APK
|
||||||
|
cargo tauri android build --release
|
||||||
|
```
|
||||||
|
|
||||||
|
Output: `gen/android/app/build/outputs/apk/release/app-release.apk`
|
||||||
|
|
||||||
|
### Development Mode
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Connect device and run
|
||||||
|
cargo tauri android dev
|
||||||
|
|
||||||
|
# Watch logs
|
||||||
|
adb logcat -s BotOS:*
|
||||||
|
```
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
### AndroidManifest.xml
|
||||||
|
|
||||||
|
BotOS is configured as a launcher:
|
||||||
|
|
||||||
|
```xml
|
||||||
|
<intent-filter>
|
||||||
|
<action android:name="android.intent.action.MAIN" />
|
||||||
|
<category android:name="android.intent.category.HOME" />
|
||||||
|
<category android:name="android.intent.category.DEFAULT" />
|
||||||
|
<category android:name="android.intent.category.LAUNCHER" />
|
||||||
|
</intent-filter>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Permissions
|
||||||
|
|
||||||
|
Default capabilities in `capabilities/default.json`:
|
||||||
|
- Internet access
|
||||||
|
- Camera (for QR codes, photos)
|
||||||
|
- Location (GPS)
|
||||||
|
- Storage (files)
|
||||||
|
- Notifications
|
||||||
|
|
||||||
|
### Connecting to Server
|
||||||
|
|
||||||
|
Edit the embedded URL in `tauri.conf.json`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"build": {
|
||||||
|
"frontendDist": "../botui/ui/suite"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Or configure botserver URL at runtime:
|
||||||
|
```javascript
|
||||||
|
window.BOTSERVER_URL = "https://your-server.com";
|
||||||
|
```
|
||||||
|
|
||||||
|
## Boot Animation
|
||||||
|
|
||||||
|
Create custom boot animation with GB branding:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Generate animation
|
||||||
|
cd botos/scripts
|
||||||
|
./create-bootanimation.sh
|
||||||
|
|
||||||
|
# Install (requires root)
|
||||||
|
adb root
|
||||||
|
adb remount
|
||||||
|
adb push bootanimation.zip /system/media/
|
||||||
|
adb reboot
|
||||||
|
```
|
||||||
|
|
||||||
|
## Project Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
botos/
|
||||||
|
├── Cargo.toml # Rust/Tauri dependencies
|
||||||
|
├── tauri.conf.json # Tauri config → botui/ui/suite
|
||||||
|
├── build.rs # Build script
|
||||||
|
├── src/lib.rs # Android entry point
|
||||||
|
│
|
||||||
|
├── icons/
|
||||||
|
│ ├── gb-bot.svg # Source icon
|
||||||
|
│ ├── icon.png # Main icon (512x512)
|
||||||
|
│ └── */ic_launcher.png # Icons by density
|
||||||
|
│
|
||||||
|
├── scripts/
|
||||||
|
│ ├── generate-icons.sh # Generate PNGs from SVG
|
||||||
|
│ └── create-bootanimation.sh
|
||||||
|
│
|
||||||
|
├── capabilities/
|
||||||
|
│ └── default.json # Tauri permissions
|
||||||
|
│
|
||||||
|
├── gen/android/ # Generated Android project
|
||||||
|
│ └── app/src/main/
|
||||||
|
│ ├── AndroidManifest.xml
|
||||||
|
│ └── res/values/themes.xml
|
||||||
|
│
|
||||||
|
└── rom/ # Installation tools
|
||||||
|
├── install.sh # Interactive installer
|
||||||
|
├── scripts/
|
||||||
|
│ ├── debloat.sh # Remove bloatware
|
||||||
|
│ └── build-magisk-module.sh
|
||||||
|
└── gsi/
|
||||||
|
└── README.md # GSI instructions
|
||||||
|
```
|
||||||
|
|
||||||

## Offline Mode

BotOS can work offline with a local LLM:

1. Install botserver on the device (see [Local LLM](./local-llm.md))
2. Configure it to use localhost:
   ```javascript
   window.BOTSERVER_URL = "http://127.0.0.1:8088";
   ```
3. Run llama.cpp with a small model (TinyLlama on 4GB+ devices)
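
The local-first setup above can be scripted: a minimal sketch that prefers the on-device stack when the local LLM answers, and falls back to a remote server otherwise. `probe_llm`, `choose_botserver_url`, and the fallback URL are illustrative assumptions, not part of BotOS itself.

```shell
#!/bin/sh
# Sketch: prefer the on-device botserver when the local llama.cpp endpoint
# responds, otherwise fall back to a remote server. The function names and
# the fallback URL are illustrative assumptions.

LOCAL_URL="http://127.0.0.1:8088"
REMOTE_URL="https://your-server.com"

probe_llm() {
    # Succeeds (exit 0) when the local llama.cpp health endpoint answers
    curl -fsS --max-time 2 "http://127.0.0.1:8080/health" >/dev/null 2>&1
}

choose_botserver_url() {
    if probe_llm; then
        echo "$LOCAL_URL"
    else
        echo "$REMOTE_URL"
    fi
}
```

The chosen URL could then be injected into `window.BOTSERVER_URL` by whatever wrapper launches the UI.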

## Use Cases

### Dedicated Kiosk
- Retail product information
- Hotel check-in
- Restaurant ordering
- Museum guides

### Enterprise Device
- Field service assistant
- Warehouse scanner with AI
- Delivery driver companion
- Healthcare bedside terminal

### Consumer Device
- Elder-friendly phone
- Child-safe device
- Single-purpose assistant
- Smart home controller

## Troubleshooting

### App Won't Install

```bash
# Enable installation from unknown sources:
# Settings → Security → Unknown Sources

# Or use ADB
adb install -r botos.apk
```

### Debloat Not Working

```bash
# Some packages require root.
# Use Level 2 (Magisk) for complete removal.

# Check which packages failed
adb shell pm list packages | grep <manufacturer>
```

### Boot Loop After GSI

```bash
# Boot into recovery
# Wipe data/factory reset
# Reflash the stock ROM
```

### WebView Crashes

```bash
# Update Android System WebView
adb shell pm enable com.google.android.webview
```

209
src/13-devices/quick-start.md
Normal file

@@ -0,0 +1,209 @@
# Quick Start - Deploy in 5 Minutes

Get General Bots running on your embedded device with local AI in just a few commands.

## Prerequisites

- An SBC (Raspberry Pi, Orange Pi, etc.) running Armbian or Raspberry Pi OS
- SSH access to the device
- Internet connection (for initial setup only)

## One-Line Deploy

From your development machine:

```bash
# Clone and run the deployment script
git clone https://github.com/GeneralBots/botserver.git
cd botserver

# Deploy to an Orange Pi (replace with your device's IP)
./scripts/deploy-embedded.sh orangepi@192.168.1.100 --with-ui --with-llama
```

That's it! After ~10-15 minutes:
- BotServer runs on port 8088
- llama.cpp runs on port 8080 with TinyLlama
- The embedded UI is available at `http://your-device:8088/embedded/`

## Step-by-Step Guide

### Step 1: Prepare Your Device

Flash your SBC with a compatible OS:

**Raspberry Pi:**
```bash
# Download Raspberry Pi Imager
# Select: Raspberry Pi OS Lite (64-bit)
# Enable SSH in the settings
```

**Orange Pi:**
```bash
# Download Armbian from armbian.com
# Flash with balenaEtcher
```

### Step 2: First Boot Configuration

```bash
# SSH into your device
ssh pi@raspberrypi.local   # or orangepi@orangepi.local

# Update the system
sudo apt update && sudo apt upgrade -y

# Set the timezone
sudo timedatectl set-timezone America/Sao_Paulo

# Enable I2C/SPI if using GPIO displays
sudo raspi-config   # or armbian-config
```

### Step 3: Run Deployment Script

From your development PC:

```bash
# Basic deployment (botserver only)
./scripts/deploy-embedded.sh pi@raspberrypi.local

# With the embedded UI
./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui

# With a local LLM (requires 4GB+ RAM)
./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui --with-llama

# Specify a different model
./scripts/deploy-embedded.sh pi@raspberrypi.local --with-llama --model phi-2-Q4_K_M.gguf
```

### Step 4: Verify Installation

```bash
# Check the services
ssh pi@raspberrypi.local 'sudo systemctl status botserver'
ssh pi@raspberrypi.local 'sudo systemctl status llama-server'

# Test botserver
curl http://raspberrypi.local:8088/health

# Test llama.cpp
curl http://raspberrypi.local:8080/v1/models
```
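
For scripted deployments, the checks above can be wrapped in a retry loop that waits for a service to come up; a minimal sketch, assuming the health endpoint answers over HTTP once the service is ready (`check_health` and `wait_for_service` are illustrative helpers, not part of the deploy script):

```shell
#!/bin/sh
# Sketch: poll a health endpoint until it answers or a retry budget runs
# out. check_health is split out so it can be swapped in tests; the URL
# and defaults below are examples, not values the deploy script defines.

check_health() {
    curl -fsS --max-time 2 "$1" >/dev/null 2>&1
}

wait_for_service() {
    url="$1"
    tries="${2:-30}"   # 30 tries x 2s sleep = about a minute by default
    i=0
    while [ "$i" -lt "$tries" ]; do
        if check_health "$url"; then
            echo "up"
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    echo "timeout"
    return 1
}

# e.g. wait_for_service http://raspberrypi.local:8088/health || exit 1
```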

### Step 5: Access the Interface

Open in your browser:

```
http://raspberrypi.local:8088/embedded/
```

Or set up kiosk mode (auto-starts on boot):

```bash
# Already configured if you used --with-ui
# Just reboot:
ssh pi@raspberrypi.local 'sudo reboot'
```

## Local Installation (On the Device)

If you prefer to install directly on the device:

```bash
# SSH into the device
ssh pi@raspberrypi.local

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

# Clone and build
git clone https://github.com/GeneralBots/botserver.git
cd botserver

# Run the local deployment
./scripts/deploy-embedded.sh --local --with-ui --with-llama
```

⚠️ **Note:** Building on ARM devices is slow (1-2 hours). Cross-compilation is faster.
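
One way to avoid the on-device build is to cross-compile on the PC and copy the binary over; a sketch assuming a Rust toolchain and the standard glibc targets (the board names in the mapping are examples; confirm the architecture with `uname -m` on your device):

```shell
#!/bin/sh
# Sketch: map a board name to a Rust cross-compilation target triple.
# The board list is an illustrative assumption; all 64-bit boards in this
# chapter use the same aarch64 glibc target.

target_for_board() {
    case "$1" in
        rpi5|rpi4|orangepi5)  echo "aarch64-unknown-linux-gnu" ;;
        rpi-zero2w)           echo "aarch64-unknown-linux-gnu" ;;
        rpi-32bit)            echo "armv7-unknown-linux-gnueabihf" ;;
        *)                    echo "unknown" ;;
    esac
}

# Usage (on the dev PC):
#   rustup target add "$(target_for_board rpi5)"
#   cargo build --release --target "$(target_for_board rpi5)"
#   scp target/aarch64-unknown-linux-gnu/release/botserver pi@raspberrypi.local:
```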

## Configuration

After deployment, edit the config file:

```bash
ssh pi@raspberrypi.local
sudo nano /opt/botserver/.env
```

Key settings:

```env
# Server
HOST=0.0.0.0
PORT=8088

# Local LLM
LLM_PROVIDER=llamacpp
LLM_API_URL=http://127.0.0.1:8080
LLM_MODEL=tinyllama

# Memory limits for small devices
MAX_CONTEXT_TOKENS=2048
MAX_RESPONSE_TOKENS=512
```

Restart after changes:

```bash
sudo systemctl restart botserver
```

## Troubleshooting

### Out of Memory

```bash
# Check memory usage
free -h

# Reduce the llama.cpp context
sudo nano /etc/systemd/system/llama-server.service
# Change -c 2048 to -c 1024

# Or use a smaller model:
# TinyLlama uses ~700MB, Phi-2 uses ~1.6GB
```
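
Before downloading a model, a rough fit check helps avoid memory trouble entirely; a sketch assuming the model loads fully into RAM plus a fixed overhead for context buffers and the server itself (the 512MB figure is an assumption, not a measured value):

```shell
#!/bin/sh
# Sketch: rough check that a GGUF model plus overhead fits in free RAM.
# OVERHEAD_MB (context buffers + server process) is an illustrative guess.

OVERHEAD_MB=512

fits_in_ram() {
    model_mb="$1"
    free_mb="$2"
    if [ $((model_mb + OVERHEAD_MB)) -le "$free_mb" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# e.g. on the device:
#   fits_in_ram 670 "$(free -m | awk '/^Mem:/ {print $7}')"
```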

### Service Won't Start

```bash
# Check the logs
sudo journalctl -u botserver -f
sudo journalctl -u llama-server -f

# Common issues:
# - Port already in use
# - Missing model file
# - Database permissions
```

### Display Not Working

```bash
# Check whether the display is detected
ls /dev/fb*       # HDMI/DSI
ls /dev/i2c*      # I2C displays
ls /dev/spidev*   # SPI displays

# For HDMI, check the config
sudo nano /boot/config.txt       # Raspberry Pi
sudo nano /boot/armbianEnv.txt   # Orange Pi
```

## Next Steps

- [Embedded UI Guide](./embedded-ui.md) - Customize the interface
- [Local LLM Configuration](./local-llm.md) - Optimize AI performance
- [Kiosk Mode](./kiosk-mode.md) - Production deployment
- [Offline Operation](./offline.md) - Disconnected environments

@@ -1,47 +1 @@
-# Chapter 20: Embedded & Offline Deployment
+# Chapter 20: Embedded Deployment

Deploy General Bots to any device - from Raspberry Pi to industrial kiosks - with local LLM inference for fully offline AI capabilities.

## Overview

General Bots can run on minimal hardware with displays as small as 16x2 character LCDs, enabling AI-powered interactions anywhere:

- **Kiosks** - Self-service terminals in stores, airports, hospitals
- **Industrial IoT** - Factory floor assistants, machine interfaces
- **Smart Home** - Wall panels, kitchen displays, door intercoms
- **Retail** - Point-of-sale systems, product information terminals
- **Education** - Classroom assistants, lab equipment interfaces
- **Healthcare** - Patient check-in, medication reminders

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                          Embedded GB Architecture                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐                │
│   │   Display    │     │  botserver   │     │  llama.cpp   │                │
│   │   LCD/OLED   │────▶│   (Rust)     │────▶│   (Local)    │                │
│   │   TFT/HDMI   │     │  Port 8088   │     │  Port 8080   │                │
│   └──────────────┘     └──────────────┘     └──────────────┘                │
│          │                    │                    │                        │
│          │                    │                    │                        │
│   ┌──────▼──────┐      ┌──────▼──────┐      ┌──────▼──────┐                 │
│   │  Keyboard   │      │   SQLite    │      │  TinyLlama  │                 │
│   │  Buttons    │      │   (Data)    │      │    GGUF     │                 │
│   │  Touch      │      │             │      │  (~700MB)   │                 │
│   └─────────────┘      └─────────────┘      └─────────────┘                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## What's in This Chapter

- [Supported Hardware](./hardware.md) - Boards, displays, and peripherals
- [Quick Start](./quick-start.md) - Deploy in 5 minutes
- [Embedded UI](./embedded-ui.md) - Interface for small displays
- [Local LLM](./local-llm.md) - Offline AI with llama.cpp
- [Display Modes](./display-modes.md) - LCD, OLED, TFT, E-ink configurations
- [Kiosk Mode](./kiosk-mode.md) - Locked-down production deployments
- [Performance Tuning](./performance.md) - Optimize for limited resources
- [Offline Operation](./offline.md) - No internet required
- [Use Cases](./use-cases.md) - Real-world deployment examples

@@ -1,190 +1 @@

# Supported Hardware

## Single Board Computers (SBCs)

### Recommended Boards

| Board | CPU | RAM | Best For | Price |
|-------|-----|-----|----------|-------|
| **Orange Pi 5** | RK3588S | 4-16GB | Full LLM, NPU accel | $89-149 |
| **Raspberry Pi 5** | BCM2712 | 4-8GB | General purpose | $60-80 |
| **Orange Pi Zero 3** | H618 | 1-4GB | Minimal deployments | $20-35 |
| **Raspberry Pi 4** | BCM2711 | 2-8GB | Established ecosystem | $45-75 |
| **Raspberry Pi Zero 2W** | RP3A0 | 512MB | Ultra-compact | $15 |
| **Rock Pi 4** | RK3399 | 4GB | NPU available | $75 |
| **NVIDIA Jetson Nano** | Tegra X1 | 4GB | GPU inference | $149 |
| **BeagleBone Black** | AM3358 | 512MB | Industrial | $55 |
| **LattePanda 3 Delta** | N100 | 8GB | x86 compatibility | $269 |
| **ODROID-N2+** | S922X | 4GB | High performance | $79 |

### Minimum Requirements

**For UI only (connect to a remote botserver):**
- Any ARM/x86 Linux board
- 256MB RAM
- Network connection
- Display output

**For a local botserver:**
- ARM64 or x86_64
- 1GB RAM minimum
- 4GB storage

**For a local LLM (llama.cpp):**
- ARM64 or x86_64
- 2GB+ RAM (4GB recommended)
- 2GB+ storage for the model

### Orange Pi 5 (Recommended for LLM)

The Orange Pi 5 with the RK3588S is ideal for embedded LLM work:

```
┌─────────────────────────────────────────────────────────────┐
│            Orange Pi 5 - Best for Offline AI                │
├─────────────────────────────────────────────────────────────┤
│  CPU:     Rockchip RK3588S (4x A76 + 4x A55)                │
│  NPU:     6 TOPS (Neural Processing Unit)                   │
│  GPU:     Mali-G610 MP4                                     │
│  RAM:     4GB / 8GB / 16GB LPDDR4X                          │
│  Storage: M.2 NVMe + eMMC + microSD                         │
│                                                             │
│  LLM Performance:                                           │
│  ├─ TinyLlama 1.1B Q4:  ~8-12 tokens/sec                    │
│  ├─ Phi-2 2.7B Q4:      ~4-6 tokens/sec                     │
│  └─ With NPU (rkllm):   ~20-30 tokens/sec                   │
└─────────────────────────────────────────────────────────────┘
```

## Displays

### Character LCDs (Minimal)

For text-only interfaces:

| Display | Resolution | Interface | Use Case |
|---------|------------|-----------|----------|
| HD44780 16x2 | 16 chars × 2 lines | I2C/GPIO | Status, simple Q&A |
| HD44780 20x4 | 20 chars × 4 lines | I2C/GPIO | More context |
| LCD2004 | 20 chars × 4 lines | I2C | Industrial |

**Example output on a 16x2:**
```
┌────────────────┐
│> How can I help│
│< Processing... │
└────────────────┘
```

### OLED Displays

For graphical monochrome interfaces:

| Display | Resolution | Interface | Size |
|---------|------------|-----------|------|
| SSD1306 | 128×64 | I2C/SPI | 0.96" |
| SSD1309 | 128×64 | I2C/SPI | 2.42" |
| SH1106 | 128×64 | I2C/SPI | 1.3" |
| SSD1322 | 256×64 | SPI | 3.12" |

### TFT/IPS Color Displays

For a full graphical interface:

| Display | Resolution | Interface | Notes |
|---------|------------|-----------|-------|
| ILI9341 | 320×240 | SPI | Common, cheap |
| ST7789 | 240×320 | SPI | Fast refresh |
| ILI9488 | 480×320 | SPI | Larger |
| Waveshare 5" | 800×480 | HDMI | Touch optional |
| Waveshare 7" | 1024×600 | HDMI | Touch, IPS |
| Official Pi 7" | 800×480 | DSI | Best for Pi |

### E-Ink/E-Paper

For low power, readable in sunlight:

| Display | Resolution | Colors | Refresh |
|---------|------------|--------|---------|
| Waveshare 2.13" | 250×122 | B/W | 2s |
| Waveshare 4.2" | 400×300 | B/W | 4s |
| Waveshare 7.5" | 800×480 | B/W | 5s |
| Good Display 9.7" | 1200×825 | B/W | 6s |

**Best for:** Menu displays, signs, low-update applications

### Industrial Displays

| Display | Resolution | Features |
|---------|------------|----------|
| Advantech | Various | Wide temp, sunlight |
| Winstar | Various | Industrial grade |
| Newhaven | Various | Long availability |

## Input Devices

### Keyboards

- **USB Keyboard** - Standard; any USB keyboard works
- **PS/2 Keyboard** - Via adapter, lower latency
- **Matrix Keypad** - 4x4 or 3x4, GPIO connected
- **I2C Keypad** - Fewer GPIO pins needed

### Touch Input

- **Capacitive Touch** - Better response, needs a driver
- **Resistive Touch** - Works with gloves, pressure-based
- **IR Touch Frame** - Large displays, vandal-resistant

### Buttons & GPIO

```
┌─────────────────────────────────────────────┐
│         Simple 4-Button Interface           │
├─────────────────────────────────────────────┤
│                                             │
│  [◄ PREV]  [▲ UP]  [▼ DOWN]  [► SELECT]     │
│                                             │
│   GPIO 17   GPIO 27   GPIO 22   GPIO 23     │
│                                             │
└─────────────────────────────────────────────┘
```

## Enclosures

### Commercial Options

- **Hammond Manufacturing** - Industrial metal enclosures
- **Polycase** - Plastic, IP65 rated
- **Bud Industries** - Various sizes
- **Pi-specific cases** - Argon, Flirc, etc.

### DIY Options

- **3D Printed** - Custom fit, PLA/PETG
- **Laser Cut** - Acrylic, wood
- **Metal Fabrication** - Professional look

## Power

### Power Requirements

| Configuration | Power | Recommended PSU |
|---------------|-------|-----------------|
| Pi Zero + LCD | 1-2W | 5V 1A |
| Pi 4 + Display | 5-10W | 5V 3A |
| Orange Pi 5 | 8-15W | 5V 4A or 12V 2A |
| With NVMe SSD | +2-3W | Add 1A headroom |
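
The PSU column follows from current = power / voltage plus headroom; a small helper to size a supply (the 1.5x headroom factor is an assumption, not a vendor spec):

```shell
#!/bin/sh
# Sketch: minimum PSU current in amps for a given power draw and supply
# voltage, with headroom. The 1.5x headroom factor is illustrative.

psu_amps() {
    watts="$1"
    volts="$2"
    # amps = (watts / volts) * 1.5, printed to one decimal place
    awk -v w="$watts" -v v="$volts" 'BEGIN { printf "%.1f\n", (w / v) * 1.5 }'
}
```

For example, `psu_amps 15 5` suggests about 4.5A at 5V for an Orange Pi 5 under peak load, which is in line with the supplies recommended in the table.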

### Power Options

- **USB-C PD** - Modern, efficient
- **PoE HAT** - Power over Ethernet
- **12V Barrel** - Industrial standard
- **Battery** - UPS, solar applications

### UPS Solutions

- **PiJuice** - Pi-specific UPS HAT
- **UPS PIco** - Small form factor
- **Powerboost** - Adafruit, lithium battery

@@ -1,382 +1 @@
-# Local LLM - Offline AI with llama.cpp
+# Local LLM with llama.cpp

Run AI inference completely offline on embedded devices. No internet, no API costs, full privacy.

## Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                          Local LLM Architecture                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   User Input ──▶ botserver ──▶ llama.cpp ──▶ Response                       │
│                      │              │                                       │
│                      │         ┌────┴────┐                                  │
│                      │         │  Model  │                                  │
│                      │         │  GGUF   │                                  │
│                      │         │ (Q4_K)  │                                  │
│                      │         └─────────┘                                  │
│                      │                                                      │
│                  SQLite DB                                                  │
│                  (sessions)                                                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Recommended Models

### By Device RAM

| RAM | Model | Size | Speed | Quality |
|-----|-------|------|-------|---------|
| **2GB** | TinyLlama 1.1B Q4_K_M | 670MB | ~5 tok/s | Basic |
| **4GB** | Phi-2 2.7B Q4_K_M | 1.6GB | ~3-4 tok/s | Good |
| **4GB** | Gemma 2B Q4_K_M | 1.4GB | ~4 tok/s | Good |
| **8GB** | Llama 3.2 3B Q4_K_M | 2GB | ~3 tok/s | Better |
| **8GB** | Mistral 7B Q4_K_M | 4.1GB | ~2 tok/s | Great |
| **16GB** | Llama 3.1 8B Q4_K_M | 4.7GB | ~2 tok/s | Excellent |
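
For provisioning scripts, the RAM tiers above can be encoded as a default-model picker; a sketch whose cutoffs mirror the table (the filenames match the download URLs later on this page, but your model choices may differ):

```shell
#!/bin/sh
# Sketch: pick a default GGUF model from total RAM in MB, following the
# tiers in the table above. Filenames are the defaults used in this
# chapter; adjust for your own model choices.

pick_model() {
    ram_mb="$1"
    if [ "$ram_mb" -ge 8192 ]; then
        echo "Llama-3.2-3B-Instruct-Q4_K_M.gguf"
    elif [ "$ram_mb" -ge 4096 ]; then
        echo "phi-2.Q4_K_M.gguf"
    else
        echo "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
    fi
}

# e.g. on the device:
#   pick_model "$(free -m | awk '/^Mem:/ {print $2}')"
```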

### By Use Case

**Simple Q&A, Commands:**
```
TinyLlama 1.1B - Fast, basic understanding
```

**Customer Service, FAQ:**
```
Phi-2 or Gemma 2B - Good comprehension, reasonable speed
```

**Complex Reasoning:**
```
Llama 3.2 3B or Mistral 7B - Better accuracy, slower
```

## Installation

### Automatic (via deploy script)

```bash
./scripts/deploy-embedded.sh pi@device --with-llama
```

### Manual Installation

```bash
# SSH to the device
ssh pi@raspberrypi.local

# Install dependencies
sudo apt update
sudo apt install -y build-essential cmake git wget

# Clone llama.cpp
cd /opt
sudo git clone https://github.com/ggerganov/llama.cpp
sudo chown -R $(whoami):$(whoami) llama.cpp
cd llama.cpp

# Build for ARM (auto-optimizes)
mkdir build && cd build
cmake .. -DLLAMA_NATIVE=ON -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# Download a model
mkdir -p /opt/llama.cpp/models
cd /opt/llama.cpp/models
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```

### Start Server

```bash
# Test run
/opt/llama.cpp/build/bin/llama-server \
    -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    -c 2048 \
    --threads 4

# Verify
curl http://localhost:8080/v1/models
```

### Systemd Service

Create `/etc/systemd/system/llama-server.service`:

```ini
[Unit]
Description=llama.cpp Server - Local LLM
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/llama.cpp
ExecStart=/opt/llama.cpp/build/bin/llama-server \
    -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    -c 2048 \
    -ngl 0 \
    --threads 4
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable and start:

```bash
sudo systemctl daemon-reload
sudo systemctl enable llama-server
sudo systemctl start llama-server
```

## Configuration

### botserver .env

```env
# Use local llama.cpp
LLM_PROVIDER=llamacpp
LLM_API_URL=http://127.0.0.1:8080
LLM_MODEL=tinyllama

# Memory limits
MAX_CONTEXT_TOKENS=2048
MAX_RESPONSE_TOKENS=512
STREAMING_ENABLED=true
```

### llama.cpp Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `-c` | 2048 | Context size (tokens) |
| `--threads` | 4 | CPU threads |
| `-ngl` | 0 | GPU layers (0 for CPU only) |
| `--host` | 127.0.0.1 | Bind address |
| `--port` | 8080 | Server port |
| `-b` | 512 | Batch size |
| `--mlock` | off | Lock model in RAM |

### Memory vs Context Size

```
Context 512:  ~400MB RAM, fast, limited conversation
Context 1024: ~600MB RAM, moderate
Context 2048: ~900MB RAM, good for most uses
Context 4096: ~1.5GB RAM, long conversations
```
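
For capacity planning, the figures above can be approximated with a linear rule of thumb of roughly 250MB base plus ~0.3MB per context token; a sketch (the coefficients are eyeballed from this table and hold only for this model and quantization):

```shell
#!/bin/sh
# Sketch: rough RAM estimate in MB for a given context size, fitted by eye
# to the figures above (base ~250MB + ~0.3MB per token). Indicative only.

estimate_ram_mb() {
    ctx="$1"
    # integer arithmetic: 0.3 MB/token expressed as *3/10
    echo $((250 + ctx * 3 / 10))
}
```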

## Performance Optimization

### CPU Optimization

```bash
# Check CPU features
cat /proc/cpuinfo | grep -E "(model name|Features)"

# Build with specific optimizations
cmake .. -DLLAMA_NATIVE=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLAMA_ARM_FMA=ON \
    -DLLAMA_ARM_DOTPROD=ON
```

### Memory Optimization

```bash
# For 2GB RAM devices:
# use a smaller context
-c 1024

# Use memory mapping (slower but less RAM)
--mmap

# Leave mlock disabled (don't pin the model to RAM)
# (this is the default)
```

### Swap Configuration

For devices with limited RAM:

```bash
# Create 2GB of swap
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make it permanent
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Prefer RAM over swap
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
```
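
How much swap to create can be estimated from the model size and installed RAM; a sketch (the 1GB working-set allowance is an illustrative assumption):

```shell
#!/bin/sh
# Sketch: suggested swap in MB so that model + a working-set allowance
# fits in RAM + swap. The 1024MB allowance is an illustrative guess.

suggest_swap_mb() {
    model_mb="$1"
    ram_mb="$2"
    need=$((model_mb + 1024))
    if [ "$need" -le "$ram_mb" ]; then
        echo 0
    else
        echo $((need - ram_mb))
    fi
}

# e.g. suggest_swap_mb 4100 4096  ->  Mistral 7B Q4 on a 4GB board
```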

## NPU Acceleration (Orange Pi 5)

The Orange Pi 5 has a 6 TOPS NPU that can accelerate inference:

### Using rkllm (Rockchip NPU)

```bash
# Install the rkllm runtime
git clone https://github.com/airockchip/rknn-llm
cd rknn-llm
./install.sh

# Convert the model to RKNN format
python3 convert_model.py \
    --model tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --output tinyllama.rkllm

# Run with the NPU
rkllm-server \
    --model tinyllama.rkllm \
    --port 8080
```

Expected speedup: **3-5x faster** than CPU only.

## Model Download URLs

### TinyLlama 1.1B (Recommended for 2GB)
```bash
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```

### Phi-2 2.7B (Recommended for 4GB)
```bash
wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf
```

### Gemma 2B
```bash
wget https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q4_K_M.gguf
```

### Llama 3.2 3B (Recommended for 8GB)
```bash
wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
```

### Mistral 7B
```bash
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
```

## API Usage

llama.cpp exposes an OpenAI-compatible API:

### Chat Completion

```bash
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "tinyllama",
        "messages": [
            {"role": "user", "content": "What is 2+2?"}
        ],
        "max_tokens": 100
    }'
```

### Streaming

```bash
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "tinyllama",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": true
    }'
```

### Health Check

```bash
curl http://localhost:8080/health
curl http://localhost:8080/v1/models
```

## Monitoring

### Check Performance

```bash
# Watch resource usage
htop

# Check inference speed in the logs
sudo journalctl -u llama-server -f | grep "tokens/s"

# Memory usage
free -h
```
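
For alerting scripts, the speed figure can be pulled out of a log line; a sketch assuming a line ending in something like `... 12.34 tokens/s` (the exact log format varies between llama.cpp versions, so treat the pattern as an assumption):

```shell
#!/bin/sh
# Sketch: extract the tokens/s figure from a llama.cpp-style log line.
# The sample log format is an assumption; adjust the pattern to your logs.

extract_tok_s() {
    # prints the number immediately preceding "tokens/s", nothing if absent
    sed -n 's/.*[^0-9.]\([0-9][0-9.]*\) tokens\/s.*/\1/p'
}

# e.g. journalctl -u llama-server | extract_tok_s | tail -n 1
```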

### Benchmarking

```bash
# Run the llama.cpp benchmark
/opt/llama.cpp/build/bin/llama-bench \
    -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    -p 512 -n 128 -t 4
```

## Troubleshooting

### Model Loading Fails

```bash
# Check available RAM
free -h

# Try a smaller context
-c 512

# Use memory mapping
--mmap
```

### Slow Inference

```bash
# Increase threads (up to the number of CPU cores)
--threads $(nproc)

# Use an optimized build
cmake .. -DLLAMA_NATIVE=ON

# Consider a smaller model
```

### Out of Memory Killer

```bash
# Check whether the OOM killer stopped the process
dmesg | grep -i "killed process"

# Then: increase swap, use a smaller model, or reduce the context size
```

## Best Practices

1. **Start small** - Begin with TinyLlama, upgrade if needed
2. **Monitor memory** - Use `htop` during initial tests
3. **Set an appropriate context** - 1024-2048 for most embedded use
4. **Use quantized models** - Q4_K_M is a good balance
5. **Enable streaming** - Better UX on slow inference
6. **Test offline** - Verify it works without internet before deployment

@@ -1,209 +1 @@
-# Quick Start - Deploy in 5 Minutes
+# Quick Start
|
|
||||||
Get General Bots running on your embedded device with local AI in just a few commands.
|
|
||||||
|
|
||||||
## Prerequisites
|
|
||||||
|
|
||||||
- An SBC (Raspberry Pi, Orange Pi, etc.) with Armbian/Raspbian
|
|
||||||
- SSH access to the device
|
|
||||||
- Internet connection (for initial setup only)
|
|
||||||
|
|
||||||
## One-Line Deploy
|
|
||||||
|
|
||||||
From your development machine:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Clone and run the deployment script
|
|
||||||
git clone https://github.com/GeneralBots/botserver.git
|
|
||||||
cd botserver
|
|
||||||
|
|
||||||
# Deploy to Orange Pi (replace with your device IP)
|
|
||||||
./scripts/deploy-embedded.sh orangepi@192.168.1.100 --with-ui --with-llama
|
|
||||||
```
|
|
||||||
|
|
||||||
That's it! After ~10-15 minutes:
|
|
||||||
- BotServer runs on port 8088
|
|
||||||
- llama.cpp runs on port 8080 with TinyLlama
|
|
||||||
- Embedded UI available at `http://your-device:8088/embedded/`
|
|
||||||
|
|
||||||
## Step-by-Step Guide
|
|
||||||
|
|
||||||
### Step 1: Prepare Your Device
|
|
||||||
|
|
||||||
Flash your SBC with a compatible OS:
|
|
||||||
|
|
||||||
**Raspberry Pi:**
|
|
||||||
```bash
|
|
||||||
# Download Raspberry Pi Imager
|
|
||||||
# Select: Raspberry Pi OS Lite (64-bit)
|
|
||||||
# Enable SSH in settings
|
|
||||||
```
|
|
||||||
|
|
||||||
**Orange Pi:**
|
|
||||||
```bash
|
|
||||||
# Download Armbian from armbian.com
|
|
||||||
# Flash with balenaEtcher
|
|
||||||
```
|
|
||||||
|
|
||||||
### Step 2: First Boot Configuration
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# SSH into your device
|
|
||||||
ssh pi@raspberrypi.local # or orangepi@orangepi.local
|
|
||||||
|
|
||||||
# Update system
|
|
||||||
sudo apt update && sudo apt upgrade -y
|
|
||||||
|
|
||||||
# Set timezone
|
|
||||||
sudo timedatectl set-timezone America/Sao_Paulo
|
|
||||||
|
|
||||||
# Enable I2C/SPI if using GPIO displays
|
|
||||||
sudo raspi-config # or armbian-config
|
|
||||||
```
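On Raspberry Pi, `raspi-config` can also be driven non-interactively, which is handy in provisioning scripts (flag values follow `raspi-config`'s `nonint` interface; Orange Pi users would use `armbian-config` or device-tree overlays instead):

```shell
# 0 = enable, 1 = disable for these raspi-config nonint toggles
if command -v raspi-config >/dev/null 2>&1; then
  sudo raspi-config nonint do_i2c 0
  sudo raspi-config nonint do_spi 0
else
  echo "raspi-config not found - not a Raspberry Pi OS system?"
fi
```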
### Step 3: Run Deployment Script

From your development PC:

```bash
# Basic deployment (botserver only)
./scripts/deploy-embedded.sh pi@raspberrypi.local

# With the embedded UI
./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui

# With a local LLM (requires 4GB+ RAM)
./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui --with-llama

# Specify a different model
./scripts/deploy-embedded.sh pi@raspberrypi.local --with-llama --model phi-2-Q4_K_M.gguf
```

### Step 4: Verify Installation

```bash
# Check the services
ssh pi@raspberrypi.local 'sudo systemctl status botserver'
ssh pi@raspberrypi.local 'sudo systemctl status llama-server'

# Test botserver
curl http://raspberrypi.local:8088/health

# Test llama.cpp
curl http://raspberrypi.local:8080/v1/models
```

### Step 5: Access the Interface

Open in your browser:

```
http://raspberrypi.local:8088/embedded/
```

Or use kiosk mode (auto-starts on boot):

```bash
# Already configured if you deployed with --with-ui
# Just reboot:
ssh pi@raspberrypi.local 'sudo reboot'
```
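For reference, the kiosk autostart that `--with-ui` sets up amounts to something like the following unit; the user, browser binary, and paths here are assumptions, so adjust them for your image:

```shell
# Write a sketch of a kiosk unit to a staging path for review
cat > /tmp/gb-kiosk.service <<'EOF'
[Unit]
Description=General Bots kiosk browser
After=graphical.target

[Service]
User=pi
Environment=DISPLAY=:0
ExecStart=/usr/bin/chromium-browser --kiosk --noerrdialogs http://localhost:8088/embedded/
Restart=on-failure

[Install]
WantedBy=graphical.target
EOF

# To activate it on the device:
#   sudo cp /tmp/gb-kiosk.service /etc/systemd/system/
#   sudo systemctl daemon-reload && sudo systemctl enable gb-kiosk
```

`Restart=on-failure` makes systemd relaunch the browser if it crashes, which is what you want on an unattended terminal.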
## Local Installation (On the Device)

If you prefer to install directly on the device:

```bash
# SSH into the device
ssh pi@raspberrypi.local

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

# Clone and build
git clone https://github.com/GeneralBots/botserver.git
cd botserver

# Run the local deployment
./scripts/deploy-embedded.sh --local --with-ui --with-llama
```

⚠️ **Note:** Building directly on ARM devices is slow (1-2 hours); cross-compiling from your PC is much faster.

## Configuration

After deployment, edit the config file:

```bash
ssh pi@raspberrypi.local
sudo nano /opt/botserver/.env
```

Key settings:

```env
# Server
HOST=0.0.0.0
PORT=8088

# Local LLM
LLM_PROVIDER=llamacpp
LLM_API_URL=http://127.0.0.1:8080
LLM_MODEL=tinyllama

# Memory limits for small devices
MAX_CONTEXT_TOKENS=2048
MAX_RESPONSE_TOKENS=512
```

Restart after changes:

```bash
sudo systemctl restart botserver
```
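A restart takes a few seconds, so scripts that talk to the server right after restarting should wait for it rather than race it; a small sketch polling the `/health` endpoint used in Step 4 (host and port as configured above):

```shell
# Poll the health endpoint for a few seconds after a restart
up=0
for _ in $(seq 1 5); do
  if curl -fsS http://127.0.0.1:8088/health >/dev/null 2>&1; then
    up=1
    break
  fi
  sleep 1
done

if [ "$up" -eq 1 ]; then
  echo "botserver is up"
else
  echo "botserver not reachable yet"
fi
```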
## Troubleshooting

### Out of Memory

```bash
# Check memory usage
free -h

# Reduce the llama.cpp context size
sudo nano /etc/systemd/system/llama-server.service
# Change -c 2048 to -c 1024

# Or switch to a smaller model:
# TinyLlama uses ~700MB, Phi-2 uses ~1.6GB
```
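The choice between those models can be scripted from what is actually free at the moment; the thresholds below are rough estimates for Q4_K_M residency, and the filenames are illustrative:

```shell
# Read available memory and pick the largest model that should fit
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
avail_kb=${avail_kb:-0}   # default to 0 on kernels without MemAvailable
avail_mb=$((avail_kb / 1024))

if [ "$avail_mb" -ge 2200 ]; then
  model="phi-2-Q4_K_M.gguf"       # needs ~1.6 GB resident
elif [ "$avail_mb" -ge 1000 ]; then
  model="tinyllama-Q4_K_M.gguf"   # needs ~700 MB resident
else
  model=""                        # too little free RAM for local inference
fi
echo "Selected model: ${model:-none}"
```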
### Service Won't Start

```bash
# Check the logs
sudo journalctl -u botserver -f
sudo journalctl -u llama-server -f

# Common causes:
# - Port already in use
# - Missing model file
# - Database permissions
```
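The port-collision case is quick to rule out; `ss` ships with iproute2 on both Raspberry Pi OS and Armbian:

```shell
# Show any listener already bound to the botserver port
ss -ltnp 2>/dev/null | grep ':8088 ' || echo "port 8088 is free"
```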
### Display Not Working

```bash
# Check whether the display is detected
ls /dev/fb*      # HDMI/DSI
ls /dev/i2c*     # I2C displays
ls /dev/spidev*  # SPI displays

# For HDMI, check the boot config
sudo nano /boot/config.txt       # Raspberry Pi
sudo nano /boot/armbianEnv.txt   # Orange Pi
```

## Next Steps

- [Embedded UI Guide](./embedded-ui.md) - Customize the interface
- [Local LLM Configuration](./local-llm.md) - Optimize AI performance
- [Kiosk Mode](./kiosk-mode.md) - Production deployment
- [Offline Operation](./offline.md) - Disconnected environments
```diff
@@ -320,9 +320,17 @@
 - [Permissions Matrix](./12-auth/permissions-matrix.md)
 - [User Context vs System Context](./12-auth/user-system-context.md)

-# Part XII - Community
+# Part XII - Device & Offline Deployment

-- [Chapter 13: Contributing](./13-community/README.md)
+- [Chapter 13: Device Deployment](./13-devices/README.md)
+  - [Mobile (Android & HarmonyOS)](./13-devices/mobile.md)
+  - [Supported Hardware (SBCs)](./13-devices/hardware.md)
+  - [Quick Start](./13-devices/quick-start.md)
+  - [Local LLM with llama.cpp](./13-devices/local-llm.md)
+
+# Part XIII - Community
+
+- [Chapter 14: Contributing](./13-community/README.md)
   - [Development Setup](./13-community/setup.md)
   - [Testing Guide](./13-community/testing.md)
   - [Documentation](./13-community/documentation.md)
@@ -330,9 +338,9 @@
   - [Community Guidelines](./13-community/community.md)
   - [IDEs](./13-community/ide-extensions.md)

-# Part XIII - Migration
+# Part XIV - Migration

-- [Chapter 14: Migration Guide](./14-migration/README.md)
+- [Chapter 15: Migration Guide](./14-migration/README.md)
   - [Migration Overview](./14-migration/overview.md)
   - [Platform Comparison Matrix](./14-migration/comparison-matrix.md)
   - [Migration Resources](./14-migration/resources.md)
@@ -350,9 +358,9 @@
   - [Automation Migration](./14-migration/automation.md)
   - [Validation and Testing](./14-migration/validation.md)

-# Part XIV - Testing
+# Part XV - Testing

-- [Chapter 17: Testing](./17-testing/README.md)
+- [Chapter 16: Testing](./17-testing/README.md)
   - [End-to-End Testing](./17-testing/e2e-testing.md)
   - [Testing Architecture](./17-testing/architecture.md)
   - [Performance Testing](./17-testing/performance.md)
@@ -390,12 +398,5 @@
 - [Appendix D: Documentation Style](./16-appendix-docs-style/conversation-examples.md)
   - [SVG and Conversation Standards](./16-appendix-docs-style/svg.md)

-# Part XV - Embedded & Offline
-
-- [Chapter 20: Embedded Deployment](./20-embedding/README.md)
-  - [Supported Hardware](./20-embedding/hardware.md)
-  - [Quick Start](./20-embedding/quick-start.md)
-  - [Local LLM with llama.cpp](./20-embedding/local-llm.md)
-
 [Glossary](./glossary.md)
 [Contact](./contact/README.md)
```