diff --git a/src/13-devices/README.md b/src/13-devices/README.md
new file mode 100644
index 00000000..4153b4ac
--- /dev/null
+++ b/src/13-devices/README.md
@@ -0,0 +1,54 @@
+# Chapter 13: Device & Offline Deployment
+
+Deploy General Bots to any device - from smartphones to Raspberry Pi to industrial kiosks - with local LLM inference for fully offline AI capabilities.
+
+## Overview
+
+General Bots can run on any device, from mobile phones to minimal embedded hardware with displays as small as 16x2 character LCDs, enabling AI-powered interactions anywhere:
+
+- **Kiosks** - Self-service terminals in stores, airports, hospitals
+- **Industrial IoT** - Factory floor assistants, machine interfaces
+- **Smart Home** - Wall panels, kitchen displays, door intercoms
+- **Retail** - Point-of-sale systems, product information terminals
+- **Education** - Classroom assistants, lab equipment interfaces
+- **Healthcare** - Patient check-in, medication reminders
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ Embedded GB Architecture │
+├─────────────────────────────────────────────────────────────────────────────┤
+│ │
+│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
+│ │ Display │ │ botserver │ │ llama.cpp │ │
+│ │ LCD/OLED │────▶│ (Rust) │────▶│ (Local) │ │
+│ │ TFT/HDMI │ │ Port 8088 │ │ Port 8080 │ │
+│ └──────────────┘ └──────────────┘ └──────────────┘ │
+│ │ │ │ │
+│ │ │ │ │
+│ ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐ │
+│ │ Keyboard │ │ SQLite │ │ TinyLlama │ │
+│ │ Buttons │ │ (Data) │ │ GGUF │ │
+│ │ Touch │ │ │ │ (~700MB) │ │
+│ └─────────────┘ └─────────────┘ └─────────────┘ │
+│ │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## What's in This Chapter
+
+### Mobile Deployment
+- [Mobile (Android & HarmonyOS)](./mobile.md) - BotOS for smartphones and tablets
+
+### Embedded Deployment
+- [Supported Hardware](./hardware.md) - SBCs, displays, and peripherals
+- [Quick Start](./quick-start.md) - Deploy in 5 minutes
+- [Local LLM](./local-llm.md) - Offline AI with llama.cpp
+
+### Deployment Options
+
+| Platform | Use Case | Requirements |
+|----------|----------|--------------|
+| **Android/HarmonyOS** | Smartphones, tablets, kiosks | Any Android 8+ device |
+| **Raspberry Pi** | IoT, displays, terminals | 1GB+ RAM |
+| **Orange Pi** | Full offline AI | 4GB+ RAM for LLM |
+| **Industrial** | Factory, retail, healthcare | Any ARM/x86 SBC |
diff --git a/src/13-devices/hardware.md b/src/13-devices/hardware.md
new file mode 100644
index 00000000..f8d6d416
--- /dev/null
+++ b/src/13-devices/hardware.md
@@ -0,0 +1,190 @@
+# Supported Hardware
+
+## Single Board Computers (SBCs)
+
+### Recommended Boards
+
+| Board | CPU | RAM | Best For | Price |
+|-------|-----|-----|----------|-------|
+| **Orange Pi 5** | RK3588S | 4-16GB | Full LLM, NPU accel | $89-149 |
+| **Raspberry Pi 5** | BCM2712 | 4-8GB | General purpose | $60-80 |
+| **Orange Pi Zero 3** | H618 | 1-4GB | Minimal deployments | $20-35 |
+| **Raspberry Pi 4** | BCM2711 | 2-8GB | Established ecosystem | $45-75 |
+| **Raspberry Pi Zero 2W** | RP3A0 | 512MB | Ultra-compact | $15 |
+| **Rock Pi 4** | RK3399 | 4GB | NPU available | $75 |
+| **NVIDIA Jetson Nano** | Tegra X1 | 4GB | GPU inference | $149 |
+| **BeagleBone Black** | AM3358 | 512MB | Industrial | $55 |
+| **LattePanda 3 Delta** | N5105 | 8GB | x86 compatibility | $269 |
+| **ODROID-N2+** | S922X | 4GB | High performance | $79 |
+
+### Minimum Requirements
+
+**For UI only (connect to remote botserver):**
+- Any ARM/x86 Linux board
+- 256MB RAM
+- Network connection
+- Display output
+
+**For local botserver:**
+- ARM64 or x86_64
+- 1GB RAM minimum
+- 4GB storage
+
+**For local LLM (llama.cpp):**
+- ARM64 or x86_64
+- 2GB+ RAM (4GB recommended)
+- 2GB+ storage for model
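+
+Before deploying, you can confirm a board meets these requirements over SSH with standard Linux tools:
+
+```bash
+# Architecture: expect aarch64 (ARM64) or x86_64
+uname -m
+
+# RAM: 1GB+ for botserver, 2GB+ (4GB recommended) for local LLM
+free -h
+
+# Disk: 4GB+ free, plus ~2GB per GGUF model
+df -h /
+```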
+
+### Orange Pi 5 (Recommended for LLM)
+
+The Orange Pi 5 with RK3588S is ideal for embedded LLM:
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ Orange Pi 5 - Best for Offline AI │
+├─────────────────────────────────────────────────────────────┤
+│ CPU: Rockchip RK3588S (4x A76 + 4x A55) │
+│ NPU: 6 TOPS (Neural Processing Unit) │
+│ GPU: Mali-G610 MP4 │
+│ RAM: 4GB / 8GB / 16GB LPDDR4X │
+│ Storage: M.2 NVMe + eMMC + microSD │
+│ │
+│ LLM Performance: │
+│ ├─ TinyLlama 1.1B Q4: ~8-12 tokens/sec │
+│ ├─ Phi-2 2.7B Q4: ~4-6 tokens/sec │
+│ └─ With NPU (rkllm): ~20-30 tokens/sec │
+└─────────────────────────────────────────────────────────────┘
+```
+
+## Displays
+
+### Character LCDs (Minimal)
+
+For text-only interfaces:
+
+| Display | Resolution | Interface | Use Case |
+|---------|------------|-----------|----------|
+| HD44780 16x2 | 16 chars × 2 lines | I2C/GPIO | Status, simple Q&A |
+| HD44780 20x4 | 20 chars × 4 lines | I2C/GPIO | More context |
+| LCD2004 | 20 chars × 4 lines | I2C | Industrial |
+
+**Example output on 16x2:**
+```
+┌────────────────┐
+│> How can I help│
+│< Processing... │
+└────────────────┘
+```
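+
+Most character LCDs are driven through an I2C backpack. Before wiring the display into the UI, confirm it appears on the bus (assumes the `i2c-tools` package and bus 1, the usual default on Pi-class boards):
+
+```bash
+sudo apt install -y i2c-tools
+
+# Scan bus 1; HD44780 backpacks typically appear at 0x27 or 0x3f
+sudo i2cdetect -y 1
+```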
+
+### OLED Displays
+
+For graphical monochrome interfaces:
+
+| Display | Resolution | Interface | Size |
+|---------|------------|-----------|------|
+| SSD1306 | 128×64 | I2C/SPI | 0.96" |
+| SSD1309 | 128×64 | I2C/SPI | 2.42" |
+| SH1106 | 128×64 | I2C/SPI | 1.3" |
+| SSD1322 | 256×64 | SPI | 3.12" |
+
+### TFT/IPS Color Displays
+
+For full graphical interface:
+
+| Display | Resolution | Interface | Notes |
+|---------|------------|-----------|-------|
+| ILI9341 | 320×240 | SPI | Common, cheap |
+| ST7789 | 240×320 | SPI | Fast refresh |
+| ILI9488 | 480×320 | SPI | Larger |
+| Waveshare 5" | 800×480 | HDMI | Touch optional |
+| Waveshare 7" | 1024×600 | HDMI | Touch, IPS |
+| Official Pi 7" | 800×480 | DSI | Best for Pi |
+
+### E-Ink/E-Paper
+
+For low-power, readable in sunlight:
+
+| Display | Resolution | Colors | Refresh |
+|---------|------------|--------|---------|
+| Waveshare 2.13" | 250×122 | B/W | 2s |
+| Waveshare 4.2" | 400×300 | B/W | 4s |
+| Waveshare 7.5" | 800×480 | B/W | 5s |
+| Good Display 9.7" | 1200×825 | B/W | 6s |
+
+**Best for:** Menu displays, signs, low-update applications
+
+### Industrial Displays
+
+| Display | Resolution | Features |
+|---------|------------|----------|
+| Advantech | Various | Wide temp, sunlight |
+| Winstar | Various | Industrial grade |
+| Newhaven | Various | Long availability |
+
+## Input Devices
+
+### Keyboards
+
+- **USB Keyboard** - Standard, any USB keyboard works
+- **PS/2 Keyboard** - Via adapter, lower latency
+- **Matrix Keypad** - 4x4 or 3x4, GPIO connected
+- **I2C Keypad** - Fewer GPIO pins needed
+
+### Touch Input
+
+- **Capacitive Touch** - Better response, needs driver
+- **Resistive Touch** - Works with gloves, pressure-based
+- **IR Touch Frame** - Large displays, vandal-resistant
+
+### Buttons & GPIO
+
+```
+┌─────────────────────────────────────────────┐
+│ Simple 4-Button Interface │
+├─────────────────────────────────────────────┤
+│ │
+│ [◄ PREV] [▲ UP] [▼ DOWN] [► SELECT] │
+│ │
+│ GPIO 17 GPIO 27 GPIO 22 GPIO 23 │
+│ │
+└─────────────────────────────────────────────┘
+```
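+
+To verify button wiring, libgpiod's CLI can watch the pins directly. A minimal check, assuming the pins above sit on `gpiochip0` and libgpiod v1 syntax (chip and pin numbering vary by board):
+
+```bash
+sudo apt install -y gpiod
+
+# Print an event each time a button is pressed
+gpiomon gpiochip0 17 27 22 23
+```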
+
+## Enclosures
+
+### Commercial Options
+
+- **Hammond Manufacturing** - Industrial metal enclosures
+- **Polycase** - Plastic, IP65 rated
+- **Bud Industries** - Various sizes
+- **Pi-specific cases** - Argon, Flirc, etc.
+
+### DIY Options
+
+- **3D Printed** - Custom fit, PLA/PETG
+- **Laser Cut** - Acrylic, wood
+- **Metal Fabrication** - Professional look
+
+## Power
+
+### Power Requirements
+
+| Configuration | Power | Recommended PSU |
+|---------------|-------|-----------------|
+| Pi Zero + LCD | 1-2W | 5V 1A |
+| Pi 4 + Display | 5-10W | 5V 3A |
+| Orange Pi 5 | 8-15W | 5V 4A or 12V 2A |
+| With NVMe SSD | +2-3W | Add 1A headroom |
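+
+Undervoltage is the most common cause of instability under sustained LLM load. On Raspberry Pi boards the firmware exposes a throttle flag (Pi-specific; other SBCs need board-specific tools):
+
+```bash
+# 0x0 means healthy; any other value indicates throttling or undervoltage
+vcgencmd get_throttled
+```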
+
+### Power Options
+
+- **USB-C PD** - Modern, efficient
+- **PoE HAT** - Power over Ethernet
+- **12V Barrel** - Industrial standard
+- **Battery** - UPS, solar applications
+
+### UPS Solutions
+
+- **PiJuice** - Pi-specific UPS HAT
+- **UPS PIco** - Small form factor
+- **Powerboost** - Adafruit, lithium battery
diff --git a/src/13-devices/local-llm.md b/src/13-devices/local-llm.md
new file mode 100644
index 00000000..3feb4199
--- /dev/null
+++ b/src/13-devices/local-llm.md
@@ -0,0 +1,382 @@
+# Local LLM - Offline AI with llama.cpp
+
+Run AI inference completely offline on embedded devices. No internet, no API costs, full privacy.
+
+## Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ Local LLM Architecture │
+├─────────────────────────────────────────────────────────────────────────────┤
+│ │
+│ User Input ──▶ botserver ──▶ llama.cpp ──▶ Response │
+│ │ │ │
+│ │ ┌────┴────┐ │
+│ │ │ Model │ │
+│ │ │ GGUF │ │
+│ │ │ (Q4_K) │ │
+│ │ └─────────┘ │
+│ │ │
+│ SQLite DB │
+│ (sessions) │
+│ │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## Recommended Models
+
+### By Device RAM
+
+| RAM | Model | Size | Speed | Quality |
+|-----|-------|------|-------|---------|
+| **2GB** | TinyLlama 1.1B Q4_K_M | 670MB | ~5 tok/s | Basic |
+| **4GB** | Phi-2 2.7B Q4_K_M | 1.6GB | ~3-4 tok/s | Good |
+| **4GB** | Gemma 2B Q4_K_M | 1.4GB | ~4 tok/s | Good |
+| **8GB** | Llama 3.2 3B Q4_K_M | 2GB | ~3 tok/s | Better |
+| **8GB** | Mistral 7B Q4_K_M | 4.1GB | ~2 tok/s | Great |
+| **16GB** | Llama 3.1 8B Q4_K_M | 4.7GB | ~2 tok/s | Excellent |
+
+### By Use Case
+
+**Simple Q&A, Commands:**
+```
+TinyLlama 1.1B - Fast, basic understanding
+```
+
+**Customer Service, FAQ:**
+```
+Phi-2 or Gemma 2B - Good comprehension, reasonable speed
+```
+
+**Complex Reasoning:**
+```
+Llama 3.2 3B or Mistral 7B - Better accuracy, slower
+```
+
+## Installation
+
+### Automatic (via deploy script)
+
+```bash
+./scripts/deploy-embedded.sh pi@device --with-llama
+```
+
+### Manual Installation
+
+```bash
+# SSH to device
+ssh pi@raspberrypi.local
+
+# Install dependencies
+sudo apt update
+sudo apt install -y build-essential cmake git wget
+
+# Clone llama.cpp
+cd /opt
+sudo git clone https://github.com/ggerganov/llama.cpp
+sudo chown -R $(whoami):$(whoami) llama.cpp
+cd llama.cpp
+
+# Build for ARM (auto-optimizes)
+mkdir build && cd build
+cmake .. -DLLAMA_NATIVE=ON -DCMAKE_BUILD_TYPE=Release
+make -j$(nproc)
+
+# Download model
+mkdir -p /opt/llama.cpp/models
+cd /opt/llama.cpp/models
+wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
+```
+
+### Start Server
+
+```bash
+# Test run
+/opt/llama.cpp/build/bin/llama-server \
+ -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+ --host 0.0.0.0 \
+ --port 8080 \
+ -c 2048 \
+ --threads 4
+
+# Verify
+curl http://localhost:8080/v1/models
+```
+
+### Systemd Service
+
+Create `/etc/systemd/system/llama-server.service`:
+
+```ini
+[Unit]
+Description=llama.cpp Server - Local LLM
+After=network.target
+
+[Service]
+Type=simple
+User=root
+WorkingDirectory=/opt/llama.cpp
+ExecStart=/opt/llama.cpp/build/bin/llama-server \
+ -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+ --host 0.0.0.0 \
+ --port 8080 \
+ -c 2048 \
+ -ngl 0 \
+ --threads 4
+Restart=always
+RestartSec=5
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Enable and start:
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable llama-server
+sudo systemctl start llama-server
+```
+
+## Configuration
+
+### botserver .env
+
+```env
+# Use local llama.cpp
+LLM_PROVIDER=llamacpp
+LLM_API_URL=http://127.0.0.1:8080
+LLM_MODEL=tinyllama
+
+# Memory limits
+MAX_CONTEXT_TOKENS=2048
+MAX_RESPONSE_TOKENS=512
+STREAMING_ENABLED=true
+```
+
+### llama.cpp Parameters
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `-c` | 2048 | Context size (tokens) |
+| `--threads` | 4 | CPU threads |
+| `-ngl` | 0 | GPU layers (0 for CPU only) |
+| `--host` | 127.0.0.1 | Bind address |
+| `--port` | 8080 | Server port |
+| `-b` | 512 | Batch size |
+| `--mlock` | off | Lock model in RAM |
+
+### Memory vs Context Size
+
+```
+Context 512: ~400MB RAM, fast, limited conversation
+Context 1024: ~600MB RAM, moderate
+Context 2048: ~900MB RAM, good for most uses
+Context 4096: ~1.5GB RAM, long conversations
+```
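+
+These figures are approximate for TinyLlama-class models. To measure actual usage on your device while the server is running (standard procps, no extra tools):
+
+```bash
+# Resident memory of the llama-server process, in MB
+ps -o rss= -C llama-server | awk '{printf "%.0f MB\n", $1/1024}'
+```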
+
+## Performance Optimization
+
+### CPU Optimization
+
+```bash
+# Check CPU features
+cat /proc/cpuinfo | grep -E "(model name|Features)"
+
+# Build with specific optimizations
+cmake .. -DLLAMA_NATIVE=ON \
+ -DCMAKE_BUILD_TYPE=Release \
+ -DLLAMA_ARM_FMA=ON \
+ -DLLAMA_ARM_DOTPROD=ON
+```
+
+### Memory Optimization
+
+```bash
+# For 2GB RAM devices
+# Use smaller context
+-c 1024
+
+# Memory mapping is enabled by default (model pages load lazily);
+# avoid --no-mmap on low-memory devices
+
+# Leave --mlock off so the model isn't pinned to RAM (off by default)
+```
+
+### Swap Configuration
+
+For devices with limited RAM:
+
+```bash
+# Create 2GB swap
+sudo fallocate -l 2G /swapfile
+sudo chmod 600 /swapfile
+sudo mkswap /swapfile
+sudo swapon /swapfile
+
+# Make permanent
+echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
+
+# Optimize swap usage
+echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
+```
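+
+Apply the sysctl change and verify the swap is active:
+
+```bash
+sudo sysctl -p
+swapon --show
+free -h
+```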
+
+## NPU Acceleration (Orange Pi 5)
+
+Orange Pi 5 has a 6 TOPS NPU that can accelerate inference:
+
+### Using rkllm (Rockchip NPU)
+
+```bash
+# Install rkllm runtime
+git clone https://github.com/airockchip/rknn-llm
+cd rknn-llm
+./install.sh
+
+# Convert model to RKNN format
+python3 convert_model.py \
+ --model tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+ --output tinyllama.rkllm
+
+# Run with NPU
+rkllm-server \
+ --model tinyllama.rkllm \
+ --port 8080
+```
+
+Expected speedup: **3-5x faster** than CPU only.
+
+## Model Download URLs
+
+### TinyLlama 1.1B (Recommended for 2GB)
+```bash
+wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
+```
+
+### Phi-2 2.7B (Recommended for 4GB)
+```bash
+wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf
+```
+
+### Gemma 2 2B
+```bash
+wget https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q4_K_M.gguf
+```
+
+### Llama 3.2 3B (Recommended for 8GB)
+```bash
+wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
+```
+
+### Mistral 7B
+```bash
+wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
+```
+
+## API Usage
+
+llama.cpp exposes an OpenAI-compatible API:
+
+### Chat Completion
+
+```bash
+curl http://localhost:8080/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "tinyllama",
+ "messages": [
+ {"role": "user", "content": "What is 2+2?"}
+ ],
+ "max_tokens": 100
+ }'
+```
+
+### Streaming
+
+```bash
+curl http://localhost:8080/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "tinyllama",
+ "messages": [{"role": "user", "content": "Tell me a story"}],
+ "stream": true
+ }'
+```
+
+### Health Check
+
+```bash
+curl http://localhost:8080/health
+curl http://localhost:8080/v1/models
+```
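+
+For quick manual testing, a small wrapper around the chat endpoint is convenient. A sketch assuming `jq` is installed and the server is on the default port:
+
+```bash
+ask() {
+  curl -s http://localhost:8080/v1/chat/completions \
+    -H "Content-Type: application/json" \
+    -d "{\"model\": \"tinyllama\", \"max_tokens\": 100,
+         \"messages\": [{\"role\": \"user\", \"content\": \"$1\"}]}" \
+    | jq -r '.choices[0].message.content'
+}
+
+ask "What is 2+2?"
+```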
+
+## Monitoring
+
+### Check Performance
+
+```bash
+# Watch resource usage
+htop
+
+# Check inference speed in logs
+sudo journalctl -u llama-server -f | grep -i "tokens per second"
+
+# Memory usage
+free -h
+```
+
+### Benchmarking
+
+```bash
+# Run llama.cpp benchmark
+/opt/llama.cpp/build/bin/llama-bench \
+ -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+ -p 512 -n 128 -t 4
+```
+
+## Troubleshooting
+
+### Model Loading Fails
+
+```bash
+# Check available RAM
+free -h
+
+# Try smaller context
+-c 512
+
+# Memory mapping is on by default; make sure --no-mmap is not set
+```
+
+### Slow Inference
+
+```bash
+# Increase threads (up to CPU cores)
+--threads $(nproc)
+
+# Use optimized build
+cmake .. -DLLAMA_NATIVE=ON
+
+# Consider smaller model
+```
+
+### Out of Memory Killer
+
+```bash
+# Check if OOM killed the process
+dmesg | grep -i "killed process"
+
+# Increase swap
+# Use smaller model
+# Reduce context size
+```
+
+## Best Practices
+
+1. **Start small** - Begin with TinyLlama, upgrade if needed
+2. **Monitor memory** - Use `htop` during initial tests
+3. **Set appropriate context** - 1024-2048 for most embedded use
+4. **Use quantized models** - Q4_K_M is a good balance
+5. **Enable streaming** - Better UX on slow inference
+6. **Test offline** - Verify it works without internet before deployment
diff --git a/src/13-devices/mobile.md b/src/13-devices/mobile.md
new file mode 100644
index 00000000..6cc3c1ca
--- /dev/null
+++ b/src/13-devices/mobile.md
@@ -0,0 +1,323 @@
+# Mobile Deployment - Android & HarmonyOS
+
+Deploy General Bots as the primary interface on Android and HarmonyOS devices, transforming them into dedicated AI assistants.
+
+## Overview
+
+BotOS transforms any Android or HarmonyOS device into a dedicated General Bots system, removing manufacturer bloatware and installing GB as the default launcher.
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ BotOS Architecture │
+├─────────────────────────────────────────────────────────────────────────────┤
+│ │
+│ ┌──────────────────────────────────────────────────────────────────┐ │
+│ │ BotOS App (Tauri) │ │
+│ ├──────────────────────────────────────────────────────────────────┤ │
+│ │ botui/ui/suite │ Tauri Android │ src/lib.rs (Rust) │ │
+│ │ (Web Interface) │ (WebView + NDK) │ (Backend + Hardware) │ │
+│ └──────────────────────────────────────────────────────────────────┘ │
+│ │ │
+│ ┌─────────────────────────┴────────────────────────────┐ │
+│ │ Android/HarmonyOS System │ │
+│ │ ┌─────────┐ ┌──────────┐ ┌────────┐ ┌─────────┐ │ │
+│ │ │ Camera │ │ GPS │ │ WiFi │ │ Storage │ │ │
+│ │ └─────────┘ └──────────┘ └────────┘ └─────────┘ │ │
+│ └───────────────────────────────────────────────────────┘ │
+│ │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## Supported Platforms
+
+### Android
+- **AOSP** - Pure Android
+- **Samsung One UI** - Galaxy devices
+- **Xiaomi MIUI** - Mi, Redmi, Poco
+- **OPPO ColorOS** - OPPO, OnePlus, Realme
+- **Vivo Funtouch/OriginOS**
+- **Google Pixel**
+
+### HarmonyOS
+- **Huawei** - P series, Mate series, Nova
+- **Honor** - Magic series, X series
+
+## Installation Levels
+
+| Level | Requirements | What It Does |
+|-------|-------------|--------------|
+| **Level 1** | ADB only | Removes bloatware, installs BotOS as app |
+| **Level 2** | Root + Magisk | GB boot animation, BotOS as system app |
+| **Level 3** | Unlocked bootloader | Full Android replacement with BotOS |
+
+## Quick Installation
+
+### Level 1: Debloat + App (No Root)
+
+```bash
+# Clone botos repository
+git clone https://github.com/GeneralBots/botos.git
+cd botos/rom
+
+# Connect device via USB (enable USB debugging first)
+./install.sh
+```
+
+The interactive installer will:
+1. Detect your device and manufacturer
+2. Remove bloatware automatically
+3. Install BotOS APK
+4. Optionally set as default launcher
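+
+Before running the installer, confirm ADB can see the device (USB debugging must be enabled under Developer Options):
+
+```bash
+adb devices -l
+# The device should be listed as "device", not "unauthorized" or "offline"
+```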
+
+### Level 2: Magisk Module (Root Required)
+
+```bash
+# Generate Magisk module
+cd botos/rom/scripts
+./build-magisk-module.sh
+
+# Copy to device
+adb push botos-magisk-v1.0.zip /sdcard/
+
+# Install via Magisk app
+# Magisk → Modules → + → Select ZIP → Reboot
+```
+
+This adds:
+- Custom boot animation
+- BotOS as system app (privileged permissions)
+- Debloat via overlay
+
+### Level 3: GSI (Full Replacement)
+
+For advanced users with unlocked bootloader. See `botos/rom/gsi/README.md`.
+
+## Bloatware Removed
+
+### Samsung One UI
+- Bixby, Samsung Pay, Samsung Pass
+- Duplicate apps (Email, Calendar, Browser)
+- AR Zone, Game Launcher
+- Samsung Free, Samsung Global Goals
+
+### Huawei EMUI/HarmonyOS
+- AppGallery, HiCloud, HiCar
+- Huawei Browser, Music, Video
+- Petal Maps, Petal Search
+- AI Life, HiSuite
+
+### Honor MagicOS
+- Honor Store, MagicRing
+- Honor Browser, Music
+
+### Xiaomi MIUI
+- MSA (analytics), Mi Apps
+- GetApps, Mi Cloud
+- Mi Browser, Mi Music
+
+### Universal (All Devices)
+- Pre-installed Facebook, Instagram
+- Pre-installed Netflix, Spotify
+- Games like Candy Crush
+- Carrier bloatware
+
+## Building from Source
+
+### Prerequisites
+
+```bash
+# Install Rust and Android targets
+rustup target add aarch64-linux-android armv7-linux-androideabi
+
+# Set up Android SDK/NDK
+export ANDROID_HOME=$HOME/Android/Sdk
+export NDK_HOME=$ANDROID_HOME/ndk/25.2.9519653
+
+# Install Tauri CLI
+cargo install tauri-cli
+
+# For icons/boot animation
+sudo apt install librsvg2-bin imagemagick
+```
+
+### Build APK
+
+```bash
+cd botos
+
+# Generate icons from SVG
+./scripts/generate-icons.sh
+
+# Initialize Android project
+cargo tauri android init
+
+# Build release APK
+cargo tauri android build --release
+```
+
+Output: `gen/android/app/build/outputs/apk/release/app-release.apk`
+
+### Development Mode
+
+```bash
+# Connect device and run
+cargo tauri android dev
+
+# Watch logs
+adb logcat -s BotOS:*
+```
+
+## Configuration
+
+### AndroidManifest.xml
+
+BotOS is configured as a launcher:
+
+```xml
+<!-- Standard launcher declaration: HOME + DEFAULT let BotOS be
+     chosen as the device's default launcher -->
+<intent-filter>
+    <action android:name="android.intent.action.MAIN" />
+    <category android:name="android.intent.category.HOME" />
+    <category android:name="android.intent.category.DEFAULT" />
+</intent-filter>
+```
+
+### Permissions
+
+Default capabilities in `capabilities/default.json`:
+- Internet access
+- Camera (for QR codes, photos)
+- Location (GPS)
+- Storage (files)
+- Notifications
+
+### Connecting to Server
+
+Edit the embedded URL in `tauri.conf.json`:
+
+```json
+{
+ "build": {
+ "frontendDist": "../botui/ui/suite"
+ }
+}
+```
+
+Or configure botserver URL at runtime:
+```javascript
+window.BOTSERVER_URL = "https://your-server.com";
+```
+
+## Boot Animation
+
+Create custom boot animation with GB branding:
+
+```bash
+# Generate animation
+cd botos/scripts
+./create-bootanimation.sh
+
+# Install (requires root)
+adb root
+adb remount
+adb push bootanimation.zip /system/media/
+adb reboot
+```
+
+## Project Structure
+
+```
+botos/
+├── Cargo.toml # Rust/Tauri dependencies
+├── tauri.conf.json # Tauri config → botui/ui/suite
+├── build.rs # Build script
+├── src/lib.rs # Android entry point
+│
+├── icons/
+│ ├── gb-bot.svg # Source icon
+│ ├── icon.png # Main icon (512x512)
+│ └── */ic_launcher.png # Icons by density
+│
+├── scripts/
+│ ├── generate-icons.sh # Generate PNGs from SVG
+│ └── create-bootanimation.sh
+│
+├── capabilities/
+│ └── default.json # Tauri permissions
+│
+├── gen/android/ # Generated Android project
+│ └── app/src/main/
+│ ├── AndroidManifest.xml
+│ └── res/values/themes.xml
+│
+└── rom/ # Installation tools
+ ├── install.sh # Interactive installer
+ ├── scripts/
+ │ ├── debloat.sh # Remove bloatware
+ │ └── build-magisk-module.sh
+ └── gsi/
+ └── README.md # GSI instructions
+```
+
+## Offline Mode
+
+BotOS can work offline with local LLM:
+
+1. Install botserver on the device (see [Local LLM](./local-llm.md))
+2. Configure to use localhost:
+ ```javascript
+ window.BOTSERVER_URL = "http://127.0.0.1:8088";
+ ```
+3. Run llama.cpp with a small model (TinyLlama on 4GB+ devices)
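+
+A quick on-device check that both services respond without any network, using the endpoints configured above:
+
+```bash
+curl http://127.0.0.1:8088/health   # botserver
+curl http://127.0.0.1:8080/health   # llama.cpp
+```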
+
+## Use Cases
+
+### Dedicated Kiosk
+- Retail product information
+- Hotel check-in
+- Restaurant ordering
+- Museum guides
+
+### Enterprise Device
+- Field service assistant
+- Warehouse scanner with AI
+- Delivery driver companion
+- Healthcare bedside terminal
+
+### Consumer Device
+- Elder-friendly phone
+- Child-safe device
+- Single-purpose assistant
+- Smart home controller
+
+## Troubleshooting
+
+### App Won't Install
+```bash
+# Enable installation from unknown sources
+# Settings → Security → Unknown Sources
+
+# Or use ADB
+adb install -r botos.apk
+```
+
+### Debloat Not Working
+```bash
+# Some packages require root
+# Use Level 2 (Magisk) for complete removal
+
+# Check which packages failed
+adb shell pm list packages | grep <package-name>
+```
+
+### Boot Loop After GSI
+```bash
+# Boot into recovery
+# Wipe data/factory reset
+# Reflash stock ROM
+```
+
+### WebView Crashes
+```bash
+# Re-enable Android System WebView if debloat disabled it
+adb shell pm enable com.google.android.webview
+```
diff --git a/src/13-devices/quick-start.md b/src/13-devices/quick-start.md
new file mode 100644
index 00000000..b7a15fb2
--- /dev/null
+++ b/src/13-devices/quick-start.md
@@ -0,0 +1,209 @@
+# Quick Start - Deploy in 5 Minutes
+
+Get General Bots running on your embedded device with local AI in just a few commands.
+
+## Prerequisites
+
+- An SBC (Raspberry Pi, Orange Pi, etc.) with Armbian/Raspbian
+- SSH access to the device
+- Internet connection (for initial setup only)
+
+## One-Line Deploy
+
+From your development machine:
+
+```bash
+# Clone and run the deployment script
+git clone https://github.com/GeneralBots/botserver.git
+cd botserver
+
+# Deploy to Orange Pi (replace with your device IP)
+./scripts/deploy-embedded.sh orangepi@192.168.1.100 --with-ui --with-llama
+```
+
+That's it! After ~10-15 minutes:
+- BotServer runs on port 8088
+- llama.cpp runs on port 8080 with TinyLlama
+- Embedded UI available at `http://your-device:8088/embedded/`
+
+## Step-by-Step Guide
+
+### Step 1: Prepare Your Device
+
+Flash your SBC with a compatible OS:
+
+**Raspberry Pi:**
+```bash
+# Download Raspberry Pi Imager
+# Select: Raspberry Pi OS Lite (64-bit)
+# Enable SSH in settings
+```
+
+**Orange Pi:**
+```bash
+# Download Armbian from armbian.com
+# Flash with balenaEtcher
+```
+
+### Step 2: First Boot Configuration
+
+```bash
+# SSH into your device
+ssh pi@raspberrypi.local # or orangepi@orangepi.local
+
+# Update system
+sudo apt update && sudo apt upgrade -y
+
+# Set timezone
+sudo timedatectl set-timezone America/Sao_Paulo
+
+# Enable I2C/SPI if using GPIO displays
+sudo raspi-config # or armbian-config
+```
+
+### Step 3: Run Deployment Script
+
+From your development PC:
+
+```bash
+# Basic deployment (botserver only)
+./scripts/deploy-embedded.sh pi@raspberrypi.local
+
+# With embedded UI
+./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui
+
+# With local LLM (requires 4GB+ RAM)
+./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui --with-llama
+
+# Specify a different model
+./scripts/deploy-embedded.sh pi@raspberrypi.local --with-llama --model phi-2-Q4_K_M.gguf
+```
+
+### Step 4: Verify Installation
+
+```bash
+# Check services
+ssh pi@raspberrypi.local 'sudo systemctl status botserver'
+ssh pi@raspberrypi.local 'sudo systemctl status llama-server'
+
+# Test botserver
+curl http://raspberrypi.local:8088/health
+
+# Test llama.cpp
+curl http://raspberrypi.local:8080/v1/models
+```
+
+### Step 5: Access the Interface
+
+Open in your browser:
+```
+http://raspberrypi.local:8088/embedded/
+```
+
+Or set up kiosk mode (auto-starts on boot):
+```bash
+# Already configured if you used --with-ui
+# Just reboot:
+ssh pi@raspberrypi.local 'sudo reboot'
+```
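+
+If you need to wire kiosk mode manually, the usual pattern is a browser in kiosk mode pointed at the embedded UI. A minimal sketch assuming Chromium and an X session (the `--with-ui` deploy flag sets up an equivalent automatically):
+
+```bash
+# e.g. launched from ~/.config/autostart or an xinitrc
+chromium-browser --kiosk --noerrdialogs --disable-infobars \
+  http://localhost:8088/embedded/
+```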
+
+## Local Installation (On the Device)
+
+If you prefer to install directly on the device:
+
+```bash
+# SSH into the device
+ssh pi@raspberrypi.local
+
+# Install Rust
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+source ~/.cargo/env
+
+# Clone and build
+git clone https://github.com/GeneralBots/botserver.git
+cd botserver
+
+# Run local deployment
+./scripts/deploy-embedded.sh --local --with-ui --with-llama
+```
+
+⚠️ **Note:** Building on ARM devices is slow (1-2 hours). Cross-compilation is faster.
+
+## Configuration
+
+After deployment, edit the config file:
+
+```bash
+ssh pi@raspberrypi.local
+sudo nano /opt/botserver/.env
+```
+
+Key settings:
+```env
+# Server
+HOST=0.0.0.0
+PORT=8088
+
+# Local LLM
+LLM_PROVIDER=llamacpp
+LLM_API_URL=http://127.0.0.1:8080
+LLM_MODEL=tinyllama
+
+# Memory limits for small devices
+MAX_CONTEXT_TOKENS=2048
+MAX_RESPONSE_TOKENS=512
+```
+
+Restart after changes:
+```bash
+sudo systemctl restart botserver
+```
+
+## Troubleshooting
+
+### Out of Memory
+
+```bash
+# Check memory usage
+free -h
+
+# Reduce llama.cpp context
+sudo nano /etc/systemd/system/llama-server.service
+# Change -c 2048 to -c 1024
+
+# Or use a smaller model
+# TinyLlama uses ~700MB, Phi-2 uses ~1.6GB
+```
+
+### Service Won't Start
+
+```bash
+# Check logs
+sudo journalctl -u botserver -f
+sudo journalctl -u llama-server -f
+
+# Common issues:
+# - Port already in use
+# - Missing model file
+# - Database permissions
+```
+
+### Display Not Working
+
+```bash
+# Check if display is detected
+ls /dev/fb* # HDMI/DSI
+ls /dev/i2c* # I2C displays
+ls /dev/spidev* # SPI displays
+
+# For HDMI, check config
+sudo nano /boot/config.txt # Raspberry Pi
+sudo nano /boot/armbianEnv.txt # Orange Pi
+```
+
+## Next Steps
+
+- [Local LLM Configuration](./local-llm.md) - Optimize AI performance
+- [Supported Hardware](./hardware.md) - Displays, input, and power
+- [Mobile Deployment](./mobile.md) - Android & HarmonyOS devices
diff --git a/src/20-embedding/README.md b/src/20-embedding/README.md
index 69fce7df..21ffe7e0 100644
--- a/src/20-embedding/README.md
+++ b/src/20-embedding/README.md
@@ -1,47 +1 @@
-# Chapter 20: Embedded & Offline Deployment
-
-Deploy General Bots to any device - from Raspberry Pi to industrial kiosks - with local LLM inference for fully offline AI capabilities.
-
-## Overview
-
-General Bots can run on minimal hardware with displays as small as 16x2 character LCDs, enabling AI-powered interactions anywhere:
-
-- **Kiosks** - Self-service terminals in stores, airports, hospitals
-- **Industrial IoT** - Factory floor assistants, machine interfaces
-- **Smart Home** - Wall panels, kitchen displays, door intercoms
-- **Retail** - Point-of-sale systems, product information terminals
-- **Education** - Classroom assistants, lab equipment interfaces
-- **Healthcare** - Patient check-in, medication reminders
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ Embedded GB Architecture │
-├─────────────────────────────────────────────────────────────────────────────┤
-│ │
-│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
-│ │ Display │ │ botserver │ │ llama.cpp │ │
-│ │ LCD/OLED │────▶│ (Rust) │────▶│ (Local) │ │
-│ │ TFT/HDMI │ │ Port 8088 │ │ Port 8080 │ │
-│ └──────────────┘ └──────────────┘ └──────────────┘ │
-│ │ │ │ │
-│ │ │ │ │
-│ ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐ │
-│ │ Keyboard │ │ SQLite │ │ TinyLlama │ │
-│ │ Buttons │ │ (Data) │ │ GGUF │ │
-│ │ Touch │ │ │ │ (~700MB) │ │
-│ └─────────────┘ └─────────────┘ └─────────────┘ │
-│ │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
-
-## What's in This Chapter
-
-- [Supported Hardware](./hardware.md) - Boards, displays, and peripherals
-- [Quick Start](./quick-start.md) - Deploy in 5 minutes
-- [Embedded UI](./embedded-ui.md) - Interface for small displays
-- [Local LLM](./local-llm.md) - Offline AI with llama.cpp
-- [Display Modes](./display-modes.md) - LCD, OLED, TFT, E-ink configurations
-- [Kiosk Mode](./kiosk-mode.md) - Locked-down production deployments
-- [Performance Tuning](./performance.md) - Optimize for limited resources
-- [Offline Operation](./offline.md) - No internet required
-- [Use Cases](./use-cases.md) - Real-world deployment examples
+# Chapter 20: Embedded Deployment
diff --git a/src/20-embedding/hardware.md b/src/20-embedding/hardware.md
index f8d6d416..1bbe111c 100644
--- a/src/20-embedding/hardware.md
+++ b/src/20-embedding/hardware.md
@@ -1,190 +1 @@
# Supported Hardware
-
-## Single Board Computers (SBCs)
-
-### Recommended Boards
-
-| Board | CPU | RAM | Best For | Price |
-|-------|-----|-----|----------|-------|
-| **Orange Pi 5** | RK3588S | 4-16GB | Full LLM, NPU accel | $89-149 |
-| **Raspberry Pi 5** | BCM2712 | 4-8GB | General purpose | $60-80 |
-| **Orange Pi Zero 3** | H618 | 1-4GB | Minimal deployments | $20-35 |
-| **Raspberry Pi 4** | BCM2711 | 2-8GB | Established ecosystem | $45-75 |
-| **Raspberry Pi Zero 2W** | RP3A0 | 512MB | Ultra-compact | $15 |
-| **Rock Pi 4** | RK3399 | 4GB | NPU available | $75 |
-| **NVIDIA Jetson Nano** | Tegra X1 | 4GB | GPU inference | $149 |
-| **BeagleBone Black** | AM3358 | 512MB | Industrial | $55 |
-| **LattePanda 3 Delta** | N100 | 8GB | x86 compatibility | $269 |
-| **ODROID-N2+** | S922X | 4GB | High performance | $79 |
-
-### Minimum Requirements
-
-**For UI only (connect to remote botserver):**
-- Any ARM/x86 Linux board
-- 256MB RAM
-- Network connection
-- Display output
-
-**For local botserver:**
-- ARM64 or x86_64
-- 1GB RAM minimum
-- 4GB storage
-
-**For local LLM (llama.cpp):**
-- ARM64 or x86_64
-- 2GB+ RAM (4GB recommended)
-- 2GB+ storage for model
-
-### Orange Pi 5 (Recommended for LLM)
-
-The Orange Pi 5 with RK3588S is ideal for embedded LLM:
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Orange Pi 5 - Best for Offline AI │
-├─────────────────────────────────────────────────────────────┤
-│ CPU: Rockchip RK3588S (4x A76 + 4x A55) │
-│ NPU: 6 TOPS (Neural Processing Unit) │
-│ GPU: Mali-G610 MP4 │
-│ RAM: 4GB / 8GB / 16GB LPDDR4X │
-│ Storage: M.2 NVMe + eMMC + microSD │
-│ │
-│ LLM Performance: │
-│ ├─ TinyLlama 1.1B Q4: ~8-12 tokens/sec │
-│ ├─ Phi-2 2.7B Q4: ~4-6 tokens/sec │
-│ └─ With NPU (rkllm): ~20-30 tokens/sec │
-└─────────────────────────────────────────────────────────────┘
-```
-
-## Displays
-
-### Character LCDs (Minimal)
-
-For text-only interfaces:
-
-| Display | Resolution | Interface | Use Case |
-|---------|------------|-----------|----------|
-| HD44780 16x2 | 16 chars × 2 lines | I2C/GPIO | Status, simple Q&A |
-| HD44780 20x4 | 20 chars × 4 lines | I2C/GPIO | More context |
-| LCD2004 | 20 chars × 4 lines | I2C | Industrial |
-
-**Example output on 16x2:**
-```
-┌────────────────┐
-│> How can I help│
-│< Processing... │
-└────────────────┘
-```
-
-### OLED Displays
-
-For graphical monochrome interfaces:
-
-| Display | Resolution | Interface | Size |
-|---------|------------|-----------|------|
-| SSD1306 | 128×64 | I2C/SPI | 0.96" |
-| SSD1309 | 128×64 | I2C/SPI | 2.42" |
-| SH1106 | 128×64 | I2C/SPI | 1.3" |
-| SSD1322 | 256×64 | SPI | 3.12" |
-
-### TFT/IPS Color Displays
-
-For full graphical interface:
-
-| Display | Resolution | Interface | Notes |
-|---------|------------|-----------|-------|
-| ILI9341 | 320×240 | SPI | Common, cheap |
-| ST7789 | 240×320 | SPI | Fast refresh |
-| ILI9488 | 480×320 | SPI | Larger |
-| Waveshare 5" | 800×480 | HDMI | Touch optional |
-| Waveshare 7" | 1024×600 | HDMI | Touch, IPS |
-| Official Pi 7" | 800×480 | DSI | Best for Pi |
-
-### E-Ink/E-Paper
-
-For low-power, readable in sunlight:
-
-| Display | Resolution | Colors | Refresh |
-|---------|------------|--------|---------|
-| Waveshare 2.13" | 250×122 | B/W | 2s |
-| Waveshare 4.2" | 400×300 | B/W | 4s |
-| Waveshare 7.5" | 800×480 | B/W | 5s |
-| Good Display 9.7" | 1200×825 | B/W | 6s |
-
-**Best for:** Menu displays, signs, low-update applications
-
-### Industrial Displays
-
-| Display | Resolution | Features |
-|---------|------------|----------|
-| Advantech | Various | Wide temp, sunlight |
-| Winstar | Various | Industrial grade |
-| Newhaven | Various | Long availability |
-
-## Input Devices
-
-### Keyboards
-
-- **USB Keyboard** - Standard, any USB keyboard works
-- **PS/2 Keyboard** - Via adapter, lower latency
-- **Matrix Keypad** - 4x4 or 3x4, GPIO connected
-- **I2C Keypad** - Fewer GPIO pins needed
-
-### Touch Input
-
-- **Capacitive Touch** - Better response, needs driver
-- **Resistive Touch** - Works with gloves, pressure-based
-- **IR Touch Frame** - Large displays, vandal-resistant
-
-### Buttons & GPIO
-
-```
-┌─────────────────────────────────────────────┐
-│ Simple 4-Button Interface │
-├─────────────────────────────────────────────┤
-│ │
-│ [◄ PREV] [▲ UP] [▼ DOWN] [► SELECT] │
-│ │
-│ GPIO 17 GPIO 27 GPIO 22 GPIO 23 │
-│ │
-└─────────────────────────────────────────────┘
-```
-
-## Enclosures
-
-### Commercial Options
-
-- **Hammond Manufacturing** - Industrial metal enclosures
-- **Polycase** - Plastic, IP65 rated
-- **Bud Industries** - Various sizes
-- **Pi-specific cases** - Argon, Flirc, etc.
-
-### DIY Options
-
-- **3D Printed** - Custom fit, PLA/PETG
-- **Laser Cut** - Acrylic, wood
-- **Metal Fabrication** - Professional look
-
-## Power
-
-### Power Requirements
-
-| Configuration | Power | Recommended PSU |
-|---------------|-------|-----------------|
-| Pi Zero + LCD | 1-2W | 5V 1A |
-| Pi 4 + Display | 5-10W | 5V 3A |
-| Orange Pi 5 | 8-15W | 5V 4A or 12V 2A |
-| With NVMe SSD | +2-3W | Add 1A headroom |
-
-### Power Options
-
-- **USB-C PD** - Modern, efficient
-- **PoE HAT** - Power over Ethernet
-- **12V Barrel** - Industrial standard
-- **Battery** - UPS, solar applications
-
-### UPS Solutions
-
-- **PiJuice** - Pi-specific UPS HAT
-- **UPS PIco** - Small form factor
-- **Powerboost** - Adafruit, lithium battery
diff --git a/src/20-embedding/local-llm.md b/src/20-embedding/local-llm.md
index 3feb4199..86b1bc90 100644
--- a/src/20-embedding/local-llm.md
+++ b/src/20-embedding/local-llm.md
@@ -1,382 +1 @@
-# Local LLM - Offline AI with llama.cpp
-
-Run AI inference completely offline on embedded devices. No internet, no API costs, full privacy.
-
-## Overview
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ Local LLM Architecture │
-├─────────────────────────────────────────────────────────────────────────────┤
-│ │
-│ User Input ──▶ botserver ──▶ llama.cpp ──▶ Response │
-│ │ │ │
-│ │ ┌────┴────┐ │
-│ │ │ Model │ │
-│ │ │ GGUF │ │
-│ │ │ (Q4_K) │ │
-│ │ └─────────┘ │
-│ │ │
-│ SQLite DB │
-│ (sessions) │
-│ │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
-
-## Recommended Models
-
-### By Device RAM
-
-| RAM | Model | Size | Speed | Quality |
-|-----|-------|------|-------|---------|
-| **2GB** | TinyLlama 1.1B Q4_K_M | 670MB | ~5 tok/s | Basic |
-| **4GB** | Phi-2 2.7B Q4_K_M | 1.6GB | ~3-4 tok/s | Good |
-| **4GB** | Gemma 2B Q4_K_M | 1.4GB | ~4 tok/s | Good |
-| **8GB** | Llama 3.2 3B Q4_K_M | 2GB | ~3 tok/s | Better |
-| **8GB** | Mistral 7B Q4_K_M | 4.1GB | ~2 tok/s | Great |
-| **16GB** | Llama 3.1 8B Q4_K_M | 4.7GB | ~2 tok/s | Excellent |
-
-### By Use Case
-
-**Simple Q&A, Commands:**
-```
-TinyLlama 1.1B - Fast, basic understanding
-```
-
-**Customer Service, FAQ:**
-```
-Phi-2 or Gemma 2B - Good comprehension, reasonable speed
-```
-
-**Complex Reasoning:**
-```
-Llama 3.2 3B or Mistral 7B - Better accuracy, slower
-```
-
-## Installation
-
-### Automatic (via deploy script)
-
-```bash
-./scripts/deploy-embedded.sh pi@device --with-llama
-```
-
-### Manual Installation
-
-```bash
-# SSH to device
-ssh pi@raspberrypi.local
-
-# Install dependencies
-sudo apt update
-sudo apt install -y build-essential cmake git wget
-
-# Clone llama.cpp
-cd /opt
-sudo git clone https://github.com/ggerganov/llama.cpp
-sudo chown -R $(whoami):$(whoami) llama.cpp
-cd llama.cpp
-
-# Build for ARM (auto-optimizes)
-mkdir build && cd build
-cmake .. -DLLAMA_NATIVE=ON -DCMAKE_BUILD_TYPE=Release
-make -j$(nproc)
-
-# Download model
-mkdir -p /opt/llama.cpp/models
-cd /opt/llama.cpp/models
-wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
-```
-
-### Start Server
-
-```bash
-# Test run
-/opt/llama.cpp/build/bin/llama-server \
- -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
- --host 0.0.0.0 \
- --port 8080 \
- -c 2048 \
- --threads 4
-
-# Verify
-curl http://localhost:8080/v1/models
-```
-
-### Systemd Service
-
-Create `/etc/systemd/system/llama-server.service`:
-
-```ini
-[Unit]
-Description=llama.cpp Server - Local LLM
-After=network.target
-
-[Service]
-Type=simple
-User=root
-WorkingDirectory=/opt/llama.cpp
-ExecStart=/opt/llama.cpp/build/bin/llama-server \
- -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
- --host 0.0.0.0 \
- --port 8080 \
- -c 2048 \
- -ngl 0 \
- --threads 4
-Restart=always
-RestartSec=5
-
-[Install]
-WantedBy=multi-user.target
-```
-
-Enable and start:
-```bash
-sudo systemctl daemon-reload
-sudo systemctl enable llama-server
-sudo systemctl start llama-server
-```
-
-## Configuration
-
-### botserver .env
-
-```env
-# Use local llama.cpp
-LLM_PROVIDER=llamacpp
-LLM_API_URL=http://127.0.0.1:8080
-LLM_MODEL=tinyllama
-
-# Memory limits
-MAX_CONTEXT_TOKENS=2048
-MAX_RESPONSE_TOKENS=512
-STREAMING_ENABLED=true
-```
-
-### llama.cpp Parameters
-
-| Parameter | Default | Description |
-|-----------|---------|-------------|
-| `-c` | 2048 | Context size (tokens) |
-| `--threads` | 4 | CPU threads |
-| `-ngl` | 0 | GPU layers (0 for CPU only) |
-| `--host` | 127.0.0.1 | Bind address |
-| `--port` | 8080 | Server port |
-| `-b` | 512 | Batch size |
-| `--mlock` | off | Lock model in RAM |
-
-### Memory vs Context Size
-
-```
-Context 512: ~400MB RAM, fast, limited conversation
-Context 1024: ~600MB RAM, moderate
-Context 2048: ~900MB RAM, good for most uses
-Context 4096: ~1.5GB RAM, long conversations
-```
-
-## Performance Optimization
-
-### CPU Optimization
-
-```bash
-# Check CPU features
-cat /proc/cpuinfo | grep -E "(model name|Features)"
-
-# Build with specific optimizations
-cmake .. -DLLAMA_NATIVE=ON \
- -DCMAKE_BUILD_TYPE=Release \
- -DLLAMA_ARM_FMA=ON \
- -DLLAMA_ARM_DOTPROD=ON
-```
-
-### Memory Optimization
-
-```bash
-# For 2GB RAM devices
-# Use smaller context
--c 1024
-
-# Use memory mapping (slower but less RAM)
---mmap
-
-# Disable mlock (don't pin to RAM)
-# (default is disabled)
-```
-
-### Swap Configuration
-
-For devices with limited RAM:
-
-```bash
-# Create 2GB swap
-sudo fallocate -l 2G /swapfile
-sudo chmod 600 /swapfile
-sudo mkswap /swapfile
-sudo swapon /swapfile
-
-# Make permanent
-echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
-
-# Optimize swap usage
-echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
-```
-
-## NPU Acceleration (Orange Pi 5)
-
-Orange Pi 5 has a 6 TOPS NPU that can accelerate inference:
-
-### Using rkllm (Rockchip NPU)
-
-```bash
-# Install rkllm runtime
-git clone https://github.com/airockchip/rknn-llm
-cd rknn-llm
-./install.sh
-
-# Convert model to RKNN format
-python3 convert_model.py \
- --model tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
- --output tinyllama.rkllm
-
-# Run with NPU
-rkllm-server \
- --model tinyllama.rkllm \
- --port 8080
-```
-
-Expected speedup: **3-5x faster** than CPU only.
-
-## Model Download URLs
-
-### TinyLlama 1.1B (Recommended for 2GB)
-```bash
-wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
-```
-
-### Phi-2 2.7B (Recommended for 4GB)
-```bash
-wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf
-```
-
-### Gemma 2B
-```bash
-wget https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q4_K_M.gguf
-```
-
-### Llama 3.2 3B (Recommended for 8GB)
-```bash
-wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
-```
-
-### Mistral 7B
-```bash
-wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
-```
-
-## API Usage
-
-llama.cpp exposes an OpenAI-compatible API:
-
-### Chat Completion
-
-```bash
-curl http://localhost:8080/v1/chat/completions \
- -H "Content-Type: application/json" \
- -d '{
- "model": "tinyllama",
- "messages": [
- {"role": "user", "content": "What is 2+2?"}
- ],
- "max_tokens": 100
- }'
-```
-
-### Streaming
-
-```bash
-curl http://localhost:8080/v1/chat/completions \
- -H "Content-Type: application/json" \
- -d '{
- "model": "tinyllama",
- "messages": [{"role": "user", "content": "Tell me a story"}],
- "stream": true
- }'
-```
-
-### Health Check
-
-```bash
-curl http://localhost:8080/health
-curl http://localhost:8080/v1/models
-```
-
-## Monitoring
-
-### Check Performance
-
-```bash
-# Watch resource usage
-htop
-
-# Check inference speed in logs
-sudo journalctl -u llama-server -f | grep "tokens/s"
-
-# Memory usage
-free -h
-```
-
-### Benchmarking
-
-```bash
-# Run llama.cpp benchmark
-/opt/llama.cpp/build/bin/llama-bench \
- -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
- -p 512 -n 128 -t 4
-```
-
-## Troubleshooting
-
-### Model Loading Fails
-
-```bash
-# Check available RAM
-free -h
-
-# Try smaller context
--c 512
-
-# Use memory mapping
---mmap
-```
-
-### Slow Inference
-
-```bash
-# Increase threads (up to CPU cores)
---threads $(nproc)
-
-# Use optimized build
-cmake .. -DLLAMA_NATIVE=ON
-
-# Consider smaller model
-```
-
-### Out of Memory Killer
-
-```bash
-# Check if OOM killed the process
-dmesg | grep -i "killed process"
-
-# Increase swap
-# Use smaller model
-# Reduce context size
-```
-
-## Best Practices
-
-1. **Start small** - Begin with TinyLlama, upgrade if needed
-2. **Monitor memory** - Use `htop` during initial tests
-3. **Set appropriate context** - 1024-2048 for most embedded use
-4. **Use quantized models** - Q4_K_M is a good balance
-5. **Enable streaming** - Better UX on slow inference
-6. **Test offline** - Verify it works without internet before deployment
+# Local LLM with llama.cpp
diff --git a/src/20-embedding/quick-start.md b/src/20-embedding/quick-start.md
index b7a15fb2..05cf8c1f 100644
--- a/src/20-embedding/quick-start.md
+++ b/src/20-embedding/quick-start.md
@@ -1,209 +1 @@
-# Quick Start - Deploy in 5 Minutes
-
-Get General Bots running on your embedded device with local AI in just a few commands.
-
-## Prerequisites
-
-- An SBC (Raspberry Pi, Orange Pi, etc.) with Armbian/Raspbian
-- SSH access to the device
-- Internet connection (for initial setup only)
-
-## One-Line Deploy
-
-From your development machine:
-
-```bash
-# Clone and run the deployment script
-git clone https://github.com/GeneralBots/botserver.git
-cd botserver
-
-# Deploy to Orange Pi (replace with your device IP)
-./scripts/deploy-embedded.sh orangepi@192.168.1.100 --with-ui --with-llama
-```
-
-That's it! After ~10-15 minutes:
-- BotServer runs on port 8088
-- llama.cpp runs on port 8080 with TinyLlama
-- Embedded UI available at `http://your-device:8088/embedded/`
-
-## Step-by-Step Guide
-
-### Step 1: Prepare Your Device
-
-Flash your SBC with a compatible OS:
-
-**Raspberry Pi:**
-```bash
-# Download Raspberry Pi Imager
-# Select: Raspberry Pi OS Lite (64-bit)
-# Enable SSH in settings
-```
-
-**Orange Pi:**
-```bash
-# Download Armbian from armbian.com
-# Flash with balenaEtcher
-```
-
-### Step 2: First Boot Configuration
-
-```bash
-# SSH into your device
-ssh pi@raspberrypi.local # or orangepi@orangepi.local
-
-# Update system
-sudo apt update && sudo apt upgrade -y
-
-# Set timezone
-sudo timedatectl set-timezone America/Sao_Paulo
-
-# Enable I2C/SPI if using GPIO displays
-sudo raspi-config # or armbian-config
-```
-
-### Step 3: Run Deployment Script
-
-From your development PC:
-
-```bash
-# Basic deployment (botserver only)
-./scripts/deploy-embedded.sh pi@raspberrypi.local
-
-# With embedded UI
-./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui
-
-# With local LLM (requires 4GB+ RAM)
-./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui --with-llama
-
-# Specify a different model
-./scripts/deploy-embedded.sh pi@raspberrypi.local --with-llama --model phi-2-Q4_K_M.gguf
-```
-
-### Step 4: Verify Installation
-
-```bash
-# Check services
-ssh pi@raspberrypi.local 'sudo systemctl status botserver'
-ssh pi@raspberrypi.local 'sudo systemctl status llama-server'
-
-# Test botserver
-curl http://raspberrypi.local:8088/health
-
-# Test llama.cpp
-curl http://raspberrypi.local:8080/v1/models
-```
-
-### Step 5: Access the Interface
-
-Open in your browser:
-```
-http://raspberrypi.local:8088/embedded/
-```
-
-Or set up kiosk mode (auto-starts on boot):
-```bash
-# Already configured if you used --with-ui
-# Just reboot:
-ssh pi@raspberrypi.local 'sudo reboot'
-```
-
-## Local Installation (On the Device)
-
-If you prefer to install directly on the device:
-
-```bash
-# SSH into the device
-ssh pi@raspberrypi.local
-
-# Install Rust
-curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
-source ~/.cargo/env
-
-# Clone and build
-git clone https://github.com/GeneralBots/botserver.git
-cd botserver
-
-# Run local deployment
-./scripts/deploy-embedded.sh --local --with-ui --with-llama
-```
-
-⚠️ **Note:** Building on ARM devices is slow (1-2 hours). Cross-compilation is faster.
-
-## Configuration
-
-After deployment, edit the config file:
-
-```bash
-ssh pi@raspberrypi.local
-sudo nano /opt/botserver/.env
-```
-
-Key settings:
-```env
-# Server
-HOST=0.0.0.0
-PORT=8088
-
-# Local LLM
-LLM_PROVIDER=llamacpp
-LLM_API_URL=http://127.0.0.1:8080
-LLM_MODEL=tinyllama
-
-# Memory limits for small devices
-MAX_CONTEXT_TOKENS=2048
-MAX_RESPONSE_TOKENS=512
-```
-
-Restart after changes:
-```bash
-sudo systemctl restart botserver
-```
-
-## Troubleshooting
-
-### Out of Memory
-
-```bash
-# Check memory usage
-free -h
-
-# Reduce llama.cpp context
-sudo nano /etc/systemd/system/llama-server.service
-# Change -c 2048 to -c 1024
-
-# Or use a smaller model
-# TinyLlama uses ~700MB, Phi-2 uses ~1.6GB
-```
-
-### Service Won't Start
-
-```bash
-# Check logs
-sudo journalctl -u botserver -f
-sudo journalctl -u llama-server -f
-
-# Common issues:
-# - Port already in use
-# - Missing model file
-# - Database permissions
-```
-
-### Display Not Working
-
-```bash
-# Check if display is detected
-ls /dev/fb* # HDMI/DSI
-ls /dev/i2c* # I2C displays
-ls /dev/spidev* # SPI displays
-
-# For HDMI, check config
-sudo nano /boot/config.txt # Raspberry Pi
-sudo nano /boot/armbianEnv.txt # Orange Pi
-```
-
-## Next Steps
-
-- [Embedded UI Guide](./embedded-ui.md) - Customize the interface
-- [Local LLM Configuration](./local-llm.md) - Optimize AI performance
-- [Kiosk Mode](./kiosk-mode.md) - Production deployment
-- [Offline Operation](./offline.md) - Disconnected environments
+# Quick Start
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 5c52ab4e..02663eba 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -320,9 +320,17 @@
- [Permissions Matrix](./12-auth/permissions-matrix.md)
- [User Context vs System Context](./12-auth/user-system-context.md)
-# Part XII - Community
+# Part XII - Device & Offline Deployment
-- [Chapter 13: Contributing](./13-community/README.md)
+- [Chapter 13: Device Deployment](./13-devices/README.md)
+ - [Mobile (Android & HarmonyOS)](./13-devices/mobile.md)
+ - [Supported Hardware (SBCs)](./13-devices/hardware.md)
+ - [Quick Start](./13-devices/quick-start.md)
+ - [Local LLM with llama.cpp](./13-devices/local-llm.md)
+
+# Part XIII - Community
+
+- [Chapter 14: Contributing](./13-community/README.md)
- [Development Setup](./13-community/setup.md)
- [Testing Guide](./13-community/testing.md)
- [Documentation](./13-community/documentation.md)
@@ -330,9 +338,9 @@
- [Community Guidelines](./13-community/community.md)
- [IDEs](./13-community/ide-extensions.md)
-# Part XIII - Migration
+# Part XIV - Migration
-- [Chapter 14: Migration Guide](./14-migration/README.md)
+- [Chapter 15: Migration Guide](./14-migration/README.md)
- [Migration Overview](./14-migration/overview.md)
- [Platform Comparison Matrix](./14-migration/comparison-matrix.md)
- [Migration Resources](./14-migration/resources.md)
@@ -350,9 +358,9 @@
- [Automation Migration](./14-migration/automation.md)
- [Validation and Testing](./14-migration/validation.md)
-# Part XIV - Testing
+# Part XV - Testing
-- [Chapter 17: Testing](./17-testing/README.md)
+- [Chapter 16: Testing](./17-testing/README.md)
- [End-to-End Testing](./17-testing/e2e-testing.md)
- [Testing Architecture](./17-testing/architecture.md)
- [Performance Testing](./17-testing/performance.md)
@@ -390,12 +398,5 @@
- [Appendix D: Documentation Style](./16-appendix-docs-style/conversation-examples.md)
- [SVG and Conversation Standards](./16-appendix-docs-style/svg.md)
-# Part XV - Embedded & Offline
-
-- [Chapter 20: Embedded Deployment](./20-embedding/README.md)
- - [Supported Hardware](./20-embedding/hardware.md)
- - [Quick Start](./20-embedding/quick-start.md)
- - [Local LLM with llama.cpp](./20-embedding/local-llm.md)
-
[Glossary](./glossary.md)
[Contact](./contact/README.md)