Add Chapter 20: Embedded & Offline Deployment - Complete guide for Raspberry Pi, Orange Pi with local LLM

Rodrigo Rodriguez (Pragmatismo) 2025-12-12 13:51:39 -03:00
parent 3fdeeedf73
commit ff5d2ac12c
5 changed files with 835 additions and 0 deletions


@@ -0,0 +1,47 @@
# Chapter 20: Embedded & Offline Deployment
Deploy General Bots to any device - from Raspberry Pi to industrial kiosks - with local LLM inference for fully offline AI capabilities.
## Overview
General Bots can run on minimal hardware with displays as small as 16x2 character LCDs, enabling AI-powered interactions anywhere:
- **Kiosks** - Self-service terminals in stores, airports, hospitals
- **Industrial IoT** - Factory floor assistants, machine interfaces
- **Smart Home** - Wall panels, kitchen displays, door intercoms
- **Retail** - Point-of-sale systems, product information terminals
- **Education** - Classroom assistants, lab equipment interfaces
- **Healthcare** - Patient check-in, medication reminders
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Embedded GB Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Display │ │ botserver │ │ llama.cpp │ │
│ │ LCD/OLED │────▶│ (Rust) │────▶│ (Local) │ │
│ │ TFT/HDMI │ │ Port 8088 │ │ Port 8080 │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐ │
│ │ Keyboard │ │ SQLite │ │ TinyLlama │ │
│ │ Buttons │ │ (Data) │ │ GGUF │ │
│ │ Touch │ │ │ │ (~700MB) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## What's in This Chapter
- [Supported Hardware](./hardware.md) - Boards, displays, and peripherals
- [Quick Start](./quick-start.md) - Deploy in 5 minutes
- [Embedded UI](./embedded-ui.md) - Interface for small displays
- [Local LLM](./local-llm.md) - Offline AI with llama.cpp
- [Display Modes](./display-modes.md) - LCD, OLED, TFT, E-ink configurations
- [Kiosk Mode](./kiosk-mode.md) - Locked-down production deployments
- [Performance Tuning](./performance.md) - Optimize for limited resources
- [Offline Operation](./offline.md) - No internet required
- [Use Cases](./use-cases.md) - Real-world deployment examples


@@ -0,0 +1,190 @@
# Supported Hardware
## Single Board Computers (SBCs)
### Recommended Boards
| Board | CPU | RAM | Best For | Price |
|-------|-----|-----|----------|-------|
| **Orange Pi 5** | RK3588S | 4-16GB | Full LLM, NPU accel | $89-149 |
| **Raspberry Pi 5** | BCM2712 | 4-8GB | General purpose | $60-80 |
| **Orange Pi Zero 3** | H618 | 1-4GB | Minimal deployments | $20-35 |
| **Raspberry Pi 4** | BCM2711 | 2-8GB | Established ecosystem | $45-75 |
| **Raspberry Pi Zero 2W** | RP3A0 | 512MB | Ultra-compact | $15 |
| **Rock Pi 4** | RK3399 | 4GB | NPU available | $75 |
| **NVIDIA Jetson Nano** | Tegra X1 | 4GB | GPU inference | $149 |
| **BeagleBone Black** | AM3358 | 512MB | Industrial | $55 |
| **LattePanda 3 Delta** | N100 | 8GB | x86 compatibility | $269 |
| **ODROID-N2+** | S922X | 4GB | High performance | $79 |
### Minimum Requirements
**For UI only (connect to remote botserver):**
- Any ARM/x86 Linux board
- 256MB RAM
- Network connection
- Display output
**For local botserver:**
- ARM64 or x86_64
- 1GB RAM minimum
- 4GB storage
**For local LLM (llama.cpp):**
- ARM64 or x86_64
- 2GB+ RAM (4GB recommended)
- 2GB+ storage for model
### Orange Pi 5 (Recommended for LLM)
The Orange Pi 5 with RK3588S is ideal for embedded LLM:
```
┌─────────────────────────────────────────────────────────────┐
│ Orange Pi 5 - Best for Offline AI │
├─────────────────────────────────────────────────────────────┤
│ CPU: Rockchip RK3588S (4x A76 + 4x A55) │
│ NPU: 6 TOPS (Neural Processing Unit) │
│ GPU: Mali-G610 MP4 │
│ RAM: 4GB / 8GB / 16GB LPDDR4X │
│ Storage: M.2 NVMe + eMMC + microSD │
│ │
│ LLM Performance: │
│ ├─ TinyLlama 1.1B Q4: ~8-12 tokens/sec │
│ ├─ Phi-2 2.7B Q4: ~4-6 tokens/sec │
│ └─ With NPU (rkllm): ~20-30 tokens/sec │
└─────────────────────────────────────────────────────────────┘
```
## Displays
### Character LCDs (Minimal)
For text-only interfaces:
| Display | Resolution | Interface | Use Case |
|---------|------------|-----------|----------|
| HD44780 16x2 | 16 chars × 2 lines | I2C/GPIO | Status, simple Q&A |
| HD44780 20x4 | 20 chars × 4 lines | I2C/GPIO | More context |
| LCD2004 | 20 chars × 4 lines | I2C | Industrial |
**Example output on 16x2:**
```
┌────────────────┐
│> How can I help│
│< Processing... │
└────────────────┘
```
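Before writing any application code for a character LCD, it helps to confirm the I2C backpack is visible on the bus. A minimal check with `i2c-tools` (this sketch assumes bus 1 and the common PCF8574 backpack; bus number and address vary by board and module):
```bash
# Install I2C utilities (Debian/Armbian/Raspberry Pi OS)
sudo apt install -y i2c-tools

# Scan bus 1; a PCF8574 LCD backpack usually appears at 0x27 or 0x3f
sudo i2cdetect -y 1
```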
### OLED Displays
For graphical monochrome interfaces:
| Display | Resolution | Interface | Size |
|---------|------------|-----------|------|
| SSD1306 | 128×64 | I2C/SPI | 0.96" |
| SSD1309 | 128×64 | I2C/SPI | 2.42" |
| SH1106 | 128×64 | I2C/SPI | 1.3" |
| SSD1322 | 256×64 | SPI | 3.12" |
### TFT/IPS Color Displays
For full graphical interface:
| Display | Resolution | Interface | Notes |
|---------|------------|-----------|-------|
| ILI9341 | 320×240 | SPI | Common, cheap |
| ST7789 | 240×320 | SPI | Fast refresh |
| ILI9488 | 480×320 | SPI | Larger |
| Waveshare 5" | 800×480 | HDMI | Touch optional |
| Waveshare 7" | 1024×600 | HDMI | Touch, IPS |
| Official Pi 7" | 800×480 | DSI | Best for Pi |
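For the HDMI panels above, a Raspberry Pi sometimes needs an explicit mode in `/boot/config.txt` when the panel does not report EDID correctly. A sketch for an 800×480 panel such as the Waveshare 5" (values reflect typical vendor instructions; confirm against your panel's documentation):
```bash
# Append a fixed 800x480 HDMI mode to the Pi boot config
# (values commonly documented for 800x480 panels; verify with your vendor)
sudo tee -a /boot/config.txt <<'EOF'
hdmi_group=2
hdmi_mode=87
hdmi_cvt=800 480 60 6 0 0 0
hdmi_drive=1
EOF
```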
### E-Ink/E-Paper
For low-power, readable in sunlight:
| Display | Resolution | Colors | Refresh |
|---------|------------|--------|---------|
| Waveshare 2.13" | 250×122 | B/W | 2s |
| Waveshare 4.2" | 400×300 | B/W | 4s |
| Waveshare 7.5" | 800×480 | B/W | 5s |
| Good Display 9.7" | 1200×825 | B/W | 6s |
**Best for:** Menu displays, signs, low-update applications
### Industrial Displays
| Display | Resolution | Features |
|---------|------------|----------|
| Advantech | Various | Wide temp, sunlight |
| Winstar | Various | Industrial grade |
| Newhaven | Various | Long availability |
## Input Devices
### Keyboards
- **USB Keyboard** - Standard, any USB keyboard works
- **PS/2 Keyboard** - Via adapter, lower latency
- **Matrix Keypad** - 4x4 or 3x4, GPIO connected
- **I2C Keypad** - Fewer GPIO pins needed
### Touch Input
- **Capacitive Touch** - Better response, needs driver
- **Resistive Touch** - Works with gloves, pressure-based
- **IR Touch Frame** - Large displays, vandal-resistant
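Whichever touch technology you choose, first confirm the kernel registers it as an input device. A quick check, assuming `libinput-tools` and `evtest` are available via apt (device names and event numbers vary):
```bash
sudo apt install -y libinput-tools evtest

# List input devices recognised by the kernel
sudo libinput list-devices

# Watch raw touch events (replace event0 with your device)
sudo evtest /dev/input/event0
```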
### Buttons & GPIO
```
┌─────────────────────────────────────────────┐
│ Simple 4-Button Interface │
├─────────────────────────────────────────────┤
│ │
│ [◄ PREV] [▲ UP] [▼ DOWN] [► SELECT] │
│ │
│ GPIO 17 GPIO 27 GPIO 22 GPIO 23 │
│ │
└─────────────────────────────────────────────┘
```
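You can sanity-check the button wiring above from the shell with the `libgpiod` tools before writing application code. A sketch assuming the buttons sit on `gpiochip0` at the offsets shown and libgpiod v1 syntax (chip names and offsets differ between boards):
```bash
# Install libgpiod command-line tools
sudo apt install -y gpiod

# Read the current level of the SELECT button (GPIO 23)
gpioget gpiochip0 23

# Block until one edge event is seen on the SELECT button
gpiomon --num-events=1 gpiochip0 23
```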
## Enclosures
### Commercial Options
- **Hammond Manufacturing** - Industrial metal enclosures
- **Polycase** - Plastic, IP65 rated
- **Bud Industries** - Various sizes
- **Pi-specific cases** - Argon, Flirc, etc.
### DIY Options
- **3D Printed** - Custom fit, PLA/PETG
- **Laser Cut** - Acrylic, wood
- **Metal Fabrication** - Professional look
## Power
### Power Requirements
| Configuration | Power | Recommended PSU |
|---------------|-------|-----------------|
| Pi Zero + LCD | 1-2W | 5V 1A |
| Pi 4 + Display | 5-10W | 5V 3A |
| Orange Pi 5 | 8-15W | 5V 4A or 12V 2A |
| With NVMe SSD | +2-3W | Add 1A headroom |
### Power Options
- **USB-C PD** - Modern, efficient
- **PoE HAT** - Power over Ethernet
- **12V Barrel** - Industrial standard
- **Battery** - UPS, solar applications
### UPS Solutions
- **PiJuice** - Pi-specific UPS HAT
- **UPS PIco** - Small form factor
- **Powerboost** - Adafruit, lithium battery


@@ -0,0 +1,382 @@
# Local LLM - Offline AI with llama.cpp
Run AI inference completely offline on embedded devices. No internet, no API costs, full privacy.
## Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Local LLM Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ User Input ──▶ botserver ──▶ llama.cpp ──▶ Response │
│ │ │ │
│ │ ┌────┴────┐ │
│ │ │ Model │ │
│ │ │ GGUF │ │
│ │ │ (Q4_K) │ │
│ │ └─────────┘ │
│ │ │
│ SQLite DB │
│ (sessions) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Recommended Models
### By Device RAM
| RAM | Model | Size | Speed | Quality |
|-----|-------|------|-------|---------|
| **2GB** | TinyLlama 1.1B Q4_K_M | 670MB | ~5 tok/s | Basic |
| **4GB** | Phi-2 2.7B Q4_K_M | 1.6GB | ~3-4 tok/s | Good |
| **4GB** | Gemma 2B Q4_K_M | 1.4GB | ~4 tok/s | Good |
| **8GB** | Llama 3.2 3B Q4_K_M | 2GB | ~3 tok/s | Better |
| **8GB** | Mistral 7B Q4_K_M | 4.1GB | ~2 tok/s | Great |
| **16GB** | Llama 3.1 8B Q4_K_M | 4.7GB | ~2 tok/s | Excellent |
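If you script your provisioning, the table above can be turned into a rough model picker based on detected memory. A minimal sketch; the names and thresholds simply mirror the table, so adjust them to taste:
```bash
#!/usr/bin/env bash
# Suggest a GGUF model tier from total system RAM (in MB)
ram_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)

if   [ "$ram_mb" -lt 3000 ]; then echo "Suggested: TinyLlama 1.1B Q4_K_M"
elif [ "$ram_mb" -lt 6000 ]; then echo "Suggested: Phi-2 2.7B or Gemma 2B Q4_K_M"
elif [ "$ram_mb" -lt 12000 ]; then echo "Suggested: Llama 3.2 3B or Mistral 7B Q4_K_M"
else echo "Suggested: Llama 3.1 8B Q4_K_M"
fi
```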
### By Use Case
**Simple Q&A, Commands:**
```
TinyLlama 1.1B - Fast, basic understanding
```
**Customer Service, FAQ:**
```
Phi-2 or Gemma 2B - Good comprehension, reasonable speed
```
**Complex Reasoning:**
```
Llama 3.2 3B or Mistral 7B - Better accuracy, slower
```
## Installation
### Automatic (via deploy script)
```bash
./scripts/deploy-embedded.sh pi@device --with-llama
```
### Manual Installation
```bash
# SSH to device
ssh pi@raspberrypi.local
# Install dependencies
sudo apt update
sudo apt install -y build-essential cmake git wget
# Clone llama.cpp
cd /opt
sudo git clone https://github.com/ggerganov/llama.cpp
sudo chown -R $(whoami):$(whoami) llama.cpp
cd llama.cpp
# Build for ARM (auto-optimizes)
mkdir build && cd build
cmake .. -DLLAMA_NATIVE=ON -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
# Download model
mkdir -p /opt/llama.cpp/models
cd /opt/llama.cpp/models
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```
### Start Server
```bash
# Test run
/opt/llama.cpp/build/bin/llama-server \
-m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
--host 0.0.0.0 \
--port 8080 \
-c 2048 \
--threads 4
# Verify
curl http://localhost:8080/v1/models
```
### Systemd Service
Create `/etc/systemd/system/llama-server.service`:
```ini
[Unit]
Description=llama.cpp Server - Local LLM
After=network.target
[Service]
Type=simple
User=root
WorkingDirectory=/opt/llama.cpp
ExecStart=/opt/llama.cpp/build/bin/llama-server \
-m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
--host 0.0.0.0 \
--port 8080 \
-c 2048 \
-ngl 0 \
--threads 4
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable llama-server
sudo systemctl start llama-server
```
## Configuration
### botserver .env
```env
# Use local llama.cpp
LLM_PROVIDER=llamacpp
LLM_API_URL=http://127.0.0.1:8080
LLM_MODEL=tinyllama
# Memory limits
MAX_CONTEXT_TOKENS=2048
MAX_RESPONSE_TOKENS=512
STREAMING_ENABLED=true
```
### llama.cpp Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `-c` | 2048 | Context size (tokens) |
| `--threads` | 4 | CPU threads |
| `-ngl` | 0 | GPU layers (0 for CPU only) |
| `--host` | 127.0.0.1 | Bind address |
| `--port` | 8080 | Server port |
| `-b` | 512 | Batch size |
| `--mlock` | off | Lock model in RAM |
### Memory vs Context Size
```
Context 512: ~400MB RAM, fast, limited conversation
Context 1024: ~600MB RAM, moderate
Context 2048: ~900MB RAM, good for most uses
Context 4096: ~1.5GB RAM, long conversations
```
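Putting the parameters together, a low-memory invocation for a 2GB board might look like the sketch below. The model path matches the earlier TinyLlama download; tune `-c`, `-b`, and `--threads` for your hardware:
```bash
# Smaller context (-c) and batch (-b) keep peak RAM down;
# -ngl 0 forces CPU-only inference
/opt/llama.cpp/build/bin/llama-server \
  -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  -c 1024 \
  -b 256 \
  -ngl 0 \
  --threads 4
```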
## Performance Optimization
### CPU Optimization
```bash
# Check CPU features
cat /proc/cpuinfo | grep -E "(model name|Features)"
# Build with specific optimizations
cmake .. -DLLAMA_NATIVE=ON \
-DCMAKE_BUILD_TYPE=Release \
-DLLAMA_ARM_FMA=ON \
-DLLAMA_ARM_DOTPROD=ON
```
### Memory Optimization
```bash
# For 2GB RAM devices
# Use smaller context
-c 1024
# Keep memory mapping enabled (the default) so the model is
# paged from disk instead of loaded fully into RAM; avoid --no-mmap

# Leave mlock disabled (the default) so the model is not pinned in RAM
```
### Swap Configuration
For devices with limited RAM:
```bash
# Create 2GB swap
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make permanent
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# Optimize swap usage
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
```
## NPU Acceleration (Orange Pi 5)
Orange Pi 5 has a 6 TOPS NPU that can accelerate inference:
### Using rkllm (Rockchip NPU)
```bash
# Install rkllm runtime
git clone https://github.com/airockchip/rknn-llm
cd rknn-llm
./install.sh
# Convert model to RKNN format
python3 convert_model.py \
--model tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
--output tinyllama.rkllm
# Run with NPU
rkllm-server \
--model tinyllama.rkllm \
--port 8080
```
Expected speedup: **3-5x faster** than CPU only.
## Model Download URLs
### TinyLlama 1.1B (Recommended for 2GB)
```bash
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```
### Phi-2 2.7B (Recommended for 4GB)
```bash
wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf
```
### Gemma 2 2B
```bash
wget https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q4_K_M.gguf
```
### Llama 3.2 3B (Recommended for 8GB)
```bash
wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
```
### Mistral 7B
```bash
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
```
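As an alternative to `wget`, the Hugging Face CLI resumes interrupted downloads, which helps on slow embedded links. This sketch assumes `huggingface_hub` is installed via pip; the repository and file names match the TinyLlama example above:
```bash
pip install -U "huggingface_hub[cli]"

# Resumable download straight into the models directory
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
  tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --local-dir /opt/llama.cpp/models
```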
## API Usage
llama.cpp exposes an OpenAI-compatible API:
### Chat Completion
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "tinyllama",
"messages": [
{"role": "user", "content": "What is 2+2?"}
],
"max_tokens": 100
}'
```
### Streaming
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "tinyllama",
"messages": [{"role": "user", "content": "Tell me a story"}],
"stream": true
}'
```
### Health Check
```bash
curl http://localhost:8080/health
curl http://localhost:8080/v1/models
```
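The health endpoint also makes a simple watchdog possible: restart the service whenever it stops answering. A sketch, assuming the `llama-server` systemd unit from earlier; run it as root from cron or a systemd timer:
```bash
#!/usr/bin/env bash
# Restart llama-server if the health endpoint stops responding
if ! curl -sf --max-time 5 http://localhost:8080/health > /dev/null; then
  echo "$(date -Is) llama-server unhealthy, restarting" >> /var/log/llama-watchdog.log
  systemctl restart llama-server
fi
```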
## Monitoring
### Check Performance
```bash
# Watch resource usage
htop
# Check inference speed in logs
sudo journalctl -u llama-server -f | grep "tokens/s"
# Memory usage
free -h
```
### Benchmarking
```bash
# Run llama.cpp benchmark
/opt/llama.cpp/build/bin/llama-bench \
-m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
-p 512 -n 128 -t 4
```
## Troubleshooting
### Model Loading Fails
```bash
# Check available RAM
free -h
# Try smaller context
-c 512
# Make sure memory mapping is enabled (the default); avoid --no-mmap
```
### Slow Inference
```bash
# Increase threads (up to CPU cores)
--threads $(nproc)
# Use optimized build
cmake .. -DLLAMA_NATIVE=ON
# Consider smaller model
```
### Out of Memory Killer
```bash
# Check if OOM killed the process
dmesg | grep -i "killed process"
# Increase swap
# Use smaller model
# Reduce context size
```
## Best Practices
1. **Start small** - Begin with TinyLlama, upgrade if needed
2. **Monitor memory** - Use `htop` during initial tests
3. **Set appropriate context** - 1024-2048 for most embedded use
4. **Use quantized models** - Q4_K_M is a good balance
5. **Enable streaming** - Better UX on slow inference
6. **Test offline** - Verify it works without internet before deployment (see the sketch below)
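One way to run the offline check without unplugging anything is to drop the uplink temporarily and exercise the local endpoints. This assumes the interface is `eth0` (use `wlan0` for Wi-Fi) and must be run from the device's own console, since SSH will drop:
```bash
# Take the uplink down (run on the device console, not over SSH)
sudo ip link set eth0 down

# Everything should still answer locally
curl http://localhost:8080/v1/models
curl http://localhost:8088/health

# Restore the network
sudo ip link set eth0 up
```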


@@ -0,0 +1,209 @@
# Quick Start - Deploy in 5 Minutes
Get General Bots running on your embedded device with local AI in just a few commands.
## Prerequisites
- An SBC (Raspberry Pi, Orange Pi, etc.) with Armbian/Raspbian
- SSH access to the device
- Internet connection (for initial setup only)
## One-Line Deploy
From your development machine:
```bash
# Clone and run the deployment script
git clone https://github.com/GeneralBots/botserver.git
cd botserver
# Deploy to Orange Pi (replace with your device IP)
./scripts/deploy-embedded.sh orangepi@192.168.1.100 --with-ui --with-llama
```
That's it. The hands-on part takes only a few commands; after roughly 10-15 minutes of automated setup:
- BotServer runs on port 8088
- llama.cpp runs on port 8080 with TinyLlama
- Embedded UI available at `http://your-device:8088/embedded/`
## Step-by-Step Guide
### Step 1: Prepare Your Device
Flash your SBC with a compatible OS:
**Raspberry Pi:**
```bash
# Download Raspberry Pi Imager
# Select: Raspberry Pi OS Lite (64-bit)
# Enable SSH in settings
```
**Orange Pi:**
```bash
# Download Armbian from armbian.com
# Flash with balenaEtcher
```
### Step 2: First Boot Configuration
```bash
# SSH into your device
ssh pi@raspberrypi.local # or orangepi@orangepi.local
# Update system
sudo apt update && sudo apt upgrade -y
# Set timezone
sudo timedatectl set-timezone America/Sao_Paulo
# Enable I2C/SPI if using GPIO displays
sudo raspi-config # or armbian-config
```
### Step 3: Run Deployment Script
From your development PC:
```bash
# Basic deployment (botserver only)
./scripts/deploy-embedded.sh pi@raspberrypi.local
# With embedded UI
./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui
# With local LLM (requires 4GB+ RAM)
./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui --with-llama
# Specify a different model
./scripts/deploy-embedded.sh pi@raspberrypi.local --with-llama --model phi-2.Q4_K_M.gguf
```
### Step 4: Verify Installation
```bash
# Check services
ssh pi@raspberrypi.local 'sudo systemctl status botserver'
ssh pi@raspberrypi.local 'sudo systemctl status llama-server'
# Test botserver
curl http://raspberrypi.local:8088/health
# Test llama.cpp
curl http://raspberrypi.local:8080/v1/models
```
### Step 5: Access the Interface
Open in your browser:
```
http://raspberrypi.local:8088/embedded/
```
Or set up kiosk mode (auto-starts on boot):
```bash
# Already configured if you used --with-ui
# Just reboot:
ssh pi@raspberrypi.local 'sudo reboot'
```
## Local Installation (On the Device)
If you prefer to install directly on the device:
```bash
# SSH into the device
ssh pi@raspberrypi.local
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
# Clone and build
git clone https://github.com/GeneralBots/botserver.git
cd botserver
# Run local deployment
./scripts/deploy-embedded.sh --local --with-ui --with-llama
```
⚠️ **Note:** Building on ARM devices is slow (1-2 hours). Cross-compilation is faster.
## Configuration
After deployment, edit the config file:
```bash
ssh pi@raspberrypi.local
sudo nano /opt/botserver/.env
```
Key settings:
```env
# Server
HOST=0.0.0.0
PORT=8088
# Local LLM
LLM_PROVIDER=llamacpp
LLM_API_URL=http://127.0.0.1:8080
LLM_MODEL=tinyllama
# Memory limits for small devices
MAX_CONTEXT_TOKENS=2048
MAX_RESPONSE_TOKENS=512
```
Restart after changes:
```bash
sudo systemctl restart botserver
```
## Troubleshooting
### Out of Memory
```bash
# Check memory usage
free -h
# Reduce llama.cpp context
sudo nano /etc/systemd/system/llama-server.service
# Change -c 2048 to -c 1024
# Or use a smaller model
# TinyLlama uses ~700MB, Phi-2 uses ~1.6GB
```
### Service Won't Start
```bash
# Check logs
sudo journalctl -u botserver -f
sudo journalctl -u llama-server -f
# Common issues:
# - Port already in use
# - Missing model file
# - Database permissions
```
### Display Not Working
```bash
# Check if display is detected
ls /dev/fb* # HDMI/DSI
ls /dev/i2c* # I2C displays
ls /dev/spidev* # SPI displays
# For HDMI, check config
sudo nano /boot/config.txt # Raspberry Pi
sudo nano /boot/armbianEnv.txt # Orange Pi
```
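If a framebuffer device exists but the screen stays blank, a quick sanity test is to query its mode and paint it with noise. This assumes the panel is on `/dev/fb0`; `fbset` comes from the `fbset` package:
```bash
# Show the framebuffer's current resolution and depth
sudo apt install -y fbset
fbset -fb /dev/fb0

# Fill the screen with random noise, then clear it
sudo sh -c 'cat /dev/urandom > /dev/fb0'
sudo sh -c 'cat /dev/zero > /dev/fb0'
```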
## Next Steps
- [Embedded UI Guide](./embedded-ui.md) - Customize the interface
- [Local LLM Configuration](./local-llm.md) - Optimize AI performance
- [Kiosk Mode](./kiosk-mode.md) - Production deployment
- [Offline Operation](./offline.md) - Disconnected environments


@@ -390,5 +390,12 @@
- [Appendix D: Documentation Style](./16-appendix-docs-style/conversation-examples.md)
- [SVG and Conversation Standards](./16-appendix-docs-style/svg.md)
# Part XV - Embedded & Offline
- [Chapter 20: Embedded Deployment](./20-embedding/README.md)
- [Supported Hardware](./20-embedding/hardware.md)
- [Quick Start](./20-embedding/quick-start.md)
- [Local LLM with llama.cpp](./20-embedding/local-llm.md)
[Glossary](./glossary.md)
[Contact](./contact/README.md)