Add Chapter 20: Embedded & Offline Deployment - Complete guide for Raspberry Pi, Orange Pi with local LLM

2025-12-12 13:51:39 -03:00 · ff5d2ac12c
commit ff5d2ac12c
parent 3fdeeedf73
5 changed files with 835 additions and 0 deletions
--- a/src/20-embedding/README.md
+++ b/src/20-embedding/README.md
@ -0,0 +1,47 @@
 # Chapter 20: Embedded & Offline Deployment
 Deploy General Bots to any device - from Raspberry Pi to industrial kiosks - with local LLM inference for fully offline AI capabilities.
 ## Overview
 General Bots can run on minimal hardware with displays as small as 16x2 character LCDs, enabling AI-powered interactions anywhere:
 - **Kiosks** - Self-service terminals in stores, airports, hospitals
 - **Industrial IoT** - Factory floor assistants, machine interfaces
 - **Smart Home** - Wall panels, kitchen displays, door intercoms
 - **Retail** - Point-of-sale systems, product information terminals
 - **Education** - Classroom assistants, lab equipment interfaces
 - **Healthcare** - Patient check-in, medication reminders
 ```
 ┌─────────────────────────────────────────────────────────────────────────────┐
 │                         Embedded GB Architecture                             │
 ├─────────────────────────────────────────────────────────────────────────────┤
 │                                                                              │
 │    ┌──────────────┐     ┌──────────────┐     ┌──────────────┐              │
 │    │   Display    │     │  botserver   │     │  llama.cpp   │              │
 │    │  LCD/OLED    │────▶│   (Rust)     │────▶│   (Local)    │              │
 │    │   TFT/HDMI   │     │  Port 8088   │     │  Port 8080   │              │
 │    └──────────────┘     └──────────────┘     └──────────────┘              │
 │           │                    │                    │                       │
 │           │                    │                    │                       │
 │    ┌──────▼──────┐     ┌──────▼──────┐     ┌──────▼──────┐              │
 │    │  Keyboard   │     │   SQLite    │     │  TinyLlama  │              │
 │    │  Buttons    │     │   (Data)    │     │    GGUF     │              │
 │    │  Touch      │     │             │     │   (~700MB)  │              │
 │    └─────────────┘     └─────────────┘     └─────────────┘              │
 │                                                                              │
 └─────────────────────────────────────────────────────────────────────────────┘
 ```
 ## What's in This Chapter
 - [Supported Hardware](./hardware.md) - Boards, displays, and peripherals
 - [Quick Start](./quick-start.md) - Deploy in 5 minutes
 - [Embedded UI](./embedded-ui.md) - Interface for small displays
 - [Local LLM](./local-llm.md) - Offline AI with llama.cpp
 - [Display Modes](./display-modes.md) - LCD, OLED, TFT, E-ink configurations
 - [Kiosk Mode](./kiosk-mode.md) - Locked-down production deployments
 - [Performance Tuning](./performance.md) - Optimize for limited resources
 - [Offline Operation](./offline.md) - No internet required
 - [Use Cases](./use-cases.md) - Real-world deployment examples
--- a/src/20-embedding/hardware.md
+++ b/src/20-embedding/hardware.md
@ -0,0 +1,190 @@
 # Supported Hardware
 ## Single Board Computers (SBCs)
 ### Recommended Boards
 | Board | CPU | RAM | Best For | Price |
 |-------|-----|-----|----------|-------|
 | **Orange Pi 5** | RK3588S | 4-16GB | Full LLM, NPU accel | $89-149 |
 | **Raspberry Pi 5** | BCM2712 | 4-8GB | General purpose | $60-80 |
 | **Orange Pi Zero 3** | H618 | 1-4GB | Minimal deployments | $20-35 |
 | **Raspberry Pi 4** | BCM2711 | 2-8GB | Established ecosystem | $45-75 |
 | **Raspberry Pi Zero 2W** | RP3A0 | 512MB | Ultra-compact | $15 |
 | **Rock Pi 4** | RK3399 | 4GB | Hexa-core CPU, M.2 NVMe | $75 |
 | **NVIDIA Jetson Nano** | Tegra X1 | 4GB | GPU inference | $149 |
 | **BeagleBone Black** | AM3358 | 512MB | Industrial | $55 |
 | **LattePanda 3 Delta** | Celeron N5105 | 8GB | x86 compatibility | $269 |
 | **ODROID-N2+** | S922X | 4GB | High performance | $79 |
 ### Minimum Requirements
 **For UI only (connect to remote botserver):**
 - Any ARM/x86 Linux board
 - 256MB RAM
 - Network connection
 - Display output
 **For local botserver:**
 - ARM64 or x86_64
 - 1GB RAM minimum
 - 4GB storage
 **For local LLM (llama.cpp):**
 - ARM64 or x86_64
 - 2GB+ RAM (4GB recommended)
 - 2GB+ storage for model
 ### Orange Pi 5 (Recommended for LLM)
 The Orange Pi 5, built around the RK3588S, is the strongest option in this list for embedded LLM inference:
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │  Orange Pi 5 - Best for Offline AI                          │
 ├─────────────────────────────────────────────────────────────┤
 │  CPU: Rockchip RK3588S (4x A76 + 4x A55)                   │
 │  NPU: 6 TOPS (Neural Processing Unit)                       │
 │  GPU: Mali-G610 MP4                                         │
 │  RAM: 4GB / 8GB / 16GB LPDDR4X                             │
 │  Storage: M.2 NVMe + microSD                                │
 │                                                             │
 │  LLM Performance:                                           │
 │  ├─ TinyLlama 1.1B Q4: ~8-12 tokens/sec                    │
 │  ├─ Phi-2 2.7B Q4: ~4-6 tokens/sec                         │
 │  └─ With NPU (rkllm): ~20-30 tokens/sec                    │
 └─────────────────────────────────────────────────────────────┘
 ```
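 To turn these token rates into user-facing latency, divide the expected reply length by the generation speed. A minimal sketch with integer shell arithmetic follows; `reply_seconds` is a hypothetical helper, and it ignores prompt-processing time, so real waits are somewhat longer.
 ```bash
 # reply_seconds TOKENS TOKENS_PER_SEC
 # Prints the generation time in whole seconds, rounded up.
 reply_seconds() {
   echo $(( ($1 + $2 - 1) / $2 ))
 }
 # A 100-token reply at the low and high end of the TinyLlama CPU range:
 reply_seconds 100 8    # prints 13
 reply_seconds 100 12   # prints 9
 ```
 At the CPU-only rates quoted above, a 100-token answer therefore takes roughly 9-13 seconds.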
 ## Displays
 ### Character LCDs (Minimal)
 For text-only interfaces:
 | Display | Resolution | Interface | Use Case |
 |---------|------------|-----------|----------|
 | HD44780 16x2 | 16 chars × 2 lines | I2C/GPIO | Status, simple Q&A |
 | HD44780 20x4 | 20 chars × 4 lines | I2C/GPIO | More context |
 | LCD2004 | 20 chars × 4 lines | I2C | Industrial |
 **Example output on 16x2:**
 ```
 ┌────────────────┐
 │> How can I help│
 │< Processing... │
 └────────────────┘
 ```
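 Fitting a reply onto a character LCD means wrapping it to the panel width and showing only as many rows as the panel has. A minimal sketch using the standard `fold` utility follows; `lcd_page` is a hypothetical helper, and writing the result to the HD44780 itself (over I2C or GPIO) is left out.
 ```bash
 # lcd_page TEXT [COLS] [ROWS]
 # Word-wraps TEXT to COLS columns and keeps the first ROWS lines.
 lcd_page() {
   printf '%s\n' "$1" |
     fold -s -w "${2:-16}" |   # break at spaces where possible
     sed 's/ *$//' |           # drop the trailing space fold leaves behind
     head -n "${3:-2}"
 }
 lcd_page "How can I help you today?" 16 2
 ```
 For a 20x4 panel, call it as `lcd_page "$reply" 20 4`; longer replies would need paging or scrolling on top of this.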
 ### OLED Displays
 For graphical monochrome interfaces:
 | Display | Resolution | Interface | Size |
 |---------|------------|-----------|------|
 | SSD1306 | 128×64 | I2C/SPI | 0.96" |
 | SSD1309 | 128×64 | I2C/SPI | 2.42" |
 | SH1106 | 128×64 | I2C/SPI | 1.3" |
 | SSD1322 | 256×64 | SPI | 3.12" |
 ### TFT/IPS Color Displays
 For full graphical interfaces:
 | Display | Resolution | Interface | Notes |
 |---------|------------|-----------|-------|
 | ILI9341 | 320×240 | SPI | Common, cheap |
 | ST7789 | 240×320 | SPI | Fast refresh |
 | ILI9488 | 480×320 | SPI | Larger |
 | Waveshare 5" | 800×480 | HDMI | Touch optional |
 | Waveshare 7" | 1024×600 | HDMI | Touch, IPS |
 | Official Pi 7" | 800×480 | DSI | Best for Pi |
 ### E-Ink/E-Paper
 For low-power, sunlight-readable interfaces:
 | Display | Resolution | Colors | Refresh |
 |---------|------------|--------|---------|
 | Waveshare 2.13" | 250×122 | B/W | 2s |
 | Waveshare 4.2" | 400×300 | B/W | 4s |
 | Waveshare 7.5" | 800×480 | B/W | 5s |
 | Good Display 9.7" | 1200×825 | B/W | 6s |
 **Best for:** Menu displays, signs, low-update applications
 ### Industrial Displays
 | Display | Resolution | Features |
 |---------|------------|----------|
 | Advantech | Various | Wide temp, sunlight |
 | Winstar | Various | Industrial grade |
 | Newhaven | Various | Long availability |
 ## Input Devices
 ### Keyboards
 - **USB Keyboard** - Standard, any USB keyboard works
 - **PS/2 Keyboard** - Via adapter, lower latency
 - **Matrix Keypad** - 4x4 or 3x4, GPIO connected
 - **I2C Keypad** - Fewer GPIO pins needed
 ### Touch Input
 - **Capacitive Touch** - Better response, needs driver
 - **Resistive Touch** - Works with gloves, pressure-based
 - **IR Touch Frame** - Large displays, vandal-resistant
 ### Buttons & GPIO
 ```
 ┌─────────────────────────────────────────────┐
 │  Simple 4-Button Interface                   │
 ├─────────────────────────────────────────────┤
 │                                              │
 │   [◄ PREV]  [▲ UP]  [▼ DOWN]  [► SELECT]   │
 │                                              │
 │   GPIO 17   GPIO 27  GPIO 22   GPIO 23      │
 │                                              │
 └─────────────────────────────────────────────┘
 ```
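 In software, the layout above reduces to a lookup from GPIO number to UI action. The sketch below shows only that mapping; `button_action` is a hypothetical helper, and actually reading the pins (for example with a GPIO library or command-line tool) is not shown.
 ```bash
 # button_action GPIO_PIN
 # Maps a GPIO number from the layout above to its UI action.
 button_action() {
   case "$1" in
     17) echo PREV ;;
     27) echo UP ;;
     22) echo DOWN ;;
     23) echo SELECT ;;
     *)  echo "unknown pin: $1" >&2; return 1 ;;
   esac
 }
 button_action 22   # prints DOWN
 ```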
 ## Enclosures
 ### Commercial Options
 - **Hammond Manufacturing** - Industrial metal enclosures
 - **Polycase** - Plastic, IP65 rated
 - **Bud Industries** - Various sizes
 - **Pi-specific cases** - Argon, Flirc, etc.
 ### DIY Options
 - **3D Printed** - Custom fit, PLA/PETG
 - **Laser Cut** - Acrylic, wood
 - **Metal Fabrication** - Professional look
 ## Power
 ### Power Requirements
 | Configuration | Power | Recommended PSU |
 |---------------|-------|-----------------|
 | Pi Zero + LCD | 1-2W | 5V 1A |
 | Pi 4 + Display | 5-10W | 5V 3A |
 | Orange Pi 5 | 8-15W | 5V 4A or 12V 2A |
 | With NVMe SSD | +2-3W | Add 1A headroom |
 ### Power Options
 - **USB-C PD** - Modern, efficient
 - **PoE HAT** - Power over Ethernet
 - **12V Barrel** - Industrial standard
 - **Battery** - UPS, solar applications
 ### UPS Solutions
 - **PiJuice** - Pi-specific UPS HAT
 - **UPS PIco** - Small form factor
 - **PowerBoost** - Adafruit, lithium battery
--- a/src/20-embedding/local-llm.md
+++ b/src/20-embedding/local-llm.md
@ -0,0 +1,382 @@
 # Local LLM - Offline AI with llama.cpp
 Run AI inference completely offline on embedded devices. No internet, no API costs, full privacy.
 ## Overview
 ```
 ┌─────────────────────────────────────────────────────────────────────────────┐
 │                        Local LLM Architecture                                │
 ├─────────────────────────────────────────────────────────────────────────────┤
 │                                                                              │
 │   User Input ──▶ botserver ──▶ llama.cpp ──▶ Response                       │
 │                      │              │                                        │
 │                      │         ┌────┴────┐                                   │
 │                      │         │  Model  │                                   │
 │                      │         │  GGUF   │                                   │
 │                      │         │ (Q4_K)  │                                   │
 │                      │         └─────────┘                                   │
 │                      │                                                       │
 │                 SQLite DB                                                    │
 │                (sessions)                                                    │
 │                                                                              │
 └─────────────────────────────────────────────────────────────────────────────┘
 ```
 ## Recommended Models
 ### By Device RAM
 | RAM | Model | Size | Speed | Quality |
 |-----|-------|------|-------|---------|
 | **2GB** | TinyLlama 1.1B Q4_K_M | 670MB | ~5 tok/s | Basic |
 | **4GB** | Phi-2 2.7B Q4_K_M | 1.6GB | ~3-4 tok/s | Good |
 | **4GB** | Gemma 2B Q4_K_M | 1.4GB | ~4 tok/s | Good |
 | **8GB** | Llama 3.2 3B Q4_K_M | 2GB | ~3 tok/s | Better |
 | **8GB** | Mistral 7B Q4_K_M | 4.1GB | ~2 tok/s | Great |
 | **16GB** | Llama 3.1 8B Q4_K_M | 4.7GB | ~2 tok/s | Excellent |
 ### By Use Case
 **Simple Q&A, Commands:**
 ```
 TinyLlama 1.1B - Fast, basic understanding
 ```
 **Customer Service, FAQ:**
 ```
 Phi-2 or Gemma 2B - Good comprehension, reasonable speed
 ```
 **Complex Reasoning:**
 ```
 Llama 3.2 3B or Mistral 7B - Better accuracy, slower
 ```
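 The RAM table above can be turned into a simple selection rule. The sketch below returns the first model listed for the highest RAM tier the device reaches; `pick_model` is a hypothetical helper, and where a tier lists two models (for example Mistral 7B at 8GB) it picks the smaller one.
 ```bash
 # pick_model RAM_GB
 # Prints the suggested model for a device with RAM_GB gigabytes of RAM.
 pick_model() {
   if   [ "$1" -ge 16 ]; then echo "Llama 3.1 8B Q4_K_M"
   elif [ "$1" -ge 8 ];  then echo "Llama 3.2 3B Q4_K_M"
   elif [ "$1" -ge 4 ];  then echo "Phi-2 2.7B Q4_K_M"
   elif [ "$1" -ge 2 ];  then echo "TinyLlama 1.1B Q4_K_M"
   else
     echo "less than 2GB RAM: use a remote botserver instead" >&2
     return 1
   fi
 }
 pick_model 4   # prints Phi-2 2.7B Q4_K_M
 ```
 On the device, total RAM can be read from the `MemTotal` line of `/proc/meminfo` (reported in kB).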
 ## Installation
 ### Automatic (via deploy script)
 ```bash
 ./scripts/deploy-embedded.sh pi@device --with-llama
 ```
 ### Manual Installation
 ```bash
 # SSH to device
 ssh pi@raspberrypi.local
 # Install dependencies
 sudo apt update
 sudo apt install -y build-essential cmake git wget
 # Clone llama.cpp
 cd /opt
 sudo git clone https://github.com/ggml-org/llama.cpp
 sudo chown -R $(whoami):$(whoami) llama.cpp
 cd llama.cpp
 # Build with optimizations for the host CPU
 mkdir build && cd build
 cmake .. -DGGML_NATIVE=ON -DCMAKE_BUILD_TYPE=Release
 make -j$(nproc)
 # Download model
 mkdir -p /opt/llama.cpp/models
 cd /opt/llama.cpp/models
 wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
 ```
 ### Start Server
 ```bash
 # Test run
 /opt/llama.cpp/build/bin/llama-server \
    -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    -c 2048 \
    --threads 4
 # Verify
 curl http://localhost:8080/v1/models
 ```
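 Loading a model can take a while on slow storage, so the `curl` check may fail if it runs right after the server starts. The polling loop sketched below waits until the `/health` endpoint answers successfully. `wait_for_llama` is a hypothetical helper, and the 60-second default timeout is an arbitrary assumption.
 ```bash
 # wait_for_llama [BASE_URL] [TIMEOUT_SECONDS]
 # Polls BASE_URL/health once per second; succeeds as soon as it returns
 # an HTTP success status, fails after roughly TIMEOUT_SECONDS attempts.
 wait_for_llama() {
   url=${1:-http://127.0.0.1:8080}
   timeout=${2:-60}
   waited=0
   while [ "$waited" -lt "$timeout" ]; do
     if curl -fsS --max-time 2 "$url/health" >/dev/null 2>&1; then
       return 0
     fi
     sleep 1
     waited=$((waited + 1))
   done
   return 1
 }
 # Usage:
 #   wait_for_llama http://localhost:8080 120 && echo "llama-server is ready"
 ```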
 ### Systemd Service
 Create `/etc/systemd/system/llama-server.service`:
 ```ini
 [Unit]
 Description=llama.cpp Server - Local LLM
 After=network.target
 [Service]
 Type=simple
 User=root
 WorkingDirectory=/opt/llama.cpp
 ExecStart=/opt/llama.cpp/build/bin/llama-server \
    -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    -c 2048 \
    -ngl 0 \
    --threads 4
 Restart=always
 RestartSec=5
 [Install]
 WantedBy=multi-user.target
 ```
 Enable and start:
 ```bash
 sudo systemctl daemon-reload
 sudo systemctl enable llama-server
 sudo systemctl start llama-server
 ```
 ## Configuration
 ### botserver .env
 ```env
 # Use local llama.cpp
 LLM_PROVIDER=llamacpp
 LLM_API_URL=http://127.0.0.1:8080
 LLM_MODEL=tinyllama
 # Memory limits
 MAX_CONTEXT_TOKENS=2048
 MAX_RESPONSE_TOKENS=512
 STREAMING_ENABLED=true
 ```
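 `MAX_CONTEXT_TOKENS` here and the `-c` value passed to llama-server describe the same budget from two sides, so it seems reasonable to keep the former no larger than the latter. The sketch below checks that; the relationship is an assumption rather than documented botserver behavior, and `env_get` and `context_ok` are hypothetical helpers.
 ```bash
 # env_get FILE KEY
 # Prints the value of the last KEY=value line in FILE (comments are ignored).
 env_get() {
   grep -E "^$2=" "$1" | tail -n 1 | cut -d= -f2-
 }
 # context_ok FILE SERVER_CTX
 # Succeeds when MAX_CONTEXT_TOKENS in FILE does not exceed SERVER_CTX.
 context_ok() {
   max=$(env_get "$1" MAX_CONTEXT_TOKENS)
   [ -n "$max" ] && [ "$max" -le "$2" ]
 }
 # Usage on the device, with llama-server started with -c 2048:
 #   context_ok /opt/botserver/.env 2048 && echo "context settings agree"
 ```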
 ### llama.cpp Parameters
 | Parameter | Typical value | Description |
 |-----------|---------|-------------|
 | `-c` | 2048 | Context size (tokens) |
 | `--threads` | 4 | CPU threads |
 | `-ngl` | 0 | GPU layers (0 for CPU only) |
 | `--host` | 127.0.0.1 | Bind address |
 | `--port` | 8080 | Server port |
 | `-b` | 512 | Batch size |
 | `--mlock` | off | Lock model in RAM |
 ### Memory vs Context Size
 ```
 Context 512:  ~400MB RAM, fast, limited conversation
 Context 1024: ~600MB RAM, moderate
 Context 2048: ~900MB RAM, good for most uses
 Context 4096: ~1.5GB RAM, long conversations
 ```
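 Read the other way round, these estimates give the largest context that fits a given RAM budget. The sketch below encodes the four figures above as thresholds; `context_for_budget` is a hypothetical helper, and the numbers are only as accurate as the estimates themselves.
 ```bash
 # context_for_budget RAM_MB
 # Prints the largest context size whose estimate above fits in RAM_MB.
 context_for_budget() {
   if   [ "$1" -ge 1500 ]; then echo 4096
   elif [ "$1" -ge 900 ];  then echo 2048
   elif [ "$1" -ge 600 ];  then echo 1024
   elif [ "$1" -ge 400 ];  then echo 512
   else
     echo "under 400MB: no listed context size fits" >&2
     return 1
   fi
 }
 context_for_budget 1000   # prints 2048
 ```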
 ## Performance Optimization
 ### CPU Optimization
 ```bash
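 # Check CPU features (asimddp in the Features line means dot-product support)
 grep -E "(model name|Features)" /proc/cpuinfo
 # Native build: the compiler detects the host CPU's features
 cmake .. -DGGML_NATIVE=ON -DCMAKE_BUILD_TYPE=Release
 # Or target a specific ARM architecture, e.g. when cross-compiling
 cmake .. -DGGML_NATIVE=OFF \
         -DCMAKE_BUILD_TYPE=Release \
         -DGGML_CPU_ARM_ARCH=armv8.2-a+dotprod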
 ```
 ### Memory Optimization
 ```bash
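 # For 2GB RAM devices, adjust the llama-server command line:
 # Use a smaller context
 -c 1024
 # Memory mapping is enabled by default, so the model is paged in from
 # storage on demand (slower but less RAM); do not pass --no-mmap
 # Leave mlock off (the default) so the model is not pinned in RAM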
 ```
 ### Swap Configuration
 For devices with limited RAM:
 ```bash
 # Create 2GB swap
 sudo fallocate -l 2G /swapfile
 sudo chmod 600 /swapfile
 sudo mkswap /swapfile
 sudo swapon /swapfile
 # Make permanent
 echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
 # Optimize swap usage
 echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
 # Apply the setting without rebooting
 sudo sysctl -p
 ```
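 After enabling swap, it is worth confirming that the kernel actually sees it. The sketch below reads a field from `/proc/meminfo` and converts it from kB to MB; `meminfo_mb` is a hypothetical helper, and the optional file argument exists only so it can be tried against a saved copy.
 ```bash
 # meminfo_mb KEY [FILE]
 # Prints the value of KEY (for example MemAvailable or SwapTotal) in MB.
 meminfo_mb() {
   awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1024 }' "${2:-/proc/meminfo}"
 }
 # On the device:
 #   meminfo_mb SwapTotal      # close to 2048 if the 2GB file is the only swap
 #   meminfo_mb MemAvailable   # RAM currently available to new processes
 ```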
 ## NPU Acceleration (Orange Pi 5)
 Orange Pi 5 has a 6 TOPS NPU that can accelerate inference:
 ### Using rkllm (Rockchip NPU)
 ```bash
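 # Get the Rockchip LLM toolkit and runtime.
 # The install script, converter, and server names below are illustrative:
 # check the rknn-llm repository documentation for the actual steps.
 git clone https://github.com/airockchip/rknn-llm
 cd rknn-llm
 ./install.sh
 # Convert model to RKLLM format
 python3 convert_model.py \
    --model tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --output tinyllama.rkllm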
 # Run with NPU
 rkllm-server \
    --model tinyllama.rkllm \
    --port 8080
 ```
 Expected speedup: **3-5x faster** than CPU only.
 ## Model Download URLs
 ### TinyLlama 1.1B (Recommended for 2GB)
 ```bash
 wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
 ```
 ### Phi-2 2.7B (Recommended for 4GB)
 ```bash
 wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf
 ```
 ### Gemma 2 2B
 ```bash
 wget https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q4_K_M.gguf
 ```
 ### Llama 3.2 3B (Recommended for 8GB)
 ```bash
 wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
 ```
 ### Mistral 7B
 ```bash
 wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
 ```
 ## API Usage
 llama.cpp exposes an OpenAI-compatible API:
 ### Chat Completion
 ```bash
 curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "messages": [
      {"role": "user", "content": "What is 2+2?"}
    ],
    "max_tokens": 100
  }'
 ```
 ### Streaming
 ```bash
 curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
 ```
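 With `"stream": true` the reply arrives as server-sent events: each chunk is a line prefixed with `data: `, and the stream ends with `data: [DONE]`, following the OpenAI streaming convention. The sketch below strips that framing so each JSON chunk can be handed to a parser. `sse_payloads` is a hypothetical helper, and extracting the text from each chunk is left to a JSON tool of your choice.
 ```bash
 # sse_payloads
 # Reads an event stream on stdin and prints one JSON chunk per line,
 # stopping at the [DONE] marker.
 sse_payloads() {
   awk '/^data: / {
     sub(/^data: /, "")
     if ($0 == "[DONE]") exit
     print
   }'
 }
 # Usage on the device (-N disables curl's output buffering):
 #   curl -sN http://localhost:8080/v1/chat/completions \
 #     -H "Content-Type: application/json" \
 #     -d '{"messages":[{"role":"user","content":"Hi"}],"stream":true}' |
 #     sse_payloads
 ```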
 ### Health Check
 ```bash
 curl http://localhost:8080/health
 curl http://localhost:8080/v1/models
 ```
 ## Monitoring
 ### Check Performance
 ```bash
 # Watch resource usage
 htop
 # Check inference speed in logs
 sudo journalctl -u llama-server -f | grep --line-buffered "tokens per second"
 # Memory usage
 free -h
 ```
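 To track speed over time it helps to pull just the number out of the timing lines. The sketch below assumes the server logs its timings in the form `... 50.00 tokens per second)`; that format is an assumption about the llama.cpp version in use, and `tokens_per_second` is a hypothetical helper.
 ```bash
 # tokens_per_second
 # Reads log lines on stdin and prints the number that precedes
 # "tokens per second" on each line that contains it.
 tokens_per_second() {
   awk '{
     for (i = 1; i + 3 <= NF; i++)
       if ($(i + 1) == "tokens" && $(i + 2) == "per" && $(i + 3) ~ /^second/)
         print $i
   }'
 }
 # Sample line in the assumed format:
 echo "eval time = 1000.00 ms / 50 tokens ( 20.00 ms per token, 50.00 tokens per second)" |
   tokens_per_second   # prints 50.00
 ```
 On the device, the same filter can be fed from the service log: `sudo journalctl -u llama-server --no-pager | tokens_per_second`.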
 ### Benchmarking
 ```bash
 # Run llama.cpp benchmark
 /opt/llama.cpp/build/bin/llama-bench \
    -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    -p 512 -n 128 -t 4
 ```
 ## Troubleshooting
 ### Model Loading Fails
 ```bash
 # Check available RAM
 free -h
 # Try a smaller context (llama-server flag)
 -c 512
 # Keep memory mapping enabled (the default); remove --no-mmap if present
 ```
 ### Slow Inference
 ```bash
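 # Increase threads (up to the number of CPU cores; on big.LITTLE SoCs
 # such as the RK3588S, using only the performance cores is often faster)
 --threads $(nproc)
 # Use optimized build
 cmake .. -DGGML_NATIVE=ON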
 # Consider smaller model
 ```
 ### Out of Memory Killer
 ```bash
 # Check if OOM killed the process
 dmesg | grep -i "killed process"
 # Increase swap
 # Use smaller model
 # Reduce context size
 ```
 ## Best Practices
 1. **Start small** - Begin with TinyLlama, upgrade if needed
 2. **Monitor memory** - Use `htop` during initial tests
 3. **Set appropriate context** - 1024-2048 for most embedded use
 4. **Use quantized models** - Q4_K_M is a good balance
 5. **Enable streaming** - Better UX on slow inference
 6. **Test offline** - Verify it works without internet before deployment
--- a/src/20-embedding/quick-start.md
+++ b/src/20-embedding/quick-start.md
@ -0,0 +1,209 @@
 # Quick Start - Deploy in 5 Minutes
 Get General Bots running on your embedded device with local AI in just a few commands.
 ## Prerequisites
 - An SBC (Raspberry Pi, Orange Pi, etc.) running Armbian or Raspberry Pi OS (64-bit)
 - SSH access to the device
 - Internet connection (for initial setup only)
 ## One-Line Deploy
 From your development machine:
 ```bash
 # Clone and run the deployment script
 git clone https://github.com/GeneralBots/botserver.git
 cd botserver
 # Deploy to Orange Pi (replace with your device IP)
 ./scripts/deploy-embedded.sh orangepi@192.168.1.100 --with-ui --with-llama
 ```
 That's it! The script takes about 10-15 minutes to finish. Afterwards:
 - botserver runs on port 8088
 - llama.cpp runs on port 8080 with TinyLlama
 - The embedded UI is available at `http://your-device:8088/embedded/`
 ## Step-by-Step Guide
 ### Step 1: Prepare Your Device
 Flash your SBC with a compatible OS:
 **Raspberry Pi:**
 ```bash
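 # Download Raspberry Pi Imager
 # Select: Raspberry Pi OS Lite (64-bit)
 # In the Imager settings, enable SSH and set a username and password
 # (current images ship without a default "pi" user; this guide uses "pi")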
 ```
 **Orange Pi:**
 ```bash
 # Download Armbian from armbian.com
 # Flash with balenaEtcher
 ```
 ### Step 2: First Boot Configuration
 ```bash
 # SSH into your device
 ssh pi@raspberrypi.local  # or orangepi@orangepi.local
 # Update system
 sudo apt update && sudo apt upgrade -y
 # Set timezone (replace with your own)
 sudo timedatectl set-timezone America/Sao_Paulo
 # Enable I2C/SPI if using GPIO displays
 sudo raspi-config  # or armbian-config
 ```
 ### Step 3: Run Deployment Script
 From your development PC:
 ```bash
 # Basic deployment (botserver only)
 ./scripts/deploy-embedded.sh pi@raspberrypi.local
 # With embedded UI
 ./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui
 # With local LLM (requires 2GB+ RAM, 4GB recommended)
 ./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui --with-llama
 # Specify a different model
 ./scripts/deploy-embedded.sh pi@raspberrypi.local --with-llama --model phi-2-Q4_K_M.gguf
 ```
 ### Step 4: Verify Installation
 ```bash
 # Check services
 ssh pi@raspberrypi.local 'sudo systemctl status botserver'
 ssh pi@raspberrypi.local 'sudo systemctl status llama-server'
 # Test botserver
 curl http://raspberrypi.local:8088/health
 # Test llama.cpp
 curl http://raspberrypi.local:8080/v1/models
 ```
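 The two `curl` checks above can be wrapped into a single pass/fail report, which is convenient after every redeploy. A minimal sketch, assuming the default ports from this guide; `check_endpoint` and `check_all` are hypothetical helpers.
 ```bash
 # check_endpoint NAME URL
 # Prints "NAME: OK" when URL answers with an HTTP success status,
 # otherwise prints "NAME: FAIL" and returns non-zero.
 check_endpoint() {
   if curl -fsS --max-time 5 "$2" >/dev/null 2>&1; then
     echo "$1: OK"
   else
     echo "$1: FAIL"
     return 1
   fi
 }
 # check_all HOST
 # Checks both services from this guide; fails if either is down.
 check_all() {
   status=0
   check_endpoint botserver "http://$1:8088/health" || status=1
   check_endpoint llama.cpp "http://$1:8080/v1/models" || status=1
   return "$status"
 }
 # Usage:
 #   check_all raspberrypi.local
 ```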
 ### Step 5: Access the Interface
 Open in your browser:
 ```
 http://raspberrypi.local:8088/embedded/
 ```
 Or set up kiosk mode (auto-starts on boot):
 ```bash
 # Already configured if you used --with-ui
 # Just reboot:
 ssh pi@raspberrypi.local 'sudo reboot'
 ```
 ## Local Installation (On the Device)
 If you prefer to install directly on the device:
 ```bash
 # SSH into the device
 ssh pi@raspberrypi.local
 # Install Rust
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 source ~/.cargo/env
 # Clone and build
 git clone https://github.com/GeneralBots/botserver.git
 cd botserver
 # Run local deployment
 ./scripts/deploy-embedded.sh --local --with-ui --with-llama
 ```
 ⚠️ **Note:** Building on ARM devices is slow (1-2 hours). Cross-compilation is faster.
 ## Configuration
 After deployment, edit the config file:
 ```bash
 ssh pi@raspberrypi.local
 sudo nano /opt/botserver/.env
 ```
 Key settings:
 ```env
 # Server
 HOST=0.0.0.0
 PORT=8088
 # Local LLM
 LLM_PROVIDER=llamacpp
 LLM_API_URL=http://127.0.0.1:8080
 LLM_MODEL=tinyllama
 # Memory limits for small devices
 MAX_CONTEXT_TOKENS=2048
 MAX_RESPONSE_TOKENS=512
 ```
 Restart after changes:
 ```bash
 sudo systemctl restart botserver
 ```
 ## Troubleshooting
 ### Out of Memory
 ```bash
 # Check memory usage
 free -h
 # Reduce llama.cpp context
 sudo nano /etc/systemd/system/llama-server.service
 # Change -c 2048 to -c 1024
 # Or use a smaller model
 # TinyLlama uses ~700MB, Phi-2 uses ~1.6GB
 ```
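 The context change can also be previewed without opening an editor. The sketch below is a filter that rewrites the `-c` value in a unit file read from stdin; `rewrite_context` is a hypothetical helper, and it deliberately prints to stdout instead of modifying the installed unit.
 ```bash
 # rewrite_context NEW_SIZE
 # Copies a unit file from stdin to stdout, replacing the value after "-c".
 rewrite_context() {
   case "$1" in
     ''|*[!0-9]*) echo "context size must be a number" >&2; return 1 ;;
   esac
   sed -E "s/(^|[[:space:]])-c[[:space:]]+[0-9]+/\1-c $1/"
 }
 # Preview the change:
 #   rewrite_context 1024 < /etc/systemd/system/llama-server.service
 # After editing the real file, reload and restart:
 #   sudo systemctl daemon-reload && sudo systemctl restart llama-server
 ```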
 ### Service Won't Start
 ```bash
 # Follow the logs of both services
 sudo journalctl -u botserver -u llama-server -f
 # Common issues:
 # - Port already in use
 # - Missing model file
 # - Database permissions
 ```
 ### Display Not Working
 ```bash
 # Check if display is detected
 ls /dev/fb*       # HDMI/DSI
 ls /dev/i2c*      # I2C displays
 ls /dev/spidev*   # SPI displays
 # For HDMI, check config
 sudo nano /boot/firmware/config.txt  # Raspberry Pi (older releases: /boot/config.txt)
 sudo nano /boot/armbianEnv.txt  # Orange Pi
 ```
 ## Next Steps
 - [Embedded UI Guide](./embedded-ui.md) - Customize the interface
 - [Local LLM Configuration](./local-llm.md) - Optimize AI performance
 - [Kiosk Mode](./kiosk-mode.md) - Production deployment
 - [Offline Operation](./offline.md) - Disconnected environments
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@ -390,5 +390,12 @@
 - [Appendix D: Documentation Style](./16-appendix-docs-style/conversation-examples.md)
  - [SVG and Conversation Standards](./16-appendix-docs-style/svg.md)
 # Part XV - Embedded & Offline
 - [Chapter 20: Embedded Deployment](./20-embedding/README.md)
  - [Supported Hardware](./20-embedding/hardware.md)
  - [Quick Start](./20-embedding/quick-start.md)
  - [Local LLM with llama.cpp](./20-embedding/local-llm.md)
 [Glossary](./glossary.md)
 [Contact](./contact/README.md)