diff --git a/src/13-devices/README.md b/src/13-devices/README.md
new file mode 100644
index 00000000..4153b4ac
--- /dev/null
+++ b/src/13-devices/README.md
@@ -0,0 +1,54 @@
+# Chapter 13: Device & Offline Deployment
+
+Deploy General Bots to any device - from smartphones to Raspberry Pi to industrial kiosks - with local LLM inference for fully offline AI capabilities.
+
+## Overview
+
+General Bots can run on any device, from mobile phones to minimal embedded hardware with displays as small as 16x2 character LCDs, enabling AI-powered interactions anywhere:
+
+- **Kiosks** - Self-service terminals in stores, airports, hospitals
+- **Industrial IoT** - Factory floor assistants, machine interfaces
+- **Smart Home** - Wall panels, kitchen displays, door intercoms
+- **Retail** - Point-of-sale systems, product information terminals
+- **Education** - Classroom assistants, lab equipment interfaces
+- **Healthcare** - Patient check-in, medication reminders
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                          Embedded GB Architecture                           │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐              │
+│   │   Display    │      │  botserver   │      │  llama.cpp   │              │
+│   │   LCD/OLED   │────▶│    (Rust)    │────▶│   (Local)    │              │
+│   │   TFT/HDMI   │      │  Port 8088   │      │  Port 8080   │              │
+│   └──────────────┘      └──────────────┘      └──────────────┘              │
+│          │                     │                     │                      │
+│          │                     │                     │                      │
+│   ┌──────▼──────┐       ┌──────▼──────┐       ┌──────▼──────┐               │
+│   │  Keyboard   │       │   SQLite    │       │  TinyLlama  │               │
+│   │  Buttons    │       │   (Data)    │       │    GGUF     │               │
+│   │  Touch      │       │             │       │  (~700MB)   │               │
+│   └─────────────┘       └─────────────┘       └─────────────┘               │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## What's in This Chapter
+
+### Mobile Deployment
+- [Mobile (Android & HarmonyOS)](./mobile.md) - BotOS for smartphones and tablets
+
+### Embedded Deployment
+- [Supported Hardware](./hardware.md) - SBCs, displays, and peripherals
+- [Quick Start](./quick-start.md) - Deploy in 5 minutes
+- [Local LLM](./local-llm.md) - Offline AI with llama.cpp
+
+### Deployment Options
+
+| Platform | Use Case | Requirements |
+|----------|----------|--------------|
+| **Android/HarmonyOS** | Smartphones, tablets, kiosks | Any Android 8+ device |
+| **Raspberry Pi** | IoT, displays, terminals | 1GB+ RAM |
+| **Orange Pi** | Full offline AI | 4GB+ RAM for LLM |
+| **Industrial** | Factory, retail, healthcare | Any ARM/x86 SBC |
diff --git a/src/13-devices/hardware.md b/src/13-devices/hardware.md
new file mode 100644
index 00000000..f8d6d416
--- /dev/null
+++ b/src/13-devices/hardware.md
@@ -0,0 +1,190 @@
+# Supported Hardware
+
+## Single Board Computers (SBCs)
+
+### Recommended Boards
+
+| Board | CPU | RAM | Best For | Price |
+|-------|-----|-----|----------|-------|
+| **Orange Pi 5** | RK3588S | 4-16GB | Full LLM, NPU accel | $89-149 |
+| **Raspberry Pi 5** | BCM2712 | 4-8GB | General purpose | $60-80 |
+| **Orange Pi Zero 3** | H618 | 1-4GB | Minimal deployments | $20-35 |
+| **Raspberry Pi 4** | BCM2711 | 2-8GB | Established ecosystem | $45-75 |
+| **Raspberry Pi Zero 2W** | RP3A0 | 512MB | Ultra-compact | $15 |
+| **Rock Pi 4** | RK3399 | 4GB | Proven RK3399 platform | $75 |
+| **NVIDIA Jetson Nano** | Tegra X1 | 4GB | GPU inference | $149 |
+| **BeagleBone Black** | AM3358 | 512MB | Industrial | $55 |
+| **LattePanda 3 Delta** | N5105 | 8GB | x86 compatibility | $269 |
+| **ODROID-N2+** | S922X | 4GB | High performance | $79 |
+
+### Minimum Requirements
+
+**For UI only (connect to 
remote botserver):** +- Any ARM/x86 Linux board +- 256MB RAM +- Network connection +- Display output + +**For local botserver:** +- ARM64 or x86_64 +- 1GB RAM minimum +- 4GB storage + +**For local LLM (llama.cpp):** +- ARM64 or x86_64 +- 2GB+ RAM (4GB recommended) +- 2GB+ storage for model + +### Orange Pi 5 (Recommended for LLM) + +The Orange Pi 5 with RK3588S is ideal for embedded LLM: + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Orange Pi 5 - Best for Offline AI │ +├─────────────────────────────────────────────────────────────┤ +│ CPU: Rockchip RK3588S (4x A76 + 4x A55) │ +│ NPU: 6 TOPS (Neural Processing Unit) │ +│ GPU: Mali-G610 MP4 │ +│ RAM: 4GB / 8GB / 16GB LPDDR4X │ +│ Storage: M.2 NVMe + eMMC + microSD │ +│ │ +│ LLM Performance: │ +│ ├─ TinyLlama 1.1B Q4: ~8-12 tokens/sec │ +│ ├─ Phi-2 2.7B Q4: ~4-6 tokens/sec │ +│ └─ With NPU (rkllm): ~20-30 tokens/sec │ +└─────────────────────────────────────────────────────────────┘ +``` + +## Displays + +### Character LCDs (Minimal) + +For text-only interfaces: + +| Display | Resolution | Interface | Use Case | +|---------|------------|-----------|----------| +| HD44780 16x2 | 16 chars × 2 lines | I2C/GPIO | Status, simple Q&A | +| HD44780 20x4 | 20 chars × 4 lines | I2C/GPIO | More context | +| LCD2004 | 20 chars × 4 lines | I2C | Industrial | + +**Example output on 16x2:** +``` +┌────────────────┐ +│> How can I help│ +│< Processing... │ +└────────────────┘ +``` + +### OLED Displays + +For graphical monochrome interfaces: + +| Display | Resolution | Interface | Size | +|---------|------------|-----------|------| +| SSD1306 | 128×64 | I2C/SPI | 0.96" | +| SSD1309 | 128×64 | I2C/SPI | 2.42" | +| SH1106 | 128×64 | I2C/SPI | 1.3" | +| SSD1322 | 256×64 | SPI | 3.12" | + +### TFT/IPS Color Displays + +For full graphical interface: + +| Display | Resolution | Interface | Notes | +|---------|------------|-----------|-------| +| ILI9341 | 320×240 | SPI | Common, cheap | +| ST7789 | 240×320 | SPI | Fast refresh | +| ILI9488 | 480×320 | SPI | Larger | +| Waveshare 5" | 800×480 | HDMI | Touch optional | +| Waveshare 7" | 1024×600 | HDMI | Touch, IPS | +| Official Pi 7" | 800×480 | DSI | Best for Pi | + +### E-Ink/E-Paper + +For low-power, readable in sunlight: + +| Display | Resolution | Colors | Refresh | +|---------|------------|--------|---------| +| Waveshare 2.13" | 250×122 | B/W | 2s | +| Waveshare 4.2" | 400×300 | B/W | 4s | +| Waveshare 7.5" | 800×480 | B/W | 5s | +| Good Display 9.7" | 1200×825 | B/W | 6s | + +**Best for:** Menu displays, signs, low-update applications + +### Industrial Displays + +| Display | Resolution | Features | +|---------|------------|----------| +| Advantech | Various | Wide temp, sunlight | +| Winstar | Various | Industrial grade | +| Newhaven | Various | Long availability | + +## Input Devices + +### Keyboards + +- **USB Keyboard** - Standard, any USB keyboard works +- **PS/2 Keyboard** - Via adapter, lower latency +- **Matrix Keypad** - 4x4 or 3x4, GPIO connected +- **I2C Keypad** - Fewer GPIO pins needed + +### Touch Input + +- **Capacitive Touch** - Better response, needs driver +- **Resistive Touch** - Works with gloves, pressure-based +- **IR Touch Frame** - Large displays, vandal-resistant + +### Buttons & GPIO + +``` +┌─────────────────────────────────────────────┐ +│ Simple 4-Button Interface │ +├─────────────────────────────────────────────┤ +│ │ +│ [◄ PREV] [▲ UP] [▼ DOWN] [► SELECT] │ +│ │ +│ GPIO 17 GPIO 27 GPIO 22 GPIO 23 │ +│ │ +└─────────────────────────────────────────────┘ 
+``` + +## Enclosures + +### Commercial Options + +- **Hammond Manufacturing** - Industrial metal enclosures +- **Polycase** - Plastic, IP65 rated +- **Bud Industries** - Various sizes +- **Pi-specific cases** - Argon, Flirc, etc. + +### DIY Options + +- **3D Printed** - Custom fit, PLA/PETG +- **Laser Cut** - Acrylic, wood +- **Metal Fabrication** - Professional look + +## Power + +### Power Requirements + +| Configuration | Power | Recommended PSU | +|---------------|-------|-----------------| +| Pi Zero + LCD | 1-2W | 5V 1A | +| Pi 4 + Display | 5-10W | 5V 3A | +| Orange Pi 5 | 8-15W | 5V 4A or 12V 2A | +| With NVMe SSD | +2-3W | Add 1A headroom | + +### Power Options + +- **USB-C PD** - Modern, efficient +- **PoE HAT** - Power over Ethernet +- **12V Barrel** - Industrial standard +- **Battery** - UPS, solar applications + +### UPS Solutions + +- **PiJuice** - Pi-specific UPS HAT +- **UPS PIco** - Small form factor +- **Powerboost** - Adafruit, lithium battery diff --git a/src/13-devices/local-llm.md b/src/13-devices/local-llm.md new file mode 100644 index 00000000..3feb4199 --- /dev/null +++ b/src/13-devices/local-llm.md @@ -0,0 +1,382 @@ +# Local LLM - Offline AI with llama.cpp + +Run AI inference completely offline on embedded devices. No internet, no API costs, full privacy. + +## Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ Local LLM Architecture │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ User Input ──▶ botserver ──▶ llama.cpp ──▶ Response │ +│ │ │ │ +│ │ ┌────┴────┐ │ +│ │ │ Model │ │ +│ │ │ GGUF │ │ +│ │ │ (Q4_K) │ │ +│ │ └─────────┘ │ +│ │ │ +│ SQLite DB │ +│ (sessions) │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Recommended Models + +### By Device RAM + +| RAM | Model | Size | Speed | Quality | +|-----|-------|------|-------|---------| +| **2GB** | TinyLlama 1.1B Q4_K_M | 670MB | ~5 tok/s | Basic | +| **4GB** | Phi-2 2.7B Q4_K_M | 1.6GB | ~3-4 tok/s | Good | +| **4GB** | Gemma 2B Q4_K_M | 1.4GB | ~4 tok/s | Good | +| **8GB** | Llama 3.2 3B Q4_K_M | 2GB | ~3 tok/s | Better | +| **8GB** | Mistral 7B Q4_K_M | 4.1GB | ~2 tok/s | Great | +| **16GB** | Llama 3.1 8B Q4_K_M | 4.7GB | ~2 tok/s | Excellent | + +### By Use Case + +**Simple Q&A, Commands:** +``` +TinyLlama 1.1B - Fast, basic understanding +``` + +**Customer Service, FAQ:** +``` +Phi-2 or Gemma 2B - Good comprehension, reasonable speed +``` + +**Complex Reasoning:** +``` +Llama 3.2 3B or Mistral 7B - Better accuracy, slower +``` + +## Installation + +### Automatic (via deploy script) + +```bash +./scripts/deploy-embedded.sh pi@device --with-llama +``` + +### Manual Installation + +```bash +# SSH to device +ssh pi@raspberrypi.local + +# Install dependencies +sudo apt update +sudo apt install -y build-essential cmake git wget + +# Clone llama.cpp +cd /opt +sudo git clone https://github.com/ggerganov/llama.cpp +sudo chown -R $(whoami):$(whoami) llama.cpp +cd llama.cpp + +# Build for ARM (auto-optimizes) +mkdir build && cd build +cmake .. 
-DLLAMA_NATIVE=ON -DCMAKE_BUILD_TYPE=Release
+make -j$(nproc)
+
+# Download model
+mkdir -p /opt/llama.cpp/models
+cd /opt/llama.cpp/models
+wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
+```
+
+### Start Server
+
+```bash
+# Test run
+/opt/llama.cpp/build/bin/llama-server \
+  -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+  --host 0.0.0.0 \
+  --port 8080 \
+  -c 2048 \
+  --threads 4
+
+# Verify
+curl http://localhost:8080/v1/models
+```
+
+### Systemd Service
+
+Create `/etc/systemd/system/llama-server.service`:
+
+```ini
+[Unit]
+Description=llama.cpp Server - Local LLM
+After=network.target
+
+[Service]
+Type=simple
+User=root
+WorkingDirectory=/opt/llama.cpp
+ExecStart=/opt/llama.cpp/build/bin/llama-server \
+  -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+  --host 0.0.0.0 \
+  --port 8080 \
+  -c 2048 \
+  -ngl 0 \
+  --threads 4
+Restart=always
+RestartSec=5
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Enable and start:
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable llama-server
+sudo systemctl start llama-server
+```
+
+## Configuration
+
+### botserver .env
+
+```env
+# Use local llama.cpp
+LLM_PROVIDER=llamacpp
+LLM_API_URL=http://127.0.0.1:8080
+LLM_MODEL=tinyllama
+
+# Memory limits
+MAX_CONTEXT_TOKENS=2048
+MAX_RESPONSE_TOKENS=512
+STREAMING_ENABLED=true
+```
+
+### llama.cpp Parameters
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `-c` | 2048 | Context size (tokens) |
+| `--threads` | 4 | CPU threads |
+| `-ngl` | 0 | GPU layers (0 for CPU only) |
+| `--host` | 127.0.0.1 | Bind address |
+| `--port` | 8080 | Server port |
+| `-b` | 512 | Batch size |
+| `--mlock` | off | Lock model in RAM |
+
+### Memory vs Context Size
+
+```
+Context 512:   ~400MB RAM, fast, limited conversation
+Context 1024:  ~600MB RAM, moderate
+Context 2048:  ~900MB RAM, good for most uses
+Context 4096:  ~1.5GB RAM, long conversations
+```
+
+## Performance Optimization
+
+### CPU Optimization
+
+```bash
+# Check CPU features
+cat /proc/cpuinfo | grep -E "(model name|Features)"
+
+# Build with specific optimizations
+cmake .. -DLLAMA_NATIVE=ON \
+  -DCMAKE_BUILD_TYPE=Release \
+  -DLLAMA_ARM_FMA=ON \
+  -DLLAMA_ARM_DOTPROD=ON
+```
+
+### Memory Optimization
+
+```bash
+# For 2GB RAM devices
+# Use a smaller context
+-c 1024
+
+# Keep memory mapping enabled (the default): the model is paged in from
+# storage on demand instead of being fully loaded, so resident RAM stays
+# lower. Do not pass --no-mmap on low-RAM boards.
+
+# Disable mlock (don't pin to RAM)
+# (default is disabled)
+```
+
+### Swap Configuration
+
+For devices with limited RAM:
+
+```bash
+# Create 2GB swap
+sudo fallocate -l 2G /swapfile
+sudo chmod 600 /swapfile
+sudo mkswap /swapfile
+sudo swapon /swapfile
+
+# Make permanent
+echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
+
+# Optimize swap usage
+echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
+```
+
+## NPU Acceleration (Orange Pi 5)
+
+The Orange Pi 5 has a 6 TOPS NPU that can accelerate inference:
+
+### Using rkllm (Rockchip NPU)
+
+```bash
+# Install rkllm runtime
+git clone https://github.com/airockchip/rknn-llm
+cd rknn-llm
+./install.sh
+
+# Convert model to RKNN format
+python3 convert_model.py \
+  --model tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+  --output tinyllama.rkllm
+
+# Run with NPU
+rkllm-server \
+  --model tinyllama.rkllm \
+  --port 8080
+```
+
+Expected speedup: **3-5x faster** than CPU only.
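+
+To sanity-check that speedup on your own board, time an identical request against the CPU build and the NPU build. A minimal sketch, assuming whichever server you started (llama-server or rkllm-server) listens on port 8080 and speaks the OpenAI-compatible chat API shown later on this page - adjust the URL if the rkllm build exposes a different endpoint:
+
+```bash
+# Rough throughput check: run once per backend and compare wall-clock time.
+# max_tokens divided by elapsed seconds gives an approximate tokens/sec figure.
+PROMPT='{"model":"tinyllama","messages":[{"role":"user","content":"Count from 1 to 50."}],"max_tokens":128}'
+
+time curl -s http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d "$PROMPT" > /dev/null
+```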
+
+## Model Download URLs
+
+### TinyLlama 1.1B (Recommended for 2GB)
+```bash
+wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
+```
+
+### Phi-2 2.7B (Recommended for 4GB)
+```bash
+wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf
+```
+
+### Gemma 2B
+```bash
+wget https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q4_K_M.gguf
+```
+
+### Llama 3.2 3B (Recommended for 8GB)
+```bash
+wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
+```
+
+### Mistral 7B
+```bash
+wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
+```
+
+## API Usage
+
+llama.cpp exposes an OpenAI-compatible API:
+
+### Chat Completion
+
+```bash
+curl http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tinyllama",
+    "messages": [
+      {"role": "user", "content": "What is 2+2?"}
+    ],
+    "max_tokens": 100
+  }'
+```
+
+### Streaming
+
+```bash
+curl http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "tinyllama",
+    "messages": [{"role": "user", "content": "Tell me a story"}],
+    "stream": true
+  }'
+```
+
+### Health Check
+
+```bash
+curl http://localhost:8080/health
+curl http://localhost:8080/v1/models
+```
+
+## Monitoring
+
+### Check Performance
+
+```bash
+# Watch resource usage
+htop
+
+# Check inference speed in logs
+sudo journalctl -u llama-server -f | grep "tokens/s"
+
+# Memory usage
+free -h
+```
+
+### Benchmarking
+
+```bash
+# Run llama.cpp benchmark
+/opt/llama.cpp/build/bin/llama-bench \
+  -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
+  -p 512 -n 128 -t 4
+```
+
+## Troubleshooting
+
+### Model Loading Fails
+
+```bash
+# Check available RAM
+free -h
+
+# Try a smaller context
+-c 512
+
+# Make sure memory mapping is on (the default); remove any --no-mmap flag
+```
+
+### Slow Inference
+
+```bash
+# Increase threads (up to CPU cores)
+--threads $(nproc)
+
+# Use optimized build
+cmake .. -DLLAMA_NATIVE=ON
+
+# Consider a smaller model
+```
+
+### Out of Memory Killer
+
+```bash
+# Check if OOM killed the process
+dmesg | grep -i "killed process"
+
+# Increase swap
+# Use a smaller model
+# Reduce context size
+```
+
+## Best Practices
+
+1. **Start small** - Begin with TinyLlama, upgrade if needed
+2. **Monitor memory** - Use `htop` during initial tests
+3. **Set appropriate context** - 1024-2048 tokens for most embedded use
+4. **Use quantized models** - Q4_K_M is a good balance
+5. **Enable streaming** - Better UX on slow inference
+6. **Test offline** - Verify it works without internet before deployment
diff --git a/src/13-devices/mobile.md b/src/13-devices/mobile.md
new file mode 100644
index 00000000..6cc3c1ca
--- /dev/null
+++ b/src/13-devices/mobile.md
@@ -0,0 +1,323 @@
+# Mobile Deployment - Android & HarmonyOS
+
+Deploy General Bots as the primary interface on Android and HarmonyOS devices, transforming them into dedicated AI assistants.
+
+## Overview
+
+BotOS transforms any Android or HarmonyOS device into a dedicated General Bots system, removing manufacturer bloatware and installing GB as the default launcher.
+ +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ BotOS Architecture │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ BotOS App (Tauri) │ │ +│ ├──────────────────────────────────────────────────────────────────┤ │ +│ │ botui/ui/suite │ Tauri Android │ src/lib.rs (Rust) │ │ +│ │ (Web Interface) │ (WebView + NDK) │ (Backend + Hardware) │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ┌─────────────────────────┴────────────────────────────┐ │ +│ │ Android/HarmonyOS System │ │ +│ │ ┌─────────┐ ┌──────────┐ ┌────────┐ ┌─────────┐ │ │ +│ │ │ Camera │ │ GPS │ │ WiFi │ │ Storage │ │ │ +│ │ └─────────┘ └──────────┘ └────────┘ └─────────┘ │ │ +│ └───────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Supported Platforms + +### Android +- **AOSP** - Pure Android +- **Samsung One UI** - Galaxy devices +- **Xiaomi MIUI** - Mi, Redmi, Poco +- **OPPO ColorOS** - OPPO, OnePlus, Realme +- **Vivo Funtouch/OriginOS** +- **Google Pixel** + +### HarmonyOS +- **Huawei** - P series, Mate series, Nova +- **Honor** - Magic series, X series + +## Installation Levels + +| Level | Requirements | What It Does | +|-------|-------------|--------------| +| **Level 1** | ADB only | Removes bloatware, installs BotOS as app | +| **Level 2** | Root + Magisk | GB boot animation, BotOS as system app | +| **Level 3** | Unlocked bootloader | Full Android replacement with BotOS | + +## Quick Installation + +### Level 1: Debloat + App (No Root) + +```bash +# Clone botos repository +git clone https://github.com/GeneralBots/botos.git +cd botos/rom + +# Connect device via USB (enable USB debugging first) +./install.sh +``` + +The interactive installer will: +1. Detect your device and manufacturer +2. Remove bloatware automatically +3. Install BotOS APK +4. Optionally set as default launcher + +### Level 2: Magisk Module (Root Required) + +```bash +# Generate Magisk module +cd botos/rom/scripts +./build-magisk-module.sh + +# Copy to device +adb push botos-magisk-v1.0.zip /sdcard/ + +# Install via Magisk app +# Magisk → Modules → + → Select ZIP → Reboot +``` + +This adds: +- Custom boot animation +- BotOS as system app (privileged permissions) +- Debloat via overlay + +### Level 3: GSI (Full Replacement) + +For advanced users with unlocked bootloader. See `botos/rom/gsi/README.md`. 
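+
+Whichever level you choose, verify the result over ADB before handing the device out. A minimal sketch - the `com.generalbots.botos` package id and `.MainActivity` name are assumptions here, so confirm the real identifiers from the package list first:
+
+```bash
+# Confirm the BotOS package is installed (note the actual package id in the output)
+adb shell pm list packages | grep -i bot
+
+# Set it as the default HOME app so it comes back after every reboot (Android 7+)
+adb shell cmd package set-home-activity com.generalbots.botos/.MainActivity
+```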
+
+## Bloatware Removed
+
+### Samsung One UI
+- Bixby, Samsung Pay, Samsung Pass
+- Duplicate apps (Email, Calendar, Browser)
+- AR Zone, Game Launcher
+- Samsung Free, Samsung Global Goals
+
+### Huawei EMUI/HarmonyOS
+- AppGallery, HiCloud, HiCar
+- Huawei Browser, Music, Video
+- Petal Maps, Petal Search
+- AI Life, HiSuite
+
+### Honor MagicOS
+- Honor Store, MagicRing
+- Honor Browser, Music
+
+### Xiaomi MIUI
+- MSA (analytics), Mi Apps
+- GetApps, Mi Cloud
+- Mi Browser, Mi Music
+
+### Universal (All Devices)
+- Pre-installed Facebook, Instagram
+- Pre-installed Netflix, Spotify
+- Games like Candy Crush
+- Carrier bloatware
+
+## Building from Source
+
+### Prerequisites
+
+```bash
+# Install Rust and Android targets
+rustup target add aarch64-linux-android armv7-linux-androideabi
+
+# Set up Android SDK/NDK
+export ANDROID_HOME=$HOME/Android/Sdk
+export NDK_HOME=$ANDROID_HOME/ndk/25.2.9519653
+
+# Install Tauri CLI
+cargo install tauri-cli
+
+# For icons/boot animation
+sudo apt install librsvg2-bin imagemagick
+```
+
+### Build APK
+
+```bash
+cd botos
+
+# Generate icons from SVG
+./scripts/generate-icons.sh
+
+# Initialize Android project
+cargo tauri android init
+
+# Build release APK
+cargo tauri android build --release
+```
+
+Output: `gen/android/app/build/outputs/apk/release/app-release.apk`
+
+### Development Mode
+
+```bash
+# Connect device and run
+cargo tauri android dev
+
+# Watch logs
+adb logcat -s BotOS:*
+```
+
+## Configuration
+
+### AndroidManifest.xml
+
+BotOS registers as the device launcher (HOME activity):
+
+```xml
+<activity android:name=".MainActivity" android:exported="true">
+    <!-- HOME + DEFAULT categories make this activity the launcher -->
+    <intent-filter>
+        <action android:name="android.intent.action.MAIN" />
+        <category android:name="android.intent.category.HOME" />
+        <category android:name="android.intent.category.DEFAULT" />
+    </intent-filter>
+</activity>
+```
+
+### Permissions
+
+Default capabilities in `capabilities/default.json`:
+- Internet access
+- Camera (for QR codes, photos)
+- Location (GPS)
+- Storage (files)
+- Notifications
+
+### Connecting to Server
+
+Point the app at the bundled front-end in `tauri.conf.json`:
+
+```json
+{
+  "build": {
+    "frontendDist": "../botui/ui/suite"
+  }
+}
+```
+
+Or configure the botserver URL at runtime:
+```javascript
+window.BOTSERVER_URL = "https://your-server.com";
+```
+
+## Boot Animation
+
+Create a custom boot animation with GB branding:
+
+```bash
+# Generate animation
+cd botos/scripts
+./create-bootanimation.sh
+
+# Install (requires root)
+adb root
+adb remount
+adb push bootanimation.zip /system/media/
+adb reboot
+```
+
+## Project Structure
+
+```
+botos/
+├── Cargo.toml              # Rust/Tauri dependencies
+├── tauri.conf.json         # Tauri config → botui/ui/suite
+├── build.rs                # Build script
+├── src/lib.rs              # Android entry point
+│
+├── icons/
+│   ├── gb-bot.svg          # Source icon
+│   ├── icon.png            # Main icon (512x512)
+│   └── */ic_launcher.png   # Icons by density
+│
+├── scripts/
+│   ├── generate-icons.sh   # Generate PNGs from SVG
+│   └── create-bootanimation.sh
+│
+├── capabilities/
+│   └── default.json        # Tauri permissions
+│
+├── gen/android/            # Generated Android project
+│   └── app/src/main/
+│       ├── AndroidManifest.xml
+│       └── res/values/themes.xml
+│
+└── rom/                    # Installation tools
+    ├── install.sh          # Interactive installer
+    ├── scripts/
+    │   ├── debloat.sh      # Remove bloatware
+    │   └── build-magisk-module.sh
+    └── gsi/
+        └── README.md       # GSI instructions
+```
+
+## Offline Mode
+
+BotOS can work fully offline with a local LLM:
+
+1. Install botserver on the device (see [Local LLM](./local-llm.md))
+2. Configure it to use localhost:
+   ```javascript
+   window.BOTSERVER_URL = "http://127.0.0.1:8088";
+   ```
+3. Run llama.cpp with a small model (TinyLlama on 4GB+ devices)
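+
+Before shipping, confirm the offline stack is actually answering locally. A quick check using the ports configured above (run it on the device, ideally with networking disabled):
+
+```bash
+# botserver health endpoint
+curl -s http://127.0.0.1:8088/health
+
+# llama.cpp model listing
+curl -s http://127.0.0.1:8080/v1/models
+```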
+
+## Use Cases
+
+### Dedicated Kiosk
+- Retail product information
+- Hotel check-in
+- Restaurant ordering
+- Museum guides
+
+### Enterprise Device
+- Field service assistant
+- Warehouse scanner with AI
+- Delivery driver companion
+- Healthcare bedside terminal
+
+### Consumer Device
+- Elder-friendly phone
+- Child-safe device
+- Single-purpose assistant
+- Smart home controller
+
+## Troubleshooting
+
+### App Won't Install
+```bash
+# Enable installation from unknown sources
+# Settings → Security → Unknown Sources
+
+# Or use ADB
+adb install -r botos.apk
+```
+
+### Debloat Not Working
+```bash
+# Some packages require root
+# Use Level 2 (Magisk) for complete removal
+
+# Check which packages failed (replace <package-name> with the app in question)
+adb shell pm list packages | grep <package-name>
+```
+
+### Boot Loop After GSI
+```bash
+# Boot into recovery
+# Wipe data/factory reset
+# Reflash stock ROM
+```
+
+### WebView Crashes
+```bash
+# Update Android System WebView
+adb shell pm enable com.google.android.webview
+```
diff --git a/src/13-devices/quick-start.md b/src/13-devices/quick-start.md
new file mode 100644
index 00000000..b7a15fb2
--- /dev/null
+++ b/src/13-devices/quick-start.md
@@ -0,0 +1,209 @@
+# Quick Start - Deploy in 5 Minutes
+
+Get General Bots running on your embedded device with local AI in just a few commands.
+
+## Prerequisites
+
+- An SBC (Raspberry Pi, Orange Pi, etc.) running Armbian or Raspberry Pi OS
+- SSH access to the device
+- Internet connection (for initial setup only)
+
+## One-Line Deploy
+
+From your development machine:
+
+```bash
+# Clone and run the deployment script
+git clone https://github.com/GeneralBots/botserver.git
+cd botserver
+
+# Deploy to Orange Pi (replace with your device IP)
+./scripts/deploy-embedded.sh orangepi@192.168.1.100 --with-ui --with-llama
+```
+
+That's it!
After ~10-15 minutes: +- BotServer runs on port 8088 +- llama.cpp runs on port 8080 with TinyLlama +- Embedded UI available at `http://your-device:8088/embedded/` + +## Step-by-Step Guide + +### Step 1: Prepare Your Device + +Flash your SBC with a compatible OS: + +**Raspberry Pi:** +```bash +# Download Raspberry Pi Imager +# Select: Raspberry Pi OS Lite (64-bit) +# Enable SSH in settings +``` + +**Orange Pi:** +```bash +# Download Armbian from armbian.com +# Flash with balenaEtcher +``` + +### Step 2: First Boot Configuration + +```bash +# SSH into your device +ssh pi@raspberrypi.local # or orangepi@orangepi.local + +# Update system +sudo apt update && sudo apt upgrade -y + +# Set timezone +sudo timedatectl set-timezone America/Sao_Paulo + +# Enable I2C/SPI if using GPIO displays +sudo raspi-config # or armbian-config +``` + +### Step 3: Run Deployment Script + +From your development PC: + +```bash +# Basic deployment (botserver only) +./scripts/deploy-embedded.sh pi@raspberrypi.local + +# With embedded UI +./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui + +# With local LLM (requires 4GB+ RAM) +./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui --with-llama + +# Specify a different model +./scripts/deploy-embedded.sh pi@raspberrypi.local --with-llama --model phi-2-Q4_K_M.gguf +``` + +### Step 4: Verify Installation + +```bash +# Check services +ssh pi@raspberrypi.local 'sudo systemctl status botserver' +ssh pi@raspberrypi.local 'sudo systemctl status llama-server' + +# Test botserver +curl http://raspberrypi.local:8088/health + +# Test llama.cpp +curl http://raspberrypi.local:8080/v1/models +``` + +### Step 5: Access the Interface + +Open in your browser: +``` +http://raspberrypi.local:8088/embedded/ +``` + +Or set up kiosk mode (auto-starts on boot): +```bash +# Already configured if you used --with-ui +# Just reboot: +ssh pi@raspberrypi.local 'sudo reboot' +``` + +## Local Installation (On the Device) + +If you prefer to install directly on the device: + +```bash +# SSH into the device +ssh pi@raspberrypi.local + +# Install Rust +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +source ~/.cargo/env + +# Clone and build +git clone https://github.com/GeneralBots/botserver.git +cd botserver + +# Run local deployment +./scripts/deploy-embedded.sh --local --with-ui --with-llama +``` + +⚠️ **Note:** Building on ARM devices is slow (1-2 hours). Cross-compilation is faster. 
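+
+If you take the cross-compilation route instead, the standard Rust toolchain setup looks like this. A hedged sketch from an x86_64 Linux host - the `botserver` binary name and install path follow this guide's layout, but check the repository's `Cargo.toml` for the actual binary name:
+
+```bash
+# On the development PC: add the ARM64 target and a cross-linker
+rustup target add aarch64-unknown-linux-gnu
+sudo apt install -y gcc-aarch64-linux-gnu
+
+# Build for ARM64 (minutes on a PC instead of hours on the device)
+CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=aarch64-linux-gnu-gcc \
+  cargo build --release --target aarch64-unknown-linux-gnu
+
+# Copy the result to the device
+scp target/aarch64-unknown-linux-gnu/release/botserver pi@raspberrypi.local:/opt/botserver/
+```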
+
+## Configuration
+
+After deployment, edit the config file:
+
+```bash
+ssh pi@raspberrypi.local
+sudo nano /opt/botserver/.env
+```
+
+Key settings:
+```env
+# Server
+HOST=0.0.0.0
+PORT=8088
+
+# Local LLM
+LLM_PROVIDER=llamacpp
+LLM_API_URL=http://127.0.0.1:8080
+LLM_MODEL=tinyllama
+
+# Memory limits for small devices
+MAX_CONTEXT_TOKENS=2048
+MAX_RESPONSE_TOKENS=512
+```
+
+Restart after changes:
+```bash
+sudo systemctl restart botserver
+```
+
+## Troubleshooting
+
+### Out of Memory
+
+```bash
+# Check memory usage
+free -h
+
+# Reduce llama.cpp context
+sudo nano /etc/systemd/system/llama-server.service
+# Change -c 2048 to -c 1024
+
+# Or use a smaller model
+# TinyLlama uses ~700MB, Phi-2 uses ~1.6GB
+```
+
+### Service Won't Start
+
+```bash
+# Check logs
+sudo journalctl -u botserver -f
+sudo journalctl -u llama-server -f
+
+# Common issues:
+# - Port already in use
+# - Missing model file
+# - Database permissions
+```
+
+### Display Not Working
+
+```bash
+# Check if display is detected
+ls /dev/fb*      # HDMI/DSI
+ls /dev/i2c*     # I2C displays
+ls /dev/spidev*  # SPI displays
+
+# For HDMI, check config
+sudo nano /boot/config.txt       # Raspberry Pi
+sudo nano /boot/armbianEnv.txt   # Orange Pi
+```
+
+## Next Steps
+
+- [Supported Hardware](./hardware.md) - Displays, input devices, and power
+- [Local LLM Configuration](./local-llm.md) - Optimize AI performance
+- [Mobile Deployment](./mobile.md) - Android & HarmonyOS devices
diff --git a/src/20-embedding/README.md b/src/20-embedding/README.md
index 69fce7df..21ffe7e0 100644
--- a/src/20-embedding/README.md
+++ b/src/20-embedding/README.md
@@ -1,47 +1 @@
-# Chapter 20: Embedded & Offline Deployment
-
-Deploy General Bots to any device - from Raspberry Pi to industrial kiosks - with local LLM inference for fully offline AI capabilities.
- -## Overview - -General Bots can run on minimal hardware with displays as small as 16x2 character LCDs, enabling AI-powered interactions anywhere: - -- **Kiosks** - Self-service terminals in stores, airports, hospitals -- **Industrial IoT** - Factory floor assistants, machine interfaces -- **Smart Home** - Wall panels, kitchen displays, door intercoms -- **Retail** - Point-of-sale systems, product information terminals -- **Education** - Classroom assistants, lab equipment interfaces -- **Healthcare** - Patient check-in, medication reminders - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Embedded GB Architecture │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ Display │ │ botserver │ │ llama.cpp │ │ -│ │ LCD/OLED │────▶│ (Rust) │────▶│ (Local) │ │ -│ │ TFT/HDMI │ │ Port 8088 │ │ Port 8080 │ │ -│ └──────────────┘ └──────────────┘ └──────────────┘ │ -│ │ │ │ │ -│ │ │ │ │ -│ ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐ │ -│ │ Keyboard │ │ SQLite │ │ TinyLlama │ │ -│ │ Buttons │ │ (Data) │ │ GGUF │ │ -│ │ Touch │ │ │ │ (~700MB) │ │ -│ └─────────────┘ └─────────────┘ └─────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -## What's in This Chapter - -- [Supported Hardware](./hardware.md) - Boards, displays, and peripherals -- [Quick Start](./quick-start.md) - Deploy in 5 minutes -- [Embedded UI](./embedded-ui.md) - Interface for small displays -- [Local LLM](./local-llm.md) - Offline AI with llama.cpp -- [Display Modes](./display-modes.md) - LCD, OLED, TFT, E-ink configurations -- [Kiosk Mode](./kiosk-mode.md) - Locked-down production deployments -- [Performance Tuning](./performance.md) - Optimize for limited resources -- [Offline Operation](./offline.md) - No internet required -- [Use Cases](./use-cases.md) - Real-world deployment examples +# Chapter 20: Embedded Deployment diff --git a/src/20-embedding/hardware.md b/src/20-embedding/hardware.md index f8d6d416..1bbe111c 100644 --- a/src/20-embedding/hardware.md +++ b/src/20-embedding/hardware.md @@ -1,190 +1 @@ # Supported Hardware - -## Single Board Computers (SBCs) - -### Recommended Boards - -| Board | CPU | RAM | Best For | Price | -|-------|-----|-----|----------|-------| -| **Orange Pi 5** | RK3588S | 4-16GB | Full LLM, NPU accel | $89-149 | -| **Raspberry Pi 5** | BCM2712 | 4-8GB | General purpose | $60-80 | -| **Orange Pi Zero 3** | H618 | 1-4GB | Minimal deployments | $20-35 | -| **Raspberry Pi 4** | BCM2711 | 2-8GB | Established ecosystem | $45-75 | -| **Raspberry Pi Zero 2W** | RP3A0 | 512MB | Ultra-compact | $15 | -| **Rock Pi 4** | RK3399 | 4GB | NPU available | $75 | -| **NVIDIA Jetson Nano** | Tegra X1 | 4GB | GPU inference | $149 | -| **BeagleBone Black** | AM3358 | 512MB | Industrial | $55 | -| **LattePanda 3 Delta** | N100 | 8GB | x86 compatibility | $269 | -| **ODROID-N2+** | S922X | 4GB | High performance | $79 | - -### Minimum Requirements - -**For UI only (connect to remote botserver):** -- Any ARM/x86 Linux board -- 256MB RAM -- Network connection -- Display output - -**For local botserver:** -- ARM64 or x86_64 -- 1GB RAM minimum -- 4GB storage - -**For local LLM (llama.cpp):** -- ARM64 or x86_64 -- 2GB+ RAM (4GB recommended) -- 2GB+ storage for model - -### Orange Pi 5 (Recommended for LLM) - -The Orange Pi 5 with RK3588S is ideal for embedded LLM: - -``` 
-┌─────────────────────────────────────────────────────────────┐ -│ Orange Pi 5 - Best for Offline AI │ -├─────────────────────────────────────────────────────────────┤ -│ CPU: Rockchip RK3588S (4x A76 + 4x A55) │ -│ NPU: 6 TOPS (Neural Processing Unit) │ -│ GPU: Mali-G610 MP4 │ -│ RAM: 4GB / 8GB / 16GB LPDDR4X │ -│ Storage: M.2 NVMe + eMMC + microSD │ -│ │ -│ LLM Performance: │ -│ ├─ TinyLlama 1.1B Q4: ~8-12 tokens/sec │ -│ ├─ Phi-2 2.7B Q4: ~4-6 tokens/sec │ -│ └─ With NPU (rkllm): ~20-30 tokens/sec │ -└─────────────────────────────────────────────────────────────┘ -``` - -## Displays - -### Character LCDs (Minimal) - -For text-only interfaces: - -| Display | Resolution | Interface | Use Case | -|---------|------------|-----------|----------| -| HD44780 16x2 | 16 chars × 2 lines | I2C/GPIO | Status, simple Q&A | -| HD44780 20x4 | 20 chars × 4 lines | I2C/GPIO | More context | -| LCD2004 | 20 chars × 4 lines | I2C | Industrial | - -**Example output on 16x2:** -``` -┌────────────────┐ -│> How can I help│ -│< Processing... │ -└────────────────┘ -``` - -### OLED Displays - -For graphical monochrome interfaces: - -| Display | Resolution | Interface | Size | -|---------|------------|-----------|------| -| SSD1306 | 128×64 | I2C/SPI | 0.96" | -| SSD1309 | 128×64 | I2C/SPI | 2.42" | -| SH1106 | 128×64 | I2C/SPI | 1.3" | -| SSD1322 | 256×64 | SPI | 3.12" | - -### TFT/IPS Color Displays - -For full graphical interface: - -| Display | Resolution | Interface | Notes | -|---------|------------|-----------|-------| -| ILI9341 | 320×240 | SPI | Common, cheap | -| ST7789 | 240×320 | SPI | Fast refresh | -| ILI9488 | 480×320 | SPI | Larger | -| Waveshare 5" | 800×480 | HDMI | Touch optional | -| Waveshare 7" | 1024×600 | HDMI | Touch, IPS | -| Official Pi 7" | 800×480 | DSI | Best for Pi | - -### E-Ink/E-Paper - -For low-power, readable in sunlight: - -| Display | Resolution | Colors | Refresh | -|---------|------------|--------|---------| -| Waveshare 2.13" | 250×122 | B/W | 2s | -| Waveshare 4.2" | 400×300 | B/W | 4s | -| Waveshare 7.5" | 800×480 | B/W | 5s | -| Good Display 9.7" | 1200×825 | B/W | 6s | - -**Best for:** Menu displays, signs, low-update applications - -### Industrial Displays - -| Display | Resolution | Features | -|---------|------------|----------| -| Advantech | Various | Wide temp, sunlight | -| Winstar | Various | Industrial grade | -| Newhaven | Various | Long availability | - -## Input Devices - -### Keyboards - -- **USB Keyboard** - Standard, any USB keyboard works -- **PS/2 Keyboard** - Via adapter, lower latency -- **Matrix Keypad** - 4x4 or 3x4, GPIO connected -- **I2C Keypad** - Fewer GPIO pins needed - -### Touch Input - -- **Capacitive Touch** - Better response, needs driver -- **Resistive Touch** - Works with gloves, pressure-based -- **IR Touch Frame** - Large displays, vandal-resistant - -### Buttons & GPIO - -``` -┌─────────────────────────────────────────────┐ -│ Simple 4-Button Interface │ -├─────────────────────────────────────────────┤ -│ │ -│ [◄ PREV] [▲ UP] [▼ DOWN] [► SELECT] │ -│ │ -│ GPIO 17 GPIO 27 GPIO 22 GPIO 23 │ -│ │ -└─────────────────────────────────────────────┘ -``` - -## Enclosures - -### Commercial Options - -- **Hammond Manufacturing** - Industrial metal enclosures -- **Polycase** - Plastic, IP65 rated -- **Bud Industries** - Various sizes -- **Pi-specific cases** - Argon, Flirc, etc. 
- -### DIY Options - -- **3D Printed** - Custom fit, PLA/PETG -- **Laser Cut** - Acrylic, wood -- **Metal Fabrication** - Professional look - -## Power - -### Power Requirements - -| Configuration | Power | Recommended PSU | -|---------------|-------|-----------------| -| Pi Zero + LCD | 1-2W | 5V 1A | -| Pi 4 + Display | 5-10W | 5V 3A | -| Orange Pi 5 | 8-15W | 5V 4A or 12V 2A | -| With NVMe SSD | +2-3W | Add 1A headroom | - -### Power Options - -- **USB-C PD** - Modern, efficient -- **PoE HAT** - Power over Ethernet -- **12V Barrel** - Industrial standard -- **Battery** - UPS, solar applications - -### UPS Solutions - -- **PiJuice** - Pi-specific UPS HAT -- **UPS PIco** - Small form factor -- **Powerboost** - Adafruit, lithium battery diff --git a/src/20-embedding/local-llm.md b/src/20-embedding/local-llm.md index 3feb4199..86b1bc90 100644 --- a/src/20-embedding/local-llm.md +++ b/src/20-embedding/local-llm.md @@ -1,382 +1 @@ -# Local LLM - Offline AI with llama.cpp - -Run AI inference completely offline on embedded devices. No internet, no API costs, full privacy. - -## Overview - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Local LLM Architecture │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ User Input ──▶ botserver ──▶ llama.cpp ──▶ Response │ -│ │ │ │ -│ │ ┌────┴────┐ │ -│ │ │ Model │ │ -│ │ │ GGUF │ │ -│ │ │ (Q4_K) │ │ -│ │ └─────────┘ │ -│ │ │ -│ SQLite DB │ -│ (sessions) │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -## Recommended Models - -### By Device RAM - -| RAM | Model | Size | Speed | Quality | -|-----|-------|------|-------|---------| -| **2GB** | TinyLlama 1.1B Q4_K_M | 670MB | ~5 tok/s | Basic | -| **4GB** | Phi-2 2.7B Q4_K_M | 1.6GB | ~3-4 tok/s | Good | -| **4GB** | Gemma 2B Q4_K_M | 1.4GB | ~4 tok/s | Good | -| **8GB** | Llama 3.2 3B Q4_K_M | 2GB | ~3 tok/s | Better | -| **8GB** | Mistral 7B Q4_K_M | 4.1GB | ~2 tok/s | Great | -| **16GB** | Llama 3.1 8B Q4_K_M | 4.7GB | ~2 tok/s | Excellent | - -### By Use Case - -**Simple Q&A, Commands:** -``` -TinyLlama 1.1B - Fast, basic understanding -``` - -**Customer Service, FAQ:** -``` -Phi-2 or Gemma 2B - Good comprehension, reasonable speed -``` - -**Complex Reasoning:** -``` -Llama 3.2 3B or Mistral 7B - Better accuracy, slower -``` - -## Installation - -### Automatic (via deploy script) - -```bash -./scripts/deploy-embedded.sh pi@device --with-llama -``` - -### Manual Installation - -```bash -# SSH to device -ssh pi@raspberrypi.local - -# Install dependencies -sudo apt update -sudo apt install -y build-essential cmake git wget - -# Clone llama.cpp -cd /opt -sudo git clone https://github.com/ggerganov/llama.cpp -sudo chown -R $(whoami):$(whoami) llama.cpp -cd llama.cpp - -# Build for ARM (auto-optimizes) -mkdir build && cd build -cmake .. 
-DLLAMA_NATIVE=ON -DCMAKE_BUILD_TYPE=Release -make -j$(nproc) - -# Download model -mkdir -p /opt/llama.cpp/models -cd /opt/llama.cpp/models -wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -``` - -### Start Server - -```bash -# Test run -/opt/llama.cpp/build/bin/llama-server \ - -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \ - --host 0.0.0.0 \ - --port 8080 \ - -c 2048 \ - --threads 4 - -# Verify -curl http://localhost:8080/v1/models -``` - -### Systemd Service - -Create `/etc/systemd/system/llama-server.service`: - -```ini -[Unit] -Description=llama.cpp Server - Local LLM -After=network.target - -[Service] -Type=simple -User=root -WorkingDirectory=/opt/llama.cpp -ExecStart=/opt/llama.cpp/build/bin/llama-server \ - -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \ - --host 0.0.0.0 \ - --port 8080 \ - -c 2048 \ - -ngl 0 \ - --threads 4 -Restart=always -RestartSec=5 - -[Install] -WantedBy=multi-user.target -``` - -Enable and start: -```bash -sudo systemctl daemon-reload -sudo systemctl enable llama-server -sudo systemctl start llama-server -``` - -## Configuration - -### botserver .env - -```env -# Use local llama.cpp -LLM_PROVIDER=llamacpp -LLM_API_URL=http://127.0.0.1:8080 -LLM_MODEL=tinyllama - -# Memory limits -MAX_CONTEXT_TOKENS=2048 -MAX_RESPONSE_TOKENS=512 -STREAMING_ENABLED=true -``` - -### llama.cpp Parameters - -| Parameter | Default | Description | -|-----------|---------|-------------| -| `-c` | 2048 | Context size (tokens) | -| `--threads` | 4 | CPU threads | -| `-ngl` | 0 | GPU layers (0 for CPU only) | -| `--host` | 127.0.0.1 | Bind address | -| `--port` | 8080 | Server port | -| `-b` | 512 | Batch size | -| `--mlock` | off | Lock model in RAM | - -### Memory vs Context Size - -``` -Context 512: ~400MB RAM, fast, limited conversation -Context 1024: ~600MB RAM, moderate -Context 2048: ~900MB RAM, good for most uses -Context 4096: ~1.5GB RAM, long conversations -``` - -## Performance Optimization - -### CPU Optimization - -```bash -# Check CPU features -cat /proc/cpuinfo | grep -E "(model name|Features)" - -# Build with specific optimizations -cmake .. -DLLAMA_NATIVE=ON \ - -DCMAKE_BUILD_TYPE=Release \ - -DLLAMA_ARM_FMA=ON \ - -DLLAMA_ARM_DOTPROD=ON -``` - -### Memory Optimization - -```bash -# For 2GB RAM devices -# Use smaller context --c 1024 - -# Use memory mapping (slower but less RAM) ---mmap - -# Disable mlock (don't pin to RAM) -# (default is disabled) -``` - -### Swap Configuration - -For devices with limited RAM: - -```bash -# Create 2GB swap -sudo fallocate -l 2G /swapfile -sudo chmod 600 /swapfile -sudo mkswap /swapfile -sudo swapon /swapfile - -# Make permanent -echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab - -# Optimize swap usage -echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf -``` - -## NPU Acceleration (Orange Pi 5) - -Orange Pi 5 has a 6 TOPS NPU that can accelerate inference: - -### Using rkllm (Rockchip NPU) - -```bash -# Install rkllm runtime -git clone https://github.com/airockchip/rknn-llm -cd rknn-llm -./install.sh - -# Convert model to RKNN format -python3 convert_model.py \ - --model tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \ - --output tinyllama.rkllm - -# Run with NPU -rkllm-server \ - --model tinyllama.rkllm \ - --port 8080 -``` - -Expected speedup: **3-5x faster** than CPU only. 
- -## Model Download URLs - -### TinyLlama 1.1B (Recommended for 2GB) -```bash -wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -``` - -### Phi-2 2.7B (Recommended for 4GB) -```bash -wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf -``` - -### Gemma 2B -```bash -wget https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q4_K_M.gguf -``` - -### Llama 3.2 3B (Recommended for 8GB) -```bash -wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf -``` - -### Mistral 7B -```bash -wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf -``` - -## API Usage - -llama.cpp exposes an OpenAI-compatible API: - -### Chat Completion - -```bash -curl http://localhost:8080/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "tinyllama", - "messages": [ - {"role": "user", "content": "What is 2+2?"} - ], - "max_tokens": 100 - }' -``` - -### Streaming - -```bash -curl http://localhost:8080/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "tinyllama", - "messages": [{"role": "user", "content": "Tell me a story"}], - "stream": true - }' -``` - -### Health Check - -```bash -curl http://localhost:8080/health -curl http://localhost:8080/v1/models -``` - -## Monitoring - -### Check Performance - -```bash -# Watch resource usage -htop - -# Check inference speed in logs -sudo journalctl -u llama-server -f | grep "tokens/s" - -# Memory usage -free -h -``` - -### Benchmarking - -```bash -# Run llama.cpp benchmark -/opt/llama.cpp/build/bin/llama-bench \ - -m /opt/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \ - -p 512 -n 128 -t 4 -``` - -## Troubleshooting - -### Model Loading Fails - -```bash -# Check available RAM -free -h - -# Try smaller context --c 512 - -# Use memory mapping ---mmap -``` - -### Slow Inference - -```bash -# Increase threads (up to CPU cores) ---threads $(nproc) - -# Use optimized build -cmake .. -DLLAMA_NATIVE=ON - -# Consider smaller model -``` - -### Out of Memory Killer - -```bash -# Check if OOM killed the process -dmesg | grep -i "killed process" - -# Increase swap -# Use smaller model -# Reduce context size -``` - -## Best Practices - -1. **Start small** - Begin with TinyLlama, upgrade if needed -2. **Monitor memory** - Use `htop` during initial tests -3. **Set appropriate context** - 1024-2048 for most embedded use -4. **Use quantized models** - Q4_K_M is a good balance -5. **Enable streaming** - Better UX on slow inference -6. **Test offline** - Verify it works without internet before deployment +# Local LLM with llama.cpp diff --git a/src/20-embedding/quick-start.md b/src/20-embedding/quick-start.md index b7a15fb2..05cf8c1f 100644 --- a/src/20-embedding/quick-start.md +++ b/src/20-embedding/quick-start.md @@ -1,209 +1 @@ -# Quick Start - Deploy in 5 Minutes - -Get General Bots running on your embedded device with local AI in just a few commands. - -## Prerequisites - -- An SBC (Raspberry Pi, Orange Pi, etc.) 
with Armbian/Raspbian -- SSH access to the device -- Internet connection (for initial setup only) - -## One-Line Deploy - -From your development machine: - -```bash -# Clone and run the deployment script -git clone https://github.com/GeneralBots/botserver.git -cd botserver - -# Deploy to Orange Pi (replace with your device IP) -./scripts/deploy-embedded.sh orangepi@192.168.1.100 --with-ui --with-llama -``` - -That's it! After ~10-15 minutes: -- BotServer runs on port 8088 -- llama.cpp runs on port 8080 with TinyLlama -- Embedded UI available at `http://your-device:8088/embedded/` - -## Step-by-Step Guide - -### Step 1: Prepare Your Device - -Flash your SBC with a compatible OS: - -**Raspberry Pi:** -```bash -# Download Raspberry Pi Imager -# Select: Raspberry Pi OS Lite (64-bit) -# Enable SSH in settings -``` - -**Orange Pi:** -```bash -# Download Armbian from armbian.com -# Flash with balenaEtcher -``` - -### Step 2: First Boot Configuration - -```bash -# SSH into your device -ssh pi@raspberrypi.local # or orangepi@orangepi.local - -# Update system -sudo apt update && sudo apt upgrade -y - -# Set timezone -sudo timedatectl set-timezone America/Sao_Paulo - -# Enable I2C/SPI if using GPIO displays -sudo raspi-config # or armbian-config -``` - -### Step 3: Run Deployment Script - -From your development PC: - -```bash -# Basic deployment (botserver only) -./scripts/deploy-embedded.sh pi@raspberrypi.local - -# With embedded UI -./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui - -# With local LLM (requires 4GB+ RAM) -./scripts/deploy-embedded.sh pi@raspberrypi.local --with-ui --with-llama - -# Specify a different model -./scripts/deploy-embedded.sh pi@raspberrypi.local --with-llama --model phi-2-Q4_K_M.gguf -``` - -### Step 4: Verify Installation - -```bash -# Check services -ssh pi@raspberrypi.local 'sudo systemctl status botserver' -ssh pi@raspberrypi.local 'sudo systemctl status llama-server' - -# Test botserver -curl http://raspberrypi.local:8088/health - -# Test llama.cpp -curl http://raspberrypi.local:8080/v1/models -``` - -### Step 5: Access the Interface - -Open in your browser: -``` -http://raspberrypi.local:8088/embedded/ -``` - -Or set up kiosk mode (auto-starts on boot): -```bash -# Already configured if you used --with-ui -# Just reboot: -ssh pi@raspberrypi.local 'sudo reboot' -``` - -## Local Installation (On the Device) - -If you prefer to install directly on the device: - -```bash -# SSH into the device -ssh pi@raspberrypi.local - -# Install Rust -curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -source ~/.cargo/env - -# Clone and build -git clone https://github.com/GeneralBots/botserver.git -cd botserver - -# Run local deployment -./scripts/deploy-embedded.sh --local --with-ui --with-llama -``` - -⚠️ **Note:** Building on ARM devices is slow (1-2 hours). Cross-compilation is faster. 
- -## Configuration - -After deployment, edit the config file: - -```bash -ssh pi@raspberrypi.local -sudo nano /opt/botserver/.env -``` - -Key settings: -```env -# Server -HOST=0.0.0.0 -PORT=8088 - -# Local LLM -LLM_PROVIDER=llamacpp -LLM_API_URL=http://127.0.0.1:8080 -LLM_MODEL=tinyllama - -# Memory limits for small devices -MAX_CONTEXT_TOKENS=2048 -MAX_RESPONSE_TOKENS=512 -``` - -Restart after changes: -```bash -sudo systemctl restart botserver -``` - -## Troubleshooting - -### Out of Memory - -```bash -# Check memory usage -free -h - -# Reduce llama.cpp context -sudo nano /etc/systemd/system/llama-server.service -# Change -c 2048 to -c 1024 - -# Or use a smaller model -# TinyLlama uses ~700MB, Phi-2 uses ~1.6GB -``` - -### Service Won't Start - -```bash -# Check logs -sudo journalctl -u botserver -f -sudo journalctl -u llama-server -f - -# Common issues: -# - Port already in use -# - Missing model file -# - Database permissions -``` - -### Display Not Working - -```bash -# Check if display is detected -ls /dev/fb* # HDMI/DSI -ls /dev/i2c* # I2C displays -ls /dev/spidev* # SPI displays - -# For HDMI, check config -sudo nano /boot/config.txt # Raspberry Pi -sudo nano /boot/armbianEnv.txt # Orange Pi -``` - -## Next Steps - -- [Embedded UI Guide](./embedded-ui.md) - Customize the interface -- [Local LLM Configuration](./local-llm.md) - Optimize AI performance -- [Kiosk Mode](./kiosk-mode.md) - Production deployment -- [Offline Operation](./offline.md) - Disconnected environments +# Quick Start diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 5c52ab4e..02663eba 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -320,9 +320,17 @@ - [Permissions Matrix](./12-auth/permissions-matrix.md) - [User Context vs System Context](./12-auth/user-system-context.md) -# Part XII - Community +# Part XII - Device & Offline Deployment -- [Chapter 13: Contributing](./13-community/README.md) +- [Chapter 13: Device Deployment](./13-devices/README.md) + - [Mobile (Android & HarmonyOS)](./13-devices/mobile.md) + - [Supported Hardware (SBCs)](./13-devices/hardware.md) + - [Quick Start](./13-devices/quick-start.md) + - [Local LLM with llama.cpp](./13-devices/local-llm.md) + +# Part XIII - Community + +- [Chapter 14: Contributing](./13-community/README.md) - [Development Setup](./13-community/setup.md) - [Testing Guide](./13-community/testing.md) - [Documentation](./13-community/documentation.md) @@ -330,9 +338,9 @@ - [Community Guidelines](./13-community/community.md) - [IDEs](./13-community/ide-extensions.md) -# Part XIII - Migration +# Part XIV - Migration -- [Chapter 14: Migration Guide](./14-migration/README.md) +- [Chapter 15: Migration Guide](./14-migration/README.md) - [Migration Overview](./14-migration/overview.md) - [Platform Comparison Matrix](./14-migration/comparison-matrix.md) - [Migration Resources](./14-migration/resources.md) @@ -350,9 +358,9 @@ - [Automation Migration](./14-migration/automation.md) - [Validation and Testing](./14-migration/validation.md) -# Part XIV - Testing +# Part XV - Testing -- [Chapter 17: Testing](./17-testing/README.md) +- [Chapter 16: Testing](./17-testing/README.md) - [End-to-End Testing](./17-testing/e2e-testing.md) - [Testing Architecture](./17-testing/architecture.md) - [Performance Testing](./17-testing/performance.md) @@ -390,12 +398,5 @@ - [Appendix D: Documentation Style](./16-appendix-docs-style/conversation-examples.md) - [SVG and Conversation Standards](./16-appendix-docs-style/svg.md) -# Part XV - Embedded & Offline - -- [Chapter 20: Embedded 
Deployment](./20-embedding/README.md) - - [Supported Hardware](./20-embedding/hardware.md) - - [Quick Start](./20-embedding/quick-start.md) - - [Local LLM with llama.cpp](./20-embedding/local-llm.md) - [Glossary](./glossary.md) [Contact](./contact/README.md)