From 284d78c5852f7d621c9b60ff818eecaec10b8ac6 Mon Sep 17 00:00:00 2001
From: "Rodrigo Rodriguez (Pragmatismo)"
Date: Wed, 10 Dec 2025 12:55:05 -0300
Subject: [PATCH] Add Chapter 19: Maintenance and Updates documentation

- Create maintenance chapter with component update guides
- Add updating-components.md with step-by-step procedures for all stack components
- Add component-reference.md with versions, URLs, checksums, and alternatives for each service
- Add security-auditing.md with cargo audit, CVE monitoring, Trivy/Grype scanning
- Add backup-recovery.md with full backup/restore procedures
- Add troubleshooting.md for common issues and solutions
- Update SUMMARY.md with new chapter entry
---
 src/19-maintenance/README.md              |  78 +++
 src/19-maintenance/backup-recovery.md     | 449 +++++++++++++++++
 src/19-maintenance/component-reference.md | 501 +++++++++++++++++++
 src/19-maintenance/security-auditing.md   | 427 ++++++++++++++++
 src/19-maintenance/troubleshooting.md     | 576 ++++++++++++++++++++++
 src/19-maintenance/updating-components.md | 552 +++++++++++++++++++++
 src/SUMMARY.md                            |   6 +
 7 files changed, 2589 insertions(+)
 create mode 100644 src/19-maintenance/README.md
 create mode 100644 src/19-maintenance/backup-recovery.md
 create mode 100644 src/19-maintenance/component-reference.md
 create mode 100644 src/19-maintenance/security-auditing.md
 create mode 100644 src/19-maintenance/troubleshooting.md
 create mode 100644 src/19-maintenance/updating-components.md

diff --git a/src/19-maintenance/README.md b/src/19-maintenance/README.md
new file mode 100644
index 00000000..e3a06c28
--- /dev/null
+++ b/src/19-maintenance/README.md
@@ -0,0 +1,78 @@
+# Chapter 19: Maintenance and Updates
+
+BotServer includes a complete stack of self-hosted services that power your bots. This chapter covers how to maintain, update, and troubleshoot these components.
+
+## Stack Components Overview
+
+BotServer automatically installs and manages these services:
+
+| Component | Service | Default Port | Purpose |
+|-----------|---------|--------------|---------|
+| **vault** | HashiCorp Vault | 8200 | Secrets management |
+| **tables** | PostgreSQL | 5432 | Primary database |
+| **directory** | Zitadel | 8080 | Identity & access management |
+| **drive** | MinIO | 9000, 9001 | Object storage (S3-compatible) |
+| **cache** | Valkey | 6379 | In-memory cache (Redis-compatible) |
+| **llm** | llama.cpp | 8081, 8082 | Local LLM & embedding server |
+| **email** | Stalwart | 25, 993 | Mail server |
+| **proxy** | Caddy | 443, 80 | HTTPS reverse proxy |
+| **dns** | CoreDNS | 53 | Local DNS resolution |
+| **alm** | Forgejo | 3000 | Git repository (ALM) |
+| **alm_ci** | Forgejo Runner | - | CI/CD runner |
+| **meeting** | LiveKit | 7880 | Video conferencing |
+
+## Directory Structure
+
+```
+botserver-stack/
+├── bin/                     # Service binaries
+│   ├── vault/
+│   ├── tables/
+│   ├── drive/
+│   ├── cache/
+│   ├── llm/
+│   └── ...
+├── conf/                    # Configuration files
+├── data/                    # Persistent data
+└── logs/                    # Service logs
+
+botserver-installers/        # Downloaded archives (cache)
+```
+
+## Why Self-Hosted?
+
+1. **Privacy** - Data never leaves your infrastructure
+2. **Offline** - Works without internet after initial setup
+3. **Cost** - No per-user or API fees
+4. **Control** - Full access to all services
+5. **Compliance** - Meet data residency requirements
+
+## Chapter Contents
+
+- [Updating Components](./updating-components.md) - How to update individual services
+- [Component Reference](./component-reference.md) - Detailed info for each component
+- [Security Auditing](./security-auditing.md) - Running security audits
+- [Backup and Recovery](./backup-recovery.md) - Data protection strategies
+- [Troubleshooting](./troubleshooting.md) - Common issues and solutions
+
+## Quick Commands
+
+```bash
+# Check service status
+./botserver status
+
+# View logs
+tail -f botserver-stack/logs/llm.log
+
+# Restart all services
+./botserver restart
+
+# Update a specific component
+./botserver update llm
+```
+
+## Related Documentation
+
+- [Installation](../01-introduction/installation.md) - Initial setup
+- [Secrets Management](../08-config/secrets-management.md) - Vault configuration
+- [LLM Configuration](../08-config/llm-config.md) - AI model settings
\ No newline at end of file
diff --git a/src/19-maintenance/backup-recovery.md b/src/19-maintenance/backup-recovery.md
new file mode 100644
index 00000000..b6a47b69
--- /dev/null
+++ b/src/19-maintenance/backup-recovery.md
@@ -0,0 +1,449 @@
+# Backup and Recovery
+
+Protecting your BotServer data requires regular backups of databases, configurations, and file storage. This guide covers backup strategies, procedures, and disaster recovery.
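Every procedure in this guide relies on client tools (`pg_dump`, `pg_restore`, `mc`, `vault`, `tar`, `gpg`). If a scheduled job is missing one of them, it can produce an incomplete backup without any obvious error. The following is a minimal preflight sketch that assumes a POSIX-compatible shell. `missing_tools` is a hypothetical helper, not a BotServer command:

```bash
# Hypothetical preflight helper (not a BotServer command): prints every
# named tool that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    # `command -v` is the portable way to ask whether a command resolves
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Example: warn before a backup run if any client tool is absent.
missing=$(missing_tools pg_dump pg_restore mc vault tar gpg)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the newline-separated list prints on one line
  echo "Install these tools before running backups:" $missing >&2
fi
```

You can put this at the top of a backup script and exit when the result is non-empty. A missing tool then causes an immediate, visible failure instead of a partial backup.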
+
+---
+
+## What to Back Up
+
+| Component | Data Location | Priority | Method |
+|-----------|---------------|----------|--------|
+| PostgreSQL | `botserver-stack/data/tables/` | **Critical** | pg_dump |
+| Vault | `botserver-stack/data/vault/` | **Critical** | Vault snapshot |
+| MinIO | `botserver-stack/data/drive/` | **Critical** | mc mirror |
+| Configurations | `botserver-stack/conf/` | High | File copy |
+| Bot Packages | S3 buckets (*.gbai) | High | mc mirror |
+| Models | `botserver-stack/data/llm/` | Medium | File copy |
+| Logs | `botserver-stack/logs/` | Low | Optional |
+
+---
+
+## Quick Backup Commands
+
+```bash
+# Full backup (all components)
+./botserver backup
+
+# Backup specific component
+./botserver backup tables
+./botserver backup drive
+./botserver backup vault
+
+# Backup to specific location
+./botserver backup --output /mnt/backup/$(date +%Y%m%d)
+```
+
+---
+
+## Database Backup (PostgreSQL)
+
+### Full Database Dump
+
+```bash
+# Using pg_dump
+pg_dump $DATABASE_URL > backup-$(date +%Y%m%d-%H%M%S).sql
+
+# Compressed backup
+pg_dump $DATABASE_URL | gzip > backup-$(date +%Y%m%d).sql.gz
+
+# Custom format (faster restore)
+pg_dump -Fc $DATABASE_URL > backup-$(date +%Y%m%d).dump
+```
+
+### Incremental Backups with WAL
+
+Enable WAL archiving in `postgresql.conf`:
+
+```ini
+wal_level = replica
+archive_mode = on
+archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
+```
+
+### Automated Database Backup Script
+
+```bash
+#!/bin/bash
+# backup-database.sh
+
+BACKUP_DIR="/backup/postgres"
+RETENTION_DAYS=30
+DATE=$(date +%Y%m%d-%H%M%S)
+
+mkdir -p $BACKUP_DIR
+
+# Create backup (stop here, before pruning old dumps, if pg_dump fails)
+pg_dump -Fc $DATABASE_URL > "$BACKUP_DIR/botserver-$DATE.dump" || exit 1
+
+# Remove old backups
+find $BACKUP_DIR -name "*.dump" -mtime +$RETENTION_DAYS -delete
+
+echo "Backup complete: botserver-$DATE.dump"
+```
+
+### Database Restore
+
+```bash
+# From SQL dump
+psql $DATABASE_URL < backup.sql
+
+# From custom format (faster)
+pg_restore -d $DATABASE_URL backup.dump
+
+# Drop and recreate (clean restore)
+pg_restore -c -d $DATABASE_URL backup.dump
+```
+
+---
+
+## Vault Backup
+
+### Snapshot Method
+
+```bash
+# Create Vault snapshot (requires Raft integrated storage)
+VAULT_ADDR=http://localhost:8200 vault operator raft snapshot save vault-backup-$(date +%Y%m%d).snap
+```
+
+### File-Based Backup
+
+```bash
+# Stop Vault first
+./botserver stop vault
+
+# Copy data directory
+tar -czvf vault-data-$(date +%Y%m%d).tar.gz botserver-stack/data/vault/
+
+# Copy unseal keys (store securely!)
+cp botserver-stack/conf/vault/init.json /secure/location/
+```
+
+### Vault Restore
+
+```bash
+# Stop Vault
+./botserver stop vault
+
+# Restore data (archive paths start with botserver-stack/, so extract from the install root)
+rm -rf botserver-stack/data/vault/*
+tar -xzvf vault-data-backup.tar.gz
+
+# Start and unseal
+./botserver start vault
+./botserver unseal
+```
+
+**Warning:** Keep `init.json` (unseal keys and root token) in a secure, separate location!
+
+---
+
+## Object Storage Backup (MinIO)
+
+### Using MinIO Client (mc)
+
+```bash
+# Configure mc
+mc alias set local http://localhost:9000 $DRIVE_ACCESS_KEY $DRIVE_SECRET_KEY
+
+# Backup all buckets
+mc mirror local/ /backup/minio/
+
+# Backup specific bot
+mc mirror local/mybot.gbai /backup/bots/mybot.gbai/
+```
+
+### Sync to Remote Storage
+
+```bash
+# Backup to S3
+mc mirror local/ s3/botserver-backup/
+
+# Backup to Backblaze B2
+mc mirror local/ b2/botserver-backup/
+
+# Backup to another MinIO
+mc mirror local/ remote/botserver-backup/
+```
+
+### Restore from Backup
+
+```bash
+# Restore all buckets
+mc mirror /backup/minio/ local/
+
+# Restore specific bucket
+mc mirror /backup/bots/mybot.gbai/ local/mybot.gbai/
+```
+
+---
+
+## Configuration Backup
+
+### Full Configuration Backup
+
+```bash
+# Backup all configs
+tar -czvf config-backup-$(date +%Y%m%d).tar.gz \
+  botserver-stack/conf/ \
+  3rdparty.toml \
+  .env
+
+# Exclude certificates (back them up separately with encryption)
+tar -czvf config-backup-$(date +%Y%m%d).tar.gz \
+  --exclude='certificates' \
+  botserver-stack/conf/
+```
+
+### Certificate Backup (Encrypted)
+
+```bash
+# Backup certificates with encryption
+tar -czf - botserver-stack/conf/system/certificates/ | \
+  gpg --symmetric --cipher-algo AES256 > certs-backup.tar.gz.gpg
+```
+
+### Restore Configuration
+
+```bash
+# Restore configs
+tar -xzvf config-backup.tar.gz
+
+# Restore encrypted certificates
+gpg --decrypt certs-backup.tar.gz.gpg | tar -xzf -
+```
+
+---
+
+## Full System Backup
+
+### Complete Backup Script
+
+```bash
+#!/bin/bash
+# full-backup.sh
+
+set -e
+
+BACKUP_DIR="/backup/botserver/$(date +%Y%m%d)"
+mkdir -p "$BACKUP_DIR"
+
+echo "Starting full backup to $BACKUP_DIR"
+
+# 1. Database
+echo "Backing up database..."
+pg_dump -Fc $DATABASE_URL > "$BACKUP_DIR/database.dump"
+
+# 2. Vault snapshot
+echo "Backing up Vault..."
+VAULT_ADDR=http://localhost:8200 vault operator raft snapshot save "$BACKUP_DIR/vault.snap" 2>/dev/null || \
+  tar -czvf "$BACKUP_DIR/vault-data.tar.gz" botserver-stack/data/vault/
+
+# 3. Object storage
+echo "Backing up drive..."
+mc mirror local/ "$BACKUP_DIR/drive/" --quiet
+
+# 4. Configurations
+echo "Backing up configurations..."
+tar -czvf "$BACKUP_DIR/config.tar.gz" \
+  botserver-stack/conf/ \
+  3rdparty.toml \
+  .env \
+  config/
+
+# 5. Models (optional, large files)
+if [ "$1" == "--include-models" ]; then
+  echo "Backing up models..."
+  tar -czvf "$BACKUP_DIR/models.tar.gz" botserver-stack/data/llm/
+fi
+
+# Create manifest
+echo "Creating manifest..."
+cat > "$BACKUP_DIR/manifest.txt" << EOF
+BotServer Backup
+Date: $(date)
+Host: $(hostname)
+
+Contents:
+- database.dump: PostgreSQL database
+- vault.snap (or vault-data.tar.gz): Vault secrets
+- drive/: Object storage contents
+- config.tar.gz: Configuration files
+EOF
+
+echo "Backup complete: $BACKUP_DIR"
+du -sh "$BACKUP_DIR"
+```
+
+### Scheduled Backups
+
+Add to crontab. The scripts use relative `botserver-stack/` paths and `$DATABASE_URL`, so run them from the install directory with that variable set:
+
+```bash
+# Daily database backup at 2 AM
+0 2 * * * /opt/botserver/scripts/backup-database.sh
+
+# Weekly full backup on Sunday at 3 AM
+0 3 * * 0 /opt/botserver/scripts/full-backup.sh
+
+# Monthly backup with models
+0 4 1 * * /opt/botserver/scripts/full-backup.sh --include-models
+```
+
+---
+
+## Disaster Recovery
+
+### Recovery Procedure
+
+1. **Install fresh BotServer**
+   ```bash
+   ./botserver --skip-bootstrap
+   ```
+
+2. **Restore configurations**
+   ```bash
+   tar -xzvf config-backup.tar.gz
+   ```
+
+3. **Restore Vault**
+   ```bash
+   tar -xzvf vault-data.tar.gz
+   ./botserver start vault
+   ./botserver unseal
+   ```
+
+4. **Restore database**
+   ```bash
+   ./botserver start tables
+   pg_restore -d $DATABASE_URL database.dump
+   ```
+
+5. **Restore object storage**
+   ```bash
+   ./botserver start drive
+   mc mirror /backup/drive/ local/
+   ```
+
+6. **Start remaining services**
+   ```bash
+   ./botserver start
+   ```
+
+7. **Verify**
+   ```bash
+   ./botserver status
+   ./botserver test
+   ```
+
+### Recovery Time Objectives
+
+| Scenario | RTO Target | Method |
+|----------|------------|--------|
+| Single component failure | < 15 min | Restart/restore component |
+| Database corruption | < 1 hour | pg_restore from backup |
+| Full server failure | < 4 hours | Full restore procedure |
+| Data center failure | < 24 hours | Geo-replicated restore |
+
+---
+
+## Backup Verification
+
+### Test Restore Regularly
+
+```bash
+# Restore to test environment
+./botserver --test-restore /backup/latest/
+
+# Check that the dump is readable, then spot-check the restored data
+pg_restore --list database.dump
+psql $DATABASE_URL -c "SELECT COUNT(*) FROM bots;"
+
+# Verify drive contents
+mc ls local/
+```
+
+### Backup Integrity Checks
+
+```bash
+# Record checksums right after each backup
+sha256sum /backup/botserver/*/database.dump > /backup/botserver/checksums.txt
+
+# Verify on restore
+sha256sum -c /backup/botserver/checksums.txt
+```
+
+---
+
+## Cloud Backup Integration
+
+### AWS S3
+
+```bash
+# Configure AWS CLI
+aws configure
+
+# Sync backups to S3
+aws s3 sync /backup/botserver/ s3://my-backup-bucket/botserver/
+
+# Enable versioning for point-in-time recovery
+aws s3api put-bucket-versioning \
+  --bucket my-backup-bucket \
+  --versioning-configuration Status=Enabled
+```
+
+### Backblaze B2
+
+```bash
+# Configure rclone
+rclone config
+
+# Sync backups
+rclone sync /backup/botserver/ b2:my-backup-bucket/botserver/
+```
+
+### Encrypted Remote Backup
+
+```bash
+# Encrypt before upload
+tar -czf - /backup/botserver/ | \
+  gpg --symmetric --cipher-algo AES256 | \
+  aws s3 cp - s3://my-backup-bucket/botserver-$(date +%Y%m%d).tar.gz.gpg
+```
+
+---
+
+## Retention Policy
+
+| Backup Type | Retention | Storage |
+|-------------|-----------|---------|
+| Hourly snapshots | 24 hours | Local |
+| Daily backups | 30 days | Local + Remote |
+| Weekly backups | 12 weeks | Remote |
+| Monthly backups | 12 months | Remote (cold) |
+| Yearly backups | 7 years | Archive |
+
+### Cleanup Script
+
+```bash
+#!/bin/bash
+# cleanup-backups.sh
+
+BACKUP_DIR="/backup/botserver"
+
+# Remove daily backups older than 30 days
+find $BACKUP_DIR/daily -mtime +30 -delete
+
+# Remove weekly backups older than 12 weeks
+find $BACKUP_DIR/weekly -mtime +84 -delete
+
+# Remove monthly backups older than 12 months
+find $BACKUP_DIR/monthly -mtime +365 -delete
+```
+
+---
+
+## See Also
+
+- [Updating Components](./updating-components.md) - Safe update procedures
+- [Troubleshooting](./troubleshooting.md) - Recovery from common issues
+- [Security Auditing](./security-auditing.md) - Protecting backup data
\ No newline at end of file
diff --git a/src/19-maintenance/component-reference.md b/src/19-maintenance/component-reference.md
new file mode 100644
index 00000000..7c39bfd9
--- /dev/null
+++ b/src/19-maintenance/component-reference.md
@@ -0,0 +1,501 @@
+# Component Reference
+
+This reference provides detailed information about each component in the BotServer stack, including current versions, alternatives, and configuration options.
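The entries below pin exact versions and, for llama.cpp, SHA256 checksums. Checking a downloaded archive against its pinned digest before unpacking it catches truncated or tampered downloads. The following is a minimal sketch that assumes GNU coreutils `sha256sum` is installed. `verify_sha256` is a hypothetical helper name:

```bash
# Hypothetical helper: verify_sha256 FILE EXPECTED_HEX
# Returns 0 when FILE's SHA256 digest equals EXPECTED_HEX, 1 otherwise.
verify_sha256() {
  file=$1
  expected=$2
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi
  # sha256sum prints "<digest>  <name>"; keep only the digest field
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file" >&2
    echo "  expected: $expected" >&2
    echo "  actual:   $actual" >&2
    return 1
  fi
  return 0
}
```

For example, `verify_sha256 llama-b7345-bin-ubuntu-x64.zip 91b066ecc53c20693a2d39703c12bc7a69c804b0768fee064d47df702f616e52 && unzip llama-b7345-bin-ubuntu-x64.zip` unpacks the archive only if the digest matches. If the expected values are stored in a file of `<digest>  <filename>` lines, `sha256sum -c` performs the same check natively.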
+
+---
+
+## Core Components
+
+### Vault (Secrets Management)
+
+| Property | Value |
+|----------|-------|
+| **Service** | HashiCorp Vault |
+| **Current Version** | 1.15.4 |
+| **Default Port** | 8200 |
+| **Binary Path** | `botserver-stack/bin/vault/vault` |
+| **Config Path** | `botserver-stack/conf/vault/` |
+| **Data Path** | `botserver-stack/data/vault/` |
+| **Log File** | `botserver-stack/logs/vault.log` |
+
+**Download URL:**
+```
+https://releases.hashicorp.com/vault/1.15.4/vault_1.15.4_linux_amd64.zip
+```
+
+**Purpose:**
+- Stores all service credentials (database, drive, cache)
+- Manages encryption keys
+- Provides secrets rotation
+- Issues short-lived tokens
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [OpenBao](https://openbao.org/) | MPL-2.0 | Fork of Vault, fully open source |
+| [Infisical](https://infisical.com/) | MIT | Modern secrets management |
+| [SOPS](https://github.com/getsops/sops) | MPL-2.0 | File-based encryption |
+| [Doppler](https://doppler.com/) | Proprietary | Cloud-based alternative |
+
+---
+
+### PostgreSQL (Tables/Database)
+
+| Property | Value |
+|----------|-------|
+| **Service** | PostgreSQL |
+| **Current Version** | 17.2.0 |
+| **Default Port** | 5432 |
+| **Binary Path** | `botserver-stack/bin/tables/` |
+| **Config Path** | `botserver-stack/conf/tables/` |
+| **Data Path** | `botserver-stack/data/tables/` |
+| **Log File** | `botserver-stack/logs/postgres.log` |
+
+**Download URL:**
+```
+https://github.com/theseus-rs/postgresql-binaries/releases/download/17.2.0/postgresql-17.2.0-x86_64-unknown-linux-gnu.tar.gz
+```
+
+**Purpose:**
+- Primary relational database
+- Stores bot configurations, users, conversations
+- Supports full-text search
+- Handles transactions and ACID compliance
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [CockroachDB](https://www.cockroachlabs.com/) | BSL/CCL | Distributed SQL, PostgreSQL-compatible |
+| [YugabyteDB](https://www.yugabyte.com/) | Apache-2.0 | Distributed PostgreSQL |
+| [Neon](https://neon.tech/) | Apache-2.0 | Serverless PostgreSQL |
+| [Supabase](https://supabase.com/) | Apache-2.0 | PostgreSQL with extras |
+
+---
+
+### Zitadel (Directory/Identity)
+
+| Property | Value |
+|----------|-------|
+| **Service** | Zitadel |
+| **Current Version** | 2.70.4 |
+| **Default Port** | 8080 |
+| **Binary Path** | `botserver-stack/bin/directory/zitadel` |
+| **Config Path** | `botserver-stack/conf/directory/` |
+| **Data Path** | Uses PostgreSQL |
+| **Log File** | `botserver-stack/logs/zitadel.log` |
+
+**Download URL:**
+```
+https://github.com/zitadel/zitadel/releases/download/v2.70.4/zitadel-linux-amd64.tar.gz
+```
+
+**Purpose:**
+- User authentication and authorization
+- OAuth2/OIDC provider
+- Single Sign-On (SSO)
+- Multi-factor authentication
+- Service credential provisioning
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [Keycloak](https://www.keycloak.org/) | Apache-2.0 | Java-based, feature-rich |
+| [Authentik](https://goauthentik.io/) | Custom OSS | Python-based, modern UI |
+| [Authelia](https://www.authelia.com/) | Apache-2.0 | Lightweight, Nginx integration |
+| [Ory](https://www.ory.sh/) | Apache-2.0 | Modular identity infrastructure |
+| [Casdoor](https://casdoor.org/) | Apache-2.0 | Go-based, UI-focused |
+
+---
+
+### MinIO (Drive/Object Storage)
+
+| Property | Value |
+|----------|-------|
+| **Service** | MinIO |
+| **Current Version** | Latest |
+| **Default Ports** | 9000 (API), 9001 (Console) |
+| **Binary Path** | `botserver-stack/bin/drive/minio` |
+| **Config Path** | `botserver-stack/conf/drive/` |
+| **Data Path** | `botserver-stack/data/drive/` |
+| **Log File** | `botserver-stack/logs/minio.log` |
+
+**Download URL:**
+```
+https://dl.min.io/server/minio/release/linux-amd64/minio
+```
+
+**Purpose:**
+- S3-compatible object storage
+- Stores bot packages (.gbai, .gbkb, etc.)
+- File uploads and downloads
+- Static asset hosting
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [SeaweedFS](https://github.com/seaweedfs/seaweedfs) | Apache-2.0 | Distributed, fast |
+| [Garage](https://garagehq.deuxfleurs.fr/) | AGPL-3.0 | Lightweight, geo-distributed |
+| [Ceph](https://ceph.io/) | LGPL-2.1 | Enterprise-grade, complex |
+| [LakeFS](https://lakefs.io/) | Apache-2.0 | Git-like versioning for data |
+
+---
+
+### Valkey (Cache)
+
+| Property | Value |
+|----------|-------|
+| **Service** | Valkey |
+| **Current Version** | 8.0.2 |
+| **Default Port** | 6379 |
+| **Binary Path** | `botserver-stack/bin/cache/valkey-server` |
+| **Config Path** | `botserver-stack/conf/cache/` |
+| **Data Path** | `botserver-stack/data/cache/` |
+| **Log File** | `botserver-stack/logs/valkey.log` |
+
+**Download URL:**
+```
+https://github.com/valkey-io/valkey/archive/refs/tags/8.0.2.tar.gz
+```
+
+**Note:** Valkey requires compilation from source. Build dependencies: `gcc`, `make`
+
+**Purpose:**
+- In-memory caching
+- Session storage
+- Rate limiting
+- Pub/Sub messaging
+- Queue management
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [KeyDB](https://docs.keydb.dev/) | BSD-3 | Multi-threaded Redis fork |
+| [Dragonfly](https://www.dragonflydb.io/) | BSL | High-performance, Redis-compatible |
+| [Garnet](https://github.com/microsoft/garnet) | MIT | Microsoft's cache store |
+| [Skytable](https://skytable.io/) | AGPL-3.0 | Modern NoSQL |
+
+---
+
+### llama.cpp (LLM Server)
+
+| Property | Value |
+|----------|-------|
+| **Service** | llama.cpp |
+| **Current Version** | b7345 |
+| **Default Ports** | 8081 (LLM), 8082 (Embedding) |
+| **Binary Path** | `botserver-stack/bin/llm/llama-server` |
+| **Config Path** | `botserver-stack/conf/llm/` |
+| **Data Path** | `botserver-stack/data/llm/` (models) |
+| **Log File** | `botserver-stack/logs/llm.log` |
+
+**Download URLs by Platform:**
+
+| Platform | URL |
+|----------|-----|
+| Linux x64 | `https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-ubuntu-x64.zip` |
+| Linux x64 Vulkan | `https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-ubuntu-vulkan-x64.zip` |
+| macOS ARM64 | `https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-macos-arm64.zip` |
+| macOS x64 | `https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-macos-x64.zip` |
+| Windows x64 | `https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-win-cpu-x64.zip` |
+| Windows CUDA 12 | `https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-win-cuda-12.4-x64.zip` |
+| Windows CUDA 13 | `https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-win-cuda-13.1-x64.zip` |
+
+**SHA256 Checksums:**
+```
+llama-b7345-bin-ubuntu-x64.zip: 91b066ecc53c20693a2d39703c12bc7a69c804b0768fee064d47df702f616e52
+llama-b7345-bin-ubuntu-vulkan-x64.zip: 03f0b3acbead2ddc23267073a8f8e0207937c849d3704c46c61cf167c1001442
+llama-b7345-bin-macos-arm64.zip: 72ae9b4a4605aa1223d7aabaa5326c66c268b12d13a449fcc06f61099cd02a52
+llama-b7345-bin-macos-x64.zip: bec6b805cf7533f66b38f29305429f521dcb2be6b25dbce73a18df448ec55cc5
+llama-b7345-bin-win-cpu-x64.zip: ea449082c8e808a289d9a1e8331f90a0379ead4dd288a1b9a2d2c0a7151836cd
+llama-b7345-bin-win-cuda-12.4-x64.zip: 7a82aba2662fa7d4477a7a40894de002854bae1ab8b0039888577c9a2ca24cae
+llama-b7345-bin-win-cuda-13.1-x64.zip: 06ea715cefb07e9862394e6d1ffa066f4c33add536b1f1aa058723f86ae05572
+```
+
+**Purpose:**
+- Local LLM inference
+- Text embeddings for semantic search
+- OpenAI-compatible API
+- Supports GGUF model format
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [Ollama](https://ollama.ai/) | MIT | User-friendly, model management |
+| [vLLM](https://github.com/vllm-project/vllm) | Apache-2.0 | High throughput, production-grade |
+| [Text Generation Inference](https://github.com/huggingface/text-generation-inference) | Apache-2.0 | HuggingFace's solution |
+| [LocalAI](https://localai.io/) | MIT | Drop-in OpenAI replacement |
+| [LM Studio](https://lmstudio.ai/) | Proprietary | Desktop GUI application |
+
+---
+
+## Supporting Components
+
+### Stalwart (Email Server)
+
+| Property | Value |
+|----------|-------|
+| **Service** | Stalwart Mail Server |
+| **Current Version** | 0.10.7 |
+| **Default Ports** | 25 (SMTP), 993 (IMAPS), 587 (Submission) |
+| **Binary Path** | `botserver-stack/bin/email/stalwart-mail` |
+| **Config Path** | `botserver-stack/conf/email/` |
+| **Data Path** | `botserver-stack/data/email/` |
+| **Log File** | `botserver-stack/logs/stalwart.log` |
+
+**Download URL:**
+```
+https://github.com/stalwartlabs/mail-server/releases/download/v0.10.7/stalwart-mail-x86_64-linux.tar.gz
+```
+
+**Purpose:**
+- Full email server (SMTP, IMAP, JMAP)
+- Email sending and receiving
+- Spam filtering
+- DKIM/SPF/DMARC support
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [Maddy](https://maddy.email/) | GPL-3.0 | Composable mail server |
+| [Mail-in-a-Box](https://mailinabox.email/) | CC0 | All-in-one solution |
+| [Postal](https://postalserver.io/) | MIT | Sending-focused |
+| [Haraka](https://haraka.github.io/) | MIT | Node.js SMTP |
+
+---
+
+### Caddy (Proxy)
+
+| Property | Value |
+|----------|-------|
+| **Service** | Caddy |
+| **Current Version** | 2.9.1 |
+| **Default Ports** | 443 (HTTPS), 80 (HTTP) |
+| **Binary Path** | `botserver-stack/bin/proxy/caddy` |
+| **Config Path** | `botserver-stack/conf/proxy/Caddyfile` |
+| **Data Path** | `botserver-stack/data/proxy/` |
+| **Log File** | `botserver-stack/logs/caddy.log` |
+
+**Download URL:**
+```
+https://github.com/caddyserver/caddy/releases/download/v2.9.1/caddy_2.9.1_linux_amd64.tar.gz
+```
+
+**Purpose:**
+- Automatic HTTPS with Let's Encrypt
+- Reverse proxy for all services
+- Load balancing
+- HTTP/2 and HTTP/3 support
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [Nginx](https://nginx.org/) | BSD-2 | Industry standard |
+| [Traefik](https://traefik.io/) | MIT | Cloud-native, auto-discovery |
+| [HAProxy](https://www.haproxy.org/) | GPL-2.0 | High performance |
+| [Envoy](https://www.envoyproxy.io/) | Apache-2.0 | Service mesh ready |
+
+---
+
+### CoreDNS (DNS)
+
+| Property | Value |
+|----------|-------|
+| **Service** | CoreDNS |
+| **Current Version** | 1.11.1 |
+| **Default Port** | 53 |
+| **Binary Path** | `botserver-stack/bin/dns/coredns` |
+| **Config Path** | `botserver-stack/conf/dns/Corefile` |
+| **Log File** | `botserver-stack/logs/coredns.log` |
+
+**Download URL:**
+```
+https://github.com/coredns/coredns/releases/download/v1.11.1/coredns_1.11.1_linux_amd64.tgz
+```
+
+**Purpose:**
+- Local DNS resolution
+- Service discovery (*.botserver.local)
+- DNS-based load balancing
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [PowerDNS](https://www.powerdns.com/) | GPL-2.0 | Feature-rich, authoritative |
+| [Unbound](https://nlnetlabs.nl/projects/unbound/) | BSD | Validating resolver |
+| [dnsmasq](https://thekelleys.org.uk/dnsmasq/doc.html) | GPL-2.0 | Lightweight |
+
+---
+
+### Forgejo (ALM/Git)
+
+| Property | Value |
+|----------|-------|
+| **Service** | Forgejo |
+| **Current Version** | 10.0.2 |
+| **Default Port** | 3000 |
+| **Binary Path** | `botserver-stack/bin/alm/forgejo` |
+| **Config Path** | `botserver-stack/conf/alm/` |
+| **Data Path** | `botserver-stack/data/alm/` |
+| **Log File** | `botserver-stack/logs/forgejo.log` |
+
+**Download URL:**
+```
+https://codeberg.org/forgejo/forgejo/releases/download/v10.0.2/forgejo-10.0.2-linux-amd64
+```
+
+**Purpose:**
+- Git repository hosting
+- Issue tracking
+- CI/CD pipelines
+- Code review
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [Gitea](https://gitea.io/) | MIT | Original project |
+| [GitLab](https://gitlab.com/) | MIT (CE) | Full DevOps platform |
+| [Gogs](https://gogs.io/) | MIT | Lightweight |
+| [OneDev](https://onedev.io/) | MIT | Built-in CI/CD |
+
+---
+
+### LiveKit (Meeting/Video)
+
+| Property | Value |
+|----------|-------|
+| **Service** | LiveKit |
+| **Current Version** | 2.8.2 |
+| **Default Ports** | 7880 (HTTP), 7881 (RTC) |
+| **Binary Path** | `botserver-stack/bin/meeting/livekit-server` |
+| **Config Path** | `botserver-stack/conf/meeting/` |
+| **Log File** | `botserver-stack/logs/livekit.log` |
+
+**Download URL:**
+```
+https://github.com/livekit/livekit/releases/download/v2.8.2/livekit_2.8.2_linux_amd64.tar.gz
+```
+
+**Purpose:**
+- Real-time video/audio communication
+- WebRTC infrastructure
+- Screen sharing
+- Recording
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [Jitsi](https://jitsi.org/) | Apache-2.0 | Full-featured, established |
+| [BigBlueButton](https://bigbluebutton.org/) | LGPL-3.0 | Education-focused |
+| [Janus](https://janus.conf.meetecho.com/) | GPL-3.0 | WebRTC gateway |
+| [mediasoup](https://mediasoup.org/) | ISC | Node.js SFU |
+
+---
+
+## Optional Components
+
+### Qdrant (Vector Database)
+
+| Property | Value |
+|----------|-------|
+| **Service** | Qdrant |
+| **Current Version** | Latest |
+| **Default Ports** | 6333 (HTTP), 6334 (gRPC) |
+| **Binary Path** | `botserver-stack/bin/vector_db/qdrant` |
+
+**Download URL:**
+```
+https://github.com/qdrant/qdrant/releases/latest/download/qdrant-x86_64-unknown-linux-gnu.tar.gz
+```
+
+**Purpose:**
+- Vector similarity search
+- Knowledge base embeddings
+- Semantic search
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [Milvus](https://milvus.io/) | Apache-2.0 | Distributed, scalable |
+| [Weaviate](https://weaviate.io/) | BSD-3 | GraphQL API |
+| [Chroma](https://www.trychroma.com/) | Apache-2.0 | Simple, embedded |
+| [pgvector](https://github.com/pgvector/pgvector) | PostgreSQL | PostgreSQL extension |
+
+---
+
+### InfluxDB (Time Series)
+
+| Property | Value |
+|----------|-------|
+| **Service** | InfluxDB |
+| **Current Version** | 2.7.5 |
+| **Default Port** | 8086 |
+| **Binary Path** | `botserver-stack/bin/timeseries_db/influxd` |
+
+**Download URL:**
+```
+https://download.influxdata.com/influxdb/releases/influxdb2-2.7.5-linux-amd64.tar.gz
+```
+
+**Purpose:**
+- Metrics storage
+- Time-series analytics
+- Monitoring dashboards
+
+**Alternatives:**
+| Alternative | License | Notes |
+|-------------|---------|-------|
+| [TimescaleDB](https://www.timescale.com/) | Apache-2.0 | PostgreSQL extension |
+| [VictoriaMetrics](https://victoriametrics.com/) | Apache-2.0 | Prometheus-compatible |
+| [QuestDB](https://questdb.io/) | Apache-2.0 | High-performance SQL |
+| [Prometheus](https://prometheus.io/) | Apache-2.0 | Monitoring-focused |
+
+---
+
+## Default LLM Models
+
+### DeepSeek R1 Distill Qwen 1.5B
+
+| Property | Value |
+|----------|-------|
+| **Filename** | `DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf` |
+| **Size** | ~1.1 GB |
+| **RAM Required** | 4 GB |
+| **Use Case** | Default conversational model |
+
+**Download URL:**
+```
+https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/resolve/main/DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf
+```
+
+### BGE Small EN v1.5
+
+| Property | Value |
+|----------|-------|
+| **Filename** | `bge-small-en-v1.5-f32.gguf` |
+| **Size** | ~130 MB |
+| **RAM Required** | 512 MB |
+| **Use Case** | Text embeddings for semantic search |
+
+**Download URL:**
+```
+https://huggingface.co/CompendiumLabs/bge-small-en-v1.5-gguf/resolve/main/bge-small-en-v1.5-f32.gguf
+```
+
+---
+
+## Configuration Files Reference
+
+| File | Purpose |
+|------|---------|
+| `3rdparty.toml` | Component download URLs and checksums |
+| `config/llm_releases.json` | Platform-specific LLM builds |
+| `botserver-stack/conf/*/` | Per-component configuration |
+| `.env` | Environment variables (generated) |
+
+---
+
+## See Also
+
+- [Updating Components](./updating-components.md) - How to update
+- [Security Auditing](./security-auditing.md) - Vulnerability scanning
+- [Troubleshooting](./troubleshooting.md) - Common issues
\ No newline at end of file
diff --git a/src/19-maintenance/security-auditing.md b/src/19-maintenance/security-auditing.md
new file mode 100644
index 00000000..ea3224fa
--- /dev/null
+++ b/src/19-maintenance/security-auditing.md
@@ -0,0 +1,427 @@
+# Security Auditing
+
+Regular security audits help keep your BotServer installation protected against known vulnerabilities. This guide covers automated scanning, manual reviews, and best practices.
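File permissions are one manual check that is easy to automate. Files that hold secrets, such as `.env`, should be readable only by their owner (mode 600). The following is a minimal sketch that assumes GNU `stat` as found on Linux; BSD and macOS `stat` take different flags. `check_mode` is a hypothetical helper name:

```bash
# Hypothetical helper (not a BotServer command): check_mode FILE EXPECTED
# Succeeds when FILE's permission bits, in octal, equal EXPECTED (e.g. 600).
check_mode() {
  # GNU stat: %a prints the access rights in octal without a leading zero
  mode=$(stat -c '%a' "$1") || return 1
  if [ "$mode" != "$2" ]; then
    echo "$1 has mode $mode, expected $2" >&2
    return 1
  fi
  return 0
}
```

Running `check_mode .env 600 || chmod 600 .env` tightens the file's permissions when the check fails. The same call works for any other file that holds credentials.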
+ +--- + +## Rust Dependency Auditing + +### cargo-audit + +BotServer uses `cargo-audit` to scan Rust dependencies for known vulnerabilities. + +**Install cargo-audit:** + +```bash +cargo install cargo-audit +``` + +**Run audit:** + +```bash +cd botserver +cargo audit +``` + +**Expected output (clean):** + +``` + Fetching advisory database from `https://github.com/RustSec/advisory-db` + Loaded 650 security advisories (from ~/.cargo/advisory-db) + Scanning Cargo.lock for vulnerabilities (425 crate dependencies) +``` + +**Output with vulnerabilities:** + +``` +Crate: openssl +Version: 0.10.38 +Title: `openssl` `X509NameRef::entries` is unsound +Date: 2023-11-23 +ID: RUSTSEC-2023-0072 +URL: https://rustsec.org/advisories/RUSTSEC-2023-0072 +Severity: medium +Solution: Upgrade to >=0.10.60 +``` + +### Automated CI/CD Auditing + +Add to your CI pipeline (`.github/workflows/security.yml`): + +```yaml +name: Security Audit + +on: + push: + branches: [main] + pull_request: + schedule: + - cron: '0 0 * * *' # Daily at midnight + +jobs: + audit: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: rustsec/audit-check@v1 + with: + token: ${{ secrets.GITHUB_TOKEN }} +``` + +### Strict Auditing + +Fail on any warning: + +```bash +cargo audit --deny warnings +``` + +Fail on unmaintained crates: + +```bash +cargo audit --deny unmaintained +``` + +Generate JSON report: + +```bash +cargo audit --json > audit-report.json +``` + +--- + +## Stack Component Vulnerabilities + +### CVE Monitoring + +Monitor security advisories for each component: + +| Component | Security Feed | +|-----------|---------------| +| PostgreSQL | [postgresql.org/support/security](https://www.postgresql.org/support/security/) | +| Vault | [security.hashicorp.com](https://www.hashicorp.com/security) | +| MinIO | [github.com/minio/minio/security](https://github.com/minio/minio/security/advisories) | +| Zitadel | 
[github.com/zitadel/zitadel/security](https://github.com/zitadel/zitadel/security/advisories) | +| llama.cpp | [github.com/ggml-org/llama.cpp/security](https://github.com/ggml-org/llama.cpp/security/advisories) | +| Valkey | [github.com/valkey-io/valkey/security](https://github.com/valkey-io/valkey/security/advisories) | +| Caddy | [github.com/caddyserver/caddy/security](https://github.com/caddyserver/caddy/security/advisories) | +| Stalwart | [github.com/stalwartlabs/mail-server/security](https://github.com/stalwartlabs/mail-server/security/advisories) | + +### Trivy Container Scanning + +Trivy scans the installed stack on disk with `trivy fs`, and can also scan container images if you deploy with containers: + +```bash +# Install Trivy +curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin + +# Scan filesystem for vulnerabilities and misconfigurations +trivy fs --scanners vuln,misconfig ./botserver-stack/ + +# Scan specific binary +trivy fs --scanners vuln ./botserver-stack/bin/vault/ +``` + +### Grype Binary Scanning + +Scan binaries for vulnerabilities: + +```bash +# Install Grype +curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin + +# Scan directory +grype dir:./botserver-stack/bin/ +``` + +--- + +## Network Security Audit + +### Port Scanning + +Verify only expected ports are open: + +```bash +# Local TCP listeners +ss -tlnp | grep LISTEN + +# Local UDP listeners (DNS also uses 53/udp) +ss -ulnp + +# Expected ports +# 8200 - Vault +# 5432 - PostgreSQL +# 8080 - Zitadel / API +# 9000 - MinIO API +# 9001 - MinIO Console +# 6379 - Valkey +# 8081 - LLM Server +# 8082 - Embedding Server +# 443, 80 - Caddy proxy (HTTPS, HTTP) +# 53 - DNS +# 25, 993 - Email (if installed) +# 3000 - Forgejo (if installed) +# 7880 - LiveKit (if installed) +``` + +Full TCP port scan (run it from another machine against the server's address to see what is reachable externally; scanning `localhost` only lists local listeners): + +```bash +nmap -sT -p- localhost +``` + +### TLS Certificate Audit + +Check certificate validity: + +```bash +# Check expiration +openssl x509 -in botserver-stack/conf/system/certificates/api/server.crt -noout -dates + +# Check certificate chain +openssl verify -CAfile botserver-stack/conf/system/certificates/ca/ca.crt \ + botserver-stack/conf/system/certificates/api/server.crt
+``` + +### Firewall Rules + +Ensure proper firewall configuration: + +```bash +# UFW (Ubuntu) +sudo ufw status verbose + +# iptables +sudo iptables -L -n -v +``` + +Recommended rules: + +```bash +# Allow only necessary ports +sudo ufw default deny incoming +sudo ufw default allow outgoing +sudo ufw allow 22/tcp # SSH (allow before enabling, or you may lock yourself out) +sudo ufw allow 443/tcp # HTTPS +sudo ufw allow 8080/tcp # API (if exposed) +sudo ufw enable +``` + +--- + +## Secrets Audit + +### Vault Health Check + +```bash +# Check Vault seal status +curl -s http://localhost:8200/v1/sys/seal-status | jq + +# List enabled auth methods +VAULT_ADDR=http://localhost:8200 vault auth list + +# List enabled secrets engines +VAULT_ADDR=http://localhost:8200 vault secrets list +``` + +### Environment Variable Audit + +Check configuration files for leaked secrets: + +```bash +# Search for hardcoded secrets +grep -r "password" --include="*.toml" --include="*.json" --include="*.csv" . +grep -r "secret" --include="*.toml" --include="*.json" --include="*.csv" . +grep -r "api_key" --include="*.toml" --include="*.json" --include="*.csv" . + +# Check .env file permissions +ls -la .env +# Should be: -rw------- (600); fix with: +chmod 600 .env +``` + +### Rotate Secrets + +Rotate credentials on a regular schedule: + +```bash +# Generate new database password +./botserver rotate-secret tables + +# Generate new drive credentials +./botserver rotate-secret drive + +# Rotate all secrets +./botserver rotate-secrets --all +``` + +--- + +## Code Security Analysis + +### Static Analysis with Clippy + +```bash +# Run Clippy with the all, pedantic, and nursery lint groups +cargo clippy -- -W clippy::all -W clippy::pedantic -W clippy::nursery + +# Flag panicking unwrap()/expect() calls +cargo clippy -- -W clippy::unwrap_used -W clippy::expect_used +``` + +### SAST with Semgrep + +```bash +# Install Semgrep +pip install semgrep + +# Run Rust security rules +semgrep --config p/rust . + +# Run the general security-audit ruleset +semgrep --config p/security-audit .
+``` + +### Dependency Review + +Check for outdated dependencies: + +```bash +# List outdated crates +cargo outdated + +# Check for yanked crates +cargo audit --deny yanked +``` + +--- + +## Database Security + +### PostgreSQL Audit + +```bash +# Check authentication settings +cat botserver-stack/conf/tables/pg_hba.conf + +# Verify SSL is enabled +psql $DATABASE_URL -c "SHOW ssl;" + +# Check user permissions +psql $DATABASE_URL -c "SELECT * FROM pg_roles WHERE rolname NOT LIKE 'pg_%';" +``` + +### Connection Security + +Ensure encrypted connections: + +```sql +-- Check current connections +SELECT datname, usename, ssl, client_addr +FROM pg_stat_ssl +JOIN pg_stat_activity ON pg_stat_ssl.pid = pg_stat_activity.pid; +``` + +--- + +## Compliance Checks + +### OWASP Top 10 + +| Risk | Mitigation | Status Check | +|------|------------|--------------| +| Injection | Parameterized queries | `grep -r "raw_sql" src/` | +| Broken Auth | Zitadel handles auth | Check Zitadel config | +| Sensitive Data | Vault encryption | `vault status` | +| XXE | No XML parsing | N/A | +| Broken Access | RBAC via Zitadel | Check permissions | +| Security Misconfig | Audit configs | Review `conf/` | +| XSS | Template escaping | Askama auto-escapes | +| Insecure Deserialization | Serde validation | Code review | +| Vulnerable Components | `cargo audit` | Automated | +| Logging | Structured logs | Check log config | + +### SOC 2 Checklist + +- [ ] Access controls documented +- [ ] Encryption at rest enabled +- [ ] Encryption in transit (TLS) +- [ ] Audit logging enabled +- [ ] Backup procedures documented +- [ ] Incident response plan +- [ ] Vulnerability management process + +--- + +## Audit Schedule + +| Audit Type | Frequency | Tool | +|------------|-----------|------| +| Dependency vulnerabilities | Daily (CI) | cargo-audit | +| Container scanning | Weekly | Trivy | +| Secret rotation | Monthly | Vault | +| Port scanning | Monthly | nmap | +| Full security review | Quarterly | Manual | +| 
Penetration testing | Annually | External | + +--- + +## Automated Security Script + +Create `security-audit.sh`: + +```bash +#!/bin/bash +set -e + +echo "=== BotServer Security Audit ===" +echo "Date: $(date)" +echo + +echo "--- Rust Dependency Audit ---" +cargo audit --deny warnings || echo "WARN: Vulnerabilities or warnings found" + +echo +echo "--- Checking for Hardcoded Secrets ---" +# Flag string literals assigned to password-like names (expect some false positives) +if grep -rnE 'password[^=]*=[[:space:]]*"[^"]+"' --include="*.rs" src/ 2>/dev/null; then + echo "WARN: Potential hardcoded passwords found" +fi + +echo +echo "--- Listening Ports ---" +ss -tlnp + +echo +echo "--- Certificate Expiry ---" +for cert in botserver-stack/conf/system/certificates/*/server.crt; do + if [ -f "$cert" ]; then + expiry=$(openssl x509 -in "$cert" -noout -enddate 2>/dev/null | cut -d= -f2) + echo "$cert: $expiry" + fi +done + +echo +echo "--- Vault Status ---" +# curl -f returns non-zero when Vault is unreachable, so the else branch can run +if seal=$(curl -sf http://localhost:8200/v1/sys/seal-status 2>/dev/null); then + echo "sealed: $(echo "$seal" | jq -r '.sealed')" +else + echo "Vault not running" +fi + +echo +echo "=== Audit Complete ===" +``` + +Run periodically: + +```bash +chmod +x security-audit.sh +./security-audit.sh > audit-$(date +%Y%m%d).log 2>&1 +``` + +--- + +## Reporting Vulnerabilities + +If you discover a security vulnerability in BotServer: + +1. **Do NOT** create a public GitHub issue +2. Email security@generalbots.ai with details +3. Include steps to reproduce +4.
Allow 90 days for fix before disclosure + +--- + +## See Also + +- [Secrets Management](../08-config/secrets-management.md) - Vault configuration +- [Updating Components](./updating-components.md) - Applying security updates +- [Backup and Recovery](./backup-recovery.md) - Data protection \ No newline at end of file diff --git a/src/19-maintenance/troubleshooting.md b/src/19-maintenance/troubleshooting.md new file mode 100644 index 00000000..2c4c976e --- /dev/null +++ b/src/19-maintenance/troubleshooting.md @@ -0,0 +1,576 @@ +# Troubleshooting + +This guide covers common issues you may encounter with BotServer and their solutions. + +--- + +## Quick Diagnostics + +### Check Overall Status + +```bash +# View all service status +./botserver status + +# Check specific service +./botserver status llm +./botserver status tables +./botserver status vault +``` + +### View Logs + +```bash +# All logs +tail -f botserver-stack/logs/*.log + +# Specific service +tail -100 botserver-stack/logs/llm.log +tail -100 botserver-stack/logs/postgres.log +tail -100 botserver-stack/logs/vault.log + +# With filtering +grep -i error botserver-stack/logs/*.log +grep -i "failed\|error\|panic" botserver-stack/logs/*.log +``` + +### System Resources + +```bash +# Memory usage +free -h + +# Disk space +df -h botserver-stack/ + +# Process list +ps aux | grep -E "llama|postgres|minio|vault|valkey" + +# Open ports +ss -tlnp | grep LISTEN +``` + +--- + +## Startup Issues + +### Bootstrap Fails + +**Symptom:** `./botserver` fails during initial setup + +**Common Causes & Solutions:** + +1. **Port already in use** + ```bash + # Find what's using the port + lsof -i :8080 + lsof -i :5432 + + # Kill conflicting process + kill -9 + + # Or change port in config + ``` + +2. **Insufficient disk space** + ```bash + # Check available space + df -h + + # Clean up old installers + rm -rf botserver-installers/*.old + + # Clean logs + rm -f botserver-stack/logs/*.log.old + ``` + +3. 
**Download failure** + ```bash + # Clear cache and retry + rm -rf botserver-installers/component-name* + ./botserver bootstrap + + # Manual download + curl -L -o botserver-installers/file.zip "URL" + ``` + +4. **Permission denied** + ```bash + # Fix permissions + chmod +x botserver + chmod -R u+rwX botserver-stack/ + ``` + +### Vault Won't Start + +**Symptom:** Vault fails to initialize or unseal + +**Solutions:** + +1. **First-time setup failed** + ```bash + # Reset Vault completely (destroys all stored secrets) + rm -rf botserver-stack/data/vault/* + rm -f botserver-stack/conf/vault/init.json + ./botserver bootstrap + ``` + +2. **Vault is sealed** + ```bash + # Check seal status + curl http://localhost:8200/v1/sys/seal-status + + # Unseal manually + ./botserver unseal + ``` + +3. **Lost unseal keys** + ```bash + # Check whether init.json (which holds the unseal keys) still exists + cat botserver-stack/conf/vault/init.json + + # If it is gone, Vault must be reset (DATA LOSS) + ./botserver reset vault + ``` + +### Database Won't Start + +**Symptom:** PostgreSQL fails to start + +**Solutions:** + +1. **Corrupted data directory** + ```bash + # Check PostgreSQL logs + tail -50 botserver-stack/logs/postgres.log + + # Last resort: with PostgreSQL stopped, copy data/tables/ elsewhere first. + # pg_resetwal can discard recent transactions and leave data inconsistent. + ./botserver-stack/bin/tables/bin/pg_resetwal -f botserver-stack/data/tables/ + ``` + +2. **Port conflict** + ```bash + # Check if another PostgreSQL is running + lsof -i :5432 + + # Stop system PostgreSQL + sudo systemctl stop postgresql + ``` + +3. **Incorrect permissions** + ```bash + # PostgreSQL refuses to start if the data directory permissions are too open + chmod 700 botserver-stack/data/tables/ + ``` + +--- + +## Service Issues + +### LLM Server Not Responding + +**Symptom:** Requests to port 8081/8082 fail + +**Solutions:** + +1. **Check if running** + ```bash + pgrep llama-server + curl -k https://localhost:8081/health + ``` + +2. **Model not found** + ```bash + # Verify model exists + ls -la botserver-stack/data/llm/ + + # Re-download model + ./botserver update llm + ``` + +3.
**Out of memory** + ```bash + # Check memory usage + free -h + + # Use smaller model or reduce context + # Edit config.csv: + # llm-server-ctx-size,2048 + ``` + +4. **GPU issues** + ```bash + # Check CUDA + nvidia-smi + + # Fall back to CPU + # Edit config.csv: + # llm-server-gpu-layers,0 + ``` + +5. **Restart LLM server** + ```bash + pkill llama-server + ./botserver start llm + ``` + +### Drive (MinIO) Issues + +**Symptom:** File uploads/downloads fail + +**Solutions:** + +1. **Check MinIO status** + ```bash + curl http://localhost:9000/minio/health/live + ``` + +2. **Credential issues** + ```bash + # Verify credentials from Vault + ./botserver show-secret drive + + # Test with mc client + mc alias set local http://localhost:9000 ACCESS_KEY SECRET_KEY + mc ls local/ + ``` + +3. **Disk full** + ```bash + df -h botserver-stack/data/drive/ + + # Remove non-current object versions (versioned buckets only; replace `bucket`) + mc rm --recursive --force --versions --non-current local/bucket/ + ``` + +### Cache (Valkey) Issues + +**Symptom:** Session errors, slow responses + +**Solutions:** + +1. **Check Valkey status** + ```bash + ./botserver-stack/bin/cache/valkey-cli ping + # Expected: PONG + ``` + +2. **Memory issues** + ```bash + ./botserver-stack/bin/cache/valkey-cli info memory + + # Flush cache if needed (deletes all keys, including active sessions) + ./botserver-stack/bin/cache/valkey-cli FLUSHALL + ``` + +3. **Connection refused** + ```bash + # Check if running + pgrep valkey-server + + # Restart + ./botserver restart cache + ``` + +### Directory (Zitadel) Issues + +**Symptom:** Login fails, authentication errors + +**Solutions:** + +1. **Check Zitadel logs** + ```bash + tail -100 botserver-stack/logs/zitadel.log + ``` + +2. **Database connection** + ```bash + # Zitadel uses PostgreSQL + psql $DATABASE_URL -c "SELECT 1;" + ``` + +3. **Certificate issues** + ```bash + # Regenerate certificates + ./botserver regenerate-certs + ``` + +--- + +## Connection Issues + +### Cannot Connect to Database + +**Error:** `connection refused` or `authentication failed` + +**Solutions:** + +1.
**Verify DATABASE_URL** + ```bash + echo $DATABASE_URL + # Should be: postgres://user:pass@localhost:5432/dbname + ``` + +2. **Check PostgreSQL is running** + ```bash + pgrep postgres + ./botserver status tables + ``` + +3. **Test connection** + ```bash + psql $DATABASE_URL -c "SELECT 1;" + ``` + +4. **Check pg_hba.conf** + ```bash + cat botserver-stack/conf/tables/pg_hba.conf + # Ensure local connections are allowed + ``` + +### SSL/TLS Certificate Errors + +**Error:** `certificate verify failed` or `SSL handshake failed` + +**Solutions:** + +1. **Regenerate certificates** + ```bash + ./botserver regenerate-certs + ``` + +2. **Check certificate validity** + ```bash + openssl x509 -in botserver-stack/conf/system/certificates/api/server.crt -noout -dates + ``` + +3. **Skip verification (development only)** + ```bash + curl -k https://localhost:8081/health + ``` + +### Network Timeouts + +**Error:** Requests timeout after waiting + +**Solutions:** + +1. **Check DNS resolution** + ```bash + nslookup api.botserver.local + ``` + +2. **Verify firewall rules** + ```bash + sudo ufw status + sudo iptables -L + ``` + +3. **Check service is listening** + ```bash + ss -tlnp | grep 8080 + ``` + +--- + +## Performance Issues + +### Slow Response Times + +**Solutions:** + +1. **Check system resources** + ```bash + top -b -n 1 | head -20 + iostat -x 1 3 + ``` + +2. **Database performance** + ```bash + psql $DATABASE_URL -c "SELECT * FROM pg_stat_activity;" + + # Vacuum database + psql $DATABASE_URL -c "VACUUM ANALYZE;" + ``` + +3. **LLM performance** + ```bash + # Reduce context size + # config.csv: llm-server-ctx-size,2048 + + # Use GPU layers + # config.csv: llm-server-gpu-layers,35 + ``` + +4. **Enable caching** + ```bash + # Verify cache is working + ./botserver-stack/bin/cache/valkey-cli info stats + ``` + +### High Memory Usage + +**Solutions:** + +1. **Identify memory hogs** + ```bash + ps aux --sort=-%mem | head -10 + ``` + +2. 
**Reduce LLM memory** + ```bash + # Use quantized model (Q3_K_M instead of F16) + # Reduce context: llm-server-ctx-size,1024 + # Reduce batch: llm-server-batch-size,256 + ``` + +3. **Limit PostgreSQL memory** + ```ini + # In postgresql.conf (restart PostgreSQL afterwards) + shared_buffers = 256MB + # work_mem is allocated per sort/hash operation, so keep it modest + work_mem = 64MB + ``` + +### High Disk Usage + +**Solutions:** + +1. **Find large files** + ```bash + du -sh botserver-stack/* + du -sh botserver-stack/data/* + ``` + +2. **Clean logs** + ```bash + truncate -s 0 botserver-stack/logs/*.log + ``` + +3. **Clean old installers** + ```bash + # Keep only latest versions + ls -la botserver-installers/ + rm botserver-installers/old-* + ``` + +4. **Prune drive storage** + ```bash + # Permanently deletes objects older than 30 days (replace `bucket`) + mc rm --recursive --force --older-than 30d local/bucket/ + ``` + +--- + +## Update Issues + +### Component Update Failed + +**Symptom:** Update command fails or service won't start after update + +**Solutions:** + +1. **Clear cache and retry** + ```bash + rm botserver-installers/component-name* + ./botserver update component-name + ``` + +2. **Checksum mismatch** + ```bash + # Verify checksum + sha256sum botserver-installers/file.zip + + # Compare with the sha256 entry under [components.component-name] + grep -A5 '^\[components.component-name\]' 3rdparty.toml | grep sha256 + + # Only update the checksum after confirming the upstream release changed; + # a mismatch can also mean a corrupted or tampered download + ``` + +3. **Rollback to previous version** + ```bash + # If old version cached + ls botserver-installers/ + + # Restore old binary + cp botserver-installers/old-version.zip /tmp/ + unzip /tmp/old-version.zip -d botserver-stack/bin/component/ + ``` + +### Database Migration Failed + +**Solutions:** + +1. **Check migration status** + ```bash + ./botserver migrate --status + ``` + +2. **Run migrations manually** + ```bash + ./botserver migrate + ``` + +3. **Rollback migration** + ```bash + ./botserver migrate --rollback + ``` + +4.
**Reset from backup** + ```bash + # pg_restore reads archive dumps (pg_dump -Fc); restore plain .sql dumps with psql instead + pg_restore -c -d $DATABASE_URL backup.dump + ``` + +--- + +## Common Error Messages + +| Error | Cause | Solution | +|-------|-------|----------| +| `connection refused` | Service not running | Start the service | +| `permission denied` | File permissions | `chmod +x` on binary | +| `address already in use` | Port conflict | Kill conflicting process | +| `out of memory` | Insufficient RAM | Reduce model/context size | +| `no such file or directory` | Missing binary/config | Re-run bootstrap | +| `certificate verify failed` | SSL issues | Regenerate certificates | +| `authentication failed` | Wrong credentials | Check Vault secrets | +| `disk quota exceeded` | Disk or user quota full | Clean logs/old files | +| `too many open files` | ulimit too low | `ulimit -n 65536` | +| `connection timed out` | Network/firewall | Check firewall rules | + +--- + +## Getting Help + +### Collect Diagnostics + +```bash +# Generate diagnostic report +./botserver diagnose > diagnostics-$(date +%Y%m%d).txt + +# Include in bug reports: +# - BotServer version +# - OS and architecture +# - Error messages +# - Relevant logs +``` + +### Debug Logging + +```bash +# Enable verbose logging +RUST_LOG=debug ./botserver + +# Trace level (very verbose) +RUST_LOG=trace ./botserver +``` + +### Community Support + +- GitHub Issues: [github.com/GeneralBots/BotServer/issues](https://github.com/GeneralBots/BotServer/issues) +- Documentation: [docs.generalbots.ai](https://docs.generalbots.ai) + +--- + +## See Also + +- [Updating Components](./updating-components.md) - Safe update procedures +- [Backup and Recovery](./backup-recovery.md) - Data protection +- [Security Auditing](./security-auditing.md) - Security checks \ No newline at end of file diff --git a/src/19-maintenance/updating-components.md b/src/19-maintenance/updating-components.md new file mode 100644 index 00000000..167d8714 --- /dev/null +++ b/src/19-maintenance/updating-components.md @@ -0,0 +1,552 @@ +# Updating
Components + +BotServer's stack components are regularly updated by their respective maintainers. This guide explains how to check for updates, apply them safely, and verify everything works correctly. + +## Update Philosophy + +BotServer uses a **conservative update strategy**: + +1. **Pinned Versions** - Each component has a tested version in `3rdparty.toml` +2. **Checksum Verification** - Downloads are verified with SHA256 hashes +3. **Cached Downloads** - Updates are cached in `botserver-installers/` for offline use +4. **Rollback Ready** - Previous binaries can be restored from cache + +## Checking for Updates + +### View Current Versions + +Check installed versions: + +```bash +./botserver version --all +``` + +Example output: +``` +BotServer Stack Versions: + vault: 1.15.4 + tables: 17.2.0 (PostgreSQL) + directory: 2.70.4 (Zitadel) + drive: latest (MinIO) + cache: 8.0.2 (Valkey) + llm: b7345 (llama.cpp) + email: 0.10.7 (Stalwart) + proxy: 2.9.1 (Caddy) + dns: 1.11.1 (CoreDNS) + alm: 10.0.2 (Forgejo) + meeting: 2.8.2 (LiveKit) +``` + +### Check Upstream Releases + +| Component | Release Page | +|-----------|--------------| +| llama.cpp | [github.com/ggml-org/llama.cpp/releases](https://github.com/ggml-org/llama.cpp/releases) | +| PostgreSQL | [postgresql.org/download](https://www.postgresql.org/download/) | +| MinIO | [github.com/minio/minio/releases](https://github.com/minio/minio/releases) | +| Valkey | [github.com/valkey-io/valkey/releases](https://github.com/valkey-io/valkey/releases) | +| Zitadel | [github.com/zitadel/zitadel/releases](https://github.com/zitadel/zitadel/releases) | +| Vault | [releases.hashicorp.com/vault](https://releases.hashicorp.com/vault/) | +| Stalwart | [github.com/stalwartlabs/mail-server/releases](https://github.com/stalwartlabs/mail-server/releases) | +| Caddy | [github.com/caddyserver/caddy/releases](https://github.com/caddyserver/caddy/releases) | +| CoreDNS | 
[github.com/coredns/coredns/releases](https://github.com/coredns/coredns/releases) | +| Forgejo | [codeberg.org/forgejo/forgejo/releases](https://codeberg.org/forgejo/forgejo/releases) | +| LiveKit | [github.com/livekit/livekit/releases](https://github.com/livekit/livekit/releases) | + +--- + +## Updating the Configuration + +Component URLs and checksums are defined in `3rdparty.toml`. To update a component: + +### 1. Edit `3rdparty.toml` + +```toml +[components.llm] +name = "Llama.cpp Server" +url = "https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-ubuntu-x64.zip" +filename = "llama-b7345-bin-ubuntu-x64.zip" +sha256 = "91b066ecc53c20693a2d39703c12bc7a69c804b0768fee064d47df702f616e52" +``` + +### 2. Get the New Checksum + +Most releases publish SHA256 checksums. If not, calculate it: + +```bash +# Download and calculate checksum +curl -L -o new-release.zip "https://github.com/.../new-release.zip" +sha256sum new-release.zip +``` + +### 3. Update Both Files + +Update both configuration files to stay in sync: + +- `3rdparty.toml` - Main component registry +- `config/llm_releases.json` - LLM-specific builds and checksums + +--- + +## Component Update Procedures + +### Updating llama.cpp (LLM Server) + +The LLM server powers local AI inference. Updates often include performance improvements and new model support. 
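Steps 2 and 3 below both depend on the release's SHA-256. The snippet below is a minimal sketch of a hypothetical `verify_sha256` shell helper (it is not part of BotServer) that checks a downloaded archive against either checksum format used in this chapter: bare hex as in `3rdparty.toml`, or the `sha256:`-prefixed form used in `config/llm_releases.json`. It assumes GNU coreutils `sha256sum`; on macOS, `shasum -a 256` is the usual substitute.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with BotServer.
# Usage: verify_sha256 <file> <expected-hash>
# <expected-hash> may be bare hex (3rdparty.toml style) or
# "sha256:<hex>" (config/llm_releases.json style).
# Returns 0 on match, 1 on mismatch, 2 if the file is missing.
verify_sha256() {
  local file="$1"
  local expected="${2#sha256:}"   # strip optional "sha256:" prefix
  local actual

  if [ ! -f "$file" ]; then
    echo "MISSING: $file" >&2
    return 2
  fi

  # Normalize to lowercase so upper-case published hashes still match.
  expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
  # sha256sum prints "<hash>  <file>"; keep only the hash field.
  actual=$(sha256sum "$file" | cut -d' ' -f1)

  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
    return 0
  fi

  echo "MISMATCH: $file" >&2
  echo "  expected: $expected" >&2
  echo "  actual:   $actual" >&2
  return 1
}
```

For example, `verify_sha256 botserver-installers/llama-b7345-bin-ubuntu-x64.zip 91b066ecc53c20693a2d39703c12bc7a69c804b0768fee064d47df702f616e52` should print `OK:` if the cached archive matches the value pinned in `3rdparty.toml`. Distinct return codes let a calling script tell a missing download apart from a bad one.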
+ +**Step 1: Check the latest release** + +Visit [github.com/ggml-org/llama.cpp/releases](https://github.com/ggml-org/llama.cpp/releases) + +**Step 2: Update `3rdparty.toml`** + +```toml +[components.llm] +name = "Llama.cpp Server" +url = "https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-ubuntu-x64.zip" +filename = "llama-b7345-bin-ubuntu-x64.zip" +sha256 = "91b066ecc53c20693a2d39703c12bc7a69c804b0768fee064d47df702f616e52" +``` + +**Step 3: Update `config/llm_releases.json`** + +This file contains platform-specific builds: + +```json +{ + "llama_cpp": { + "version": "b7345", + "base_url": "https://github.com/ggml-org/llama.cpp/releases/download", + "checksums": { + "llama-b7345-bin-ubuntu-x64.zip": "sha256:91b066ecc53c20693a2d39703c12bc7a69c804b0768fee064d47df702f616e52", + "llama-b7345-bin-macos-arm64.zip": "sha256:72ae9b4a4605aa1223d7aabaa5326c66c268b12d13a449fcc06f61099cd02a52" + } + } +} +``` + +**Step 4: Update installer.rs version constant** + +```rust +const LLAMA_CPP_VERSION: &str = "b7345"; +``` + +**Step 5: Apply the update** + +```bash +# Stop LLM service +pkill llama-server + +# Remove old binary +rm -rf botserver-stack/bin/llm/* + +# Re-run bootstrap (downloads new version) +./botserver bootstrap + +# Or manually trigger download +./botserver update llm +``` + +**Available llama.cpp Builds (b7345)** + +| Platform | Architecture | Variant | Filename | +|----------|-------------|---------|----------| +| Linux | x64 | CPU | `llama-b7345-bin-ubuntu-x64.zip` | +| Linux | x64 | Vulkan | `llama-b7345-bin-ubuntu-vulkan-x64.zip` | +| Linux | s390x | CPU | `llama-b7345-bin-ubuntu-s390x.zip` | +| macOS | ARM64 | Metal | `llama-b7345-bin-macos-arm64.zip` | +| macOS | x64 | CPU | `llama-b7345-bin-macos-x64.zip` | +| Windows | x64 | CPU | `llama-b7345-bin-win-cpu-x64.zip` | +| Windows | x64 | CUDA 12.4 | `llama-b7345-bin-win-cuda-12.4-x64.zip` | +| Windows | x64 | CUDA 13.1 | `llama-b7345-bin-win-cuda-13.1-x64.zip` | +| Windows | x64 | 
Vulkan | `llama-b7345-bin-win-vulkan-x64.zip` | +| Windows | ARM64 | CPU | `llama-b7345-bin-win-cpu-arm64.zip` | + +> **Note:** Linux releases are transitioning from `.zip` to `.tar.gz` format. + +--- + +### Updating PostgreSQL (Tables) + +**Warning:** Database updates require careful planning. Always back up first! Minor releases (for example 17.1 to 17.2) reuse the existing data directory, but a major upgrade (for example 16 to 17) does not: the data must be migrated with `pg_upgrade` or a dump and restore. + +```bash +# Backup database +pg_dump $DATABASE_URL > backup-$(date +%Y%m%d).sql + +# Update 3rdparty.toml (these lines belong in the TOML file, not the shell): +# [components.tables] +# url = "https://github.com/theseus-rs/postgresql-binaries/releases/download/17.2.0/postgresql-17.2.0-x86_64-unknown-linux-gnu.tar.gz" +# filename = "postgresql-17.2.0-x86_64-unknown-linux-gnu.tar.gz" + +# Stop services +./botserver stop + +# Apply update +./botserver update tables + +# Start services +./botserver start + +# Verify +psql $DATABASE_URL -c "SELECT version();" +``` + +--- + +### Updating MinIO (Drive) + +MinIO updates are generally safe and backward-compatible. Note that this URL always serves the latest MinIO release, so unlike the other components it is not pinned to a tested version. + +```bash +# Update 3rdparty.toml (TOML, not shell): +# [components.drive] +# url = "https://dl.min.io/server/minio/release/linux-amd64/minio" +# filename = "minio" + +# Apply update +./botserver update drive + +# Verify +curl http://localhost:9000/minio/health/live +``` + +--- + +### Updating Valkey (Cache) + +Valkey is compiled from source during the update, so a C compiler and `make` must be available. + +```bash +# Update 3rdparty.toml (TOML, not shell): +# [components.cache] +# url = "https://github.com/valkey-io/valkey/archive/refs/tags/8.0.2.tar.gz" +# filename = "valkey-8.0.2.tar.gz" + +# Stop cache +./botserver stop cache + +# Remove old build +rm -rf botserver-stack/bin/cache/* + +# Rebuild +./botserver update cache + +# Verify +./botserver-stack/bin/cache/valkey-cli ping +``` + +--- + +### Updating Zitadel (Directory) + +**Warning:** Directory service updates may require database migrations.
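Because a Zitadel upgrade may migrate its schema, it is worth confirming the dump actually succeeded before stopping the service. A plain `pg_dump ... > file` does not guarantee that on its own, because the shell creates the output file even when `pg_dump` exits with an error. A minimal sketch, assuming bash: a hypothetical `run_backup` helper (not a BotServer command) runs any dump command, writes its output to a file, and fails if the command fails or produces an empty file.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with BotServer.
# Usage: run_backup <output-file> <dump-command> [args...]
# Returns 0 only if the command succeeds AND writes a non-empty file.
run_backup() {
  local out="$1"
  shift

  if ! "$@" > "$out"; then
    echo "backup command failed: $*" >&2
    rm -f "$out"            # do not leave a partial dump behind
    return 1
  fi

  if [ ! -s "$out" ]; then
    echo "backup file is empty: $out" >&2
    rm -f "$out"
    return 1
  fi

  echo "backup written: $out"
}
```

Chained with `&&`, the stop and update steps only run after a good dump, for example `run_backup "zitadel-backup-$(date +%Y%m%d).sql" pg_dump -d zitadel && ./botserver stop directory && ./botserver update directory`.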
+ +```bash +# Backup Zitadel database +pg_dump -d zitadel > zitadel-backup-$(date +%Y%m%d).sql + +# Update 3rdparty.toml (TOML, not shell): +# [components.directory] +# url = "https://github.com/zitadel/zitadel/releases/download/v2.70.4/zitadel-linux-amd64.tar.gz" +# filename = "zitadel-linux-amd64.tar.gz" + +# Stop directory +./botserver stop directory + +# Apply update +./botserver update directory + +# Run migrations (if needed) +./botserver-stack/bin/directory/zitadel setup + +# Start +./botserver start directory +``` + +--- + +### Updating Vault (Secrets) + +**Critical:** Unless auto-unseal is configured, Vault starts sealed after every restart, so it must be unsealed again after the update. + +```bash +# Update 3rdparty.toml (TOML, not shell): +# [components.vault] +# url = "https://releases.hashicorp.com/vault/1.15.4/vault_1.15.4_linux_amd64.zip" +# filename = "vault_1.15.4_linux_amd64.zip" + +# Stop Vault +./botserver stop vault + +# Apply update +./botserver update vault + +# Start and unseal +./botserver start vault +./botserver unseal +``` + +--- + +## Platform-Specific Builds + +### Automatic Detection + +BotServer automatically detects your platform and downloads the appropriate build: + +1. **Operating System** - Linux, macOS, Windows +2. **Architecture** - x64, ARM64, s390x +3. **GPU Support** - CUDA, Vulkan, Metal, ROCm + +### Manual Override + +Force a specific build variant: + +```toml +# In 3rdparty.toml - use Vulkan build instead of CPU +[components.llm] +url = "https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-ubuntu-vulkan-x64.zip" +filename = "llama-b7345-bin-ubuntu-vulkan-x64.zip" +# Also set sha256 to the checksum of the Vulkan archive +``` + +### GPU Detection + +The installer checks for GPU support along these lines (simplified excerpt): + +```rust +use std::{env, path::Path}; + +// Illustrative wrapper; the function name is not taken from the installer source +fn detect_gpu() { + // Linux CUDA detection + if Path::new("/usr/local/cuda").exists() || env::var("CUDA_HOME").is_ok() { + // Use CUDA build + } + + // Vulkan detection + if Path::new("/usr/share/vulkan").exists() || env::var("VULKAN_SDK").is_ok() { + // Use Vulkan build + } +} +``` + +--- + +## Offline Updates + +### Pre-download for Air-Gapped Systems + +1.
Download releases on a connected machine: + +```bash +# Download all components +mkdir offline-updates +cd offline-updates + +# LLM +curl -LO https://github.com/ggml-org/llama.cpp/releases/download/b7345/llama-b7345-bin-ubuntu-x64.zip + +# Database +curl -LO https://github.com/theseus-rs/postgresql-binaries/releases/download/17.2.0/postgresql-17.2.0-x86_64-unknown-linux-gnu.tar.gz + +# ... other components +``` + +2. Transfer to air-gapped system +3. Copy to cache directory: + +```bash +cp offline-updates/* /path/to/botserver-installers/ +``` + +4. Run bootstrap (uses cached files): + +```bash +./botserver bootstrap +``` + +--- + +## Verifying Updates + +### Run Tests + +```bash +# Run test suite +cargo test + +# Integration tests +./botserver test +``` + +### Health Checks + +```bash +# Check all services +./botserver status + +# Individual service checks +curl -k https://localhost:8081/health # LLM +curl -k https://localhost:8082/health # Embedding +curl http://localhost:9000/minio/health/live # Drive +``` + +### Security Audit + +After updating dependencies: + +```bash +# Rust dependencies +cargo audit + +# Check for known vulnerabilities +cargo audit --deny warnings +``` + +--- + +## Rollback Procedure + +If an update causes issues: + +### Quick Rollback + +```bash +# Stop services +./botserver stop + +# Restore from cache (previous version must exist) +cp botserver-installers/llama-b4547-bin-ubuntu-x64.zip /tmp/ +unzip /tmp/llama-b4547-bin-ubuntu-x64.zip -d botserver-stack/bin/llm/ + +# Restart +./botserver start +``` + +### Full Rollback + +```bash +# Restore database from backup +psql $DATABASE_URL < backup-20241210.sql + +# Restore old binaries +rm -rf botserver-stack/bin/ +tar -xzf botserver-stack-backup.tar.gz + +# Restart +./botserver start +``` + +--- + +## Update Schedule Recommendations + +| Component | Update Frequency | Risk Level | +|-----------|-----------------|------------| +| llama.cpp | Weekly/Monthly | Low | +| MinIO | Monthly | Low | +| 
Valkey | Quarterly | Low | +| Caddy | Monthly | Low | +| CoreDNS | Quarterly | Low | +| PostgreSQL | Quarterly | Medium | +| Zitadel | Quarterly | Medium | +| Vault | Quarterly | High | +| Stalwart | Monthly | Medium | + +### Security Updates + +Apply security patches immediately for: +- Vault (secrets management) +- PostgreSQL (database) +- Zitadel (authentication) + +--- + +## Automating Updates + +### Update Script + +Create `update-components.sh`: + +```bash +#!/bin/bash +set -e + +echo "Backing up current state..." +./botserver backup + +echo "Stopping services..." +./botserver stop + +echo "Updating components..." +for component in llm drive cache; do + echo "Updating $component..." + ./botserver update $component +done + +echo "Starting services..." +./botserver start + +echo "Running health checks..." +./botserver status + +echo "Update complete!" +``` + +### Scheduled Updates + +Use cron for automated updates (use with caution): + +```bash +# Weekly LLM updates (low risk) +0 3 * * 0 /path/to/botserver update llm + +# Monthly full updates +0 3 1 * * /path/to/update-components.sh +``` + +--- + +## Troubleshooting Updates + +### Download Failures + +```bash +# Clear cache and retry +rm botserver-installers/component-name* +./botserver update component-name +``` + +### Checksum Mismatch + +```bash +# Verify checksum manually +sha256sum botserver-installers/llama-b7345-bin-ubuntu-x64.zip +# Compare with 3rdparty.toml +``` + +### Service Won't Start + +```bash +# Check logs +tail -100 botserver-stack/logs/llm.log + +# Check permissions +ls -la botserver-stack/bin/llm/ + +# Make executable +chmod +x botserver-stack/bin/llm/llama-server +``` + +### Database Migration Errors + +```bash +# Run migrations manually +./botserver migrate + +# Or reset (WARNING: data loss) +./botserver reset tables +``` + +--- + +## See Also + +- [Component Reference](./component-reference.md) - Detailed component documentation +- [Security Auditing](./security-auditing.md) - 
Vulnerability scanning +- [Backup and Recovery](./backup-recovery.md) - Data protection \ No newline at end of file diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 0a2a5063..432e6307 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -379,6 +379,12 @@ - [Multimodal](./18-appendix-external-services/multimodal.md) - [Console (XtreeUI)](./18-appendix-external-services/console.md) +- [Appendix C: Maintenance](./19-maintenance/README.md) + - [Updating Components](./19-maintenance/updating-components.md) + - [Component Reference](./19-maintenance/component-reference.md) + - [Security Auditing](./19-maintenance/security-auditing.md) + - [Backup and Recovery](./19-maintenance/backup-recovery.md) + - [Troubleshooting](./19-maintenance/troubleshooting.md) - [Appendix D: Documentation Style](./16-appendix-docs-style/conversation-examples.md) - [SVG and Conversation Standards](./16-appendix-docs-style/svg.md)