From 27c19def7d4b86bbc74ff28b03117135e265908e Mon Sep 17 00:00:00 2001
From: "Rodrigo Rodriguez (Pragmatismo)"
Date: Sat, 18 Apr 2026 10:29:58 -0300
Subject: [PATCH] docs: Update AGENTS.md and PROD.md with comprehensive
 operations guide

AGENTS.md (1,871 lines):
- Add complete system architecture diagram with WebSocket flow
- Document all MessageTypes (0-6) including TOOL_EXEC (type 6)
- Detail start.bas, tables.bas, {tool}.bas execution flow
- Add complete BASIC keywords reference:
  * Common: TALK, HEAR, USE KB, USE WEBSITE, ADD SUGGESTION
  * Database: GET, SAVE, FIND, FIRST/LAST/COUNT
  * Files: CREATE/READ/WRITE FILE, UPLOAD
  * HTTP: GET/POST HTTP, WEBHOOK
  * Scheduling: CREATE_TASK, WAIT, ON events
  * Memory: SET BOT MEMORY, REMEMBER, CONTEXT
  * Multi-bot: ADD BOT, DELEGATE TO, BROADCAST
  * Control flow: IF/ELSE, FOR EACH, SWITCH
  * Built-in vars: TODAY, NOW, USER, SESSION, BOT
- Include AutoTask keywords from PROMPT.md files
- Production operations guide from historical versions

PROD.md (851 lines):
- Add complete container architecture table (16 containers)
  * Service names, technologies, binary/log/data paths
  * Network access internal/external URLs
- Expand troubleshooting section:
  * Container without IPv4: diagnose + fix
  * CI/ALM permission errors: chmod, ownership
  * MinIO mc operations: setup + common commands
  * Forgejo DB: SQL queries for CI runs
  * Zitadel API v2: curl examples with Host header
  * Common errors table with quick fixes

Both files now provide day-to-day operations without LLM assistance.

Refs: Consolidated from git history and PROMPT.md files
---
 AGENTS.md | 1873 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 PROD.md   | 1119 +++++++++++++++++---------------
 2 files changed, 2417 insertions(+), 575 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 1682a76..ed199eb 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,132 +1,1871 @@
 # General Bots AI Agent Guidelines
+- Stop saving .png files in the repository root! Use /tmp. Never allow new files in the root.
+- Never push to ALM without asking first, because it is production!
+- **❌ NEVER deploy to production manually — ALWAYS use CI/CD pipeline**
+- **❌ NEVER include sensitive data (IPs, tokens, passwords, keys) in AGENTS.md or any documentation**
+- **❌ NEVER use `scp`, direct SSH binary copy, or manual deployment to system container**
+- **✅ ALWAYS push to ALM → CI builds on alm-ci → CI deploys to system container automatically**
+Port 8080 is the server; port 3000 is the client UI.
+If a tool gives you trouble, go to its official website for proper install instructions.
+To test the web UI, open http://localhost:3000 (botui!).
+Use only formal, standard language when speaking.
+Test login here: http://localhost:3000/suite/auth/login.html
+> **⚠️ CRITICAL SECURITY WARNING**
+I AM IN A DEV ENV; even when content is pasted in from PROD, do not treat my env as prod! Just fix things for me and push to CI, so I can test in PROD for a while.
+> Use Playwright MCP to open http://localhost:3000/ now.
+> **NEVER CREATE FILES WITH SECRETS IN THE REPOSITORY ROOT**
+> - ❌ **NEVER** write internal IPs to logs or output
+> - When debugging network issues, mask IPs (e.g., "10.x.x.x" instead of the full internal address)
+> - Use hostnames instead of IPs in configs and documentation
+See `botserver/src/drive/local_file_monitor.rs` for how the list of development bots is loaded from `/opt/gbo/data`.
+- ❌ **NEVER** use `cargo clean` - causes 30min rebuilds; use `./reset.sh` for database issues
-
-NEVER INCLUDE HERE CREDENTIALS OR COMPANY INFORMATION, THIS IS COMPANY AGNOSTIC.
-Use apenas a língua culta ao falar. Never save files to root — use `/tmp` for temp files. Never push to ALM without asking first (it is production). If a tool fails to install, check the official website for instructions. Local file support (`/opt/gbo/data`) has been removed; bots are loaded only from Drive (MinIO/S3).
+> +> Secret files MUST be placed in `/tmp/` only: +> - ✅ `/tmp/vault-token-gb` - Vault root token +> - ✅ `/tmp/vault-unseal-key-gb` - Vault unseal key +> - ❌ `vault-unseal-keys` - FORBIDDEN (tracked by git) +> - ❌ `start-and-unseal.sh` - FORBIDDEN (contains secrets) +> +> **Why `/tmp/`?** +> - Cleared on reboot (ephemeral) +> - Not tracked by git +> - Standard Unix security practice +> - Prevents accidental commits --- -## Critical Production Rules +## 📁 WORKSPACE STRUCTURE -Always manage services via `systemctl` inside the `system` Incus container. Never run `/opt/gbo/bin/botserver` or `/opt/gbo/bin/botui` directly — they skip the `.env` file, which means Vault credentials fail to load and services break. The correct commands are `sudo incus exec system -- systemctl start|stop|restart|status botserver` and the same for `ui`. Systemctl handles env loading, auto-restart, and process lifecycle. +| Crate | Purpose | Port | Tech Stack | +|-------|---------|------|------------| +| **botserver** | Main API server, business logic | 8080 | Axum, Diesel, Rhai BASIC | +| **botui** | Web UI server (dev) + proxy | 3000 | Axum, HTML/HTMX/CSS | +| **botapp** | Desktop app wrapper | - | Tauri 2 | +| **botlib** | Shared library | - | Core types, errors | +| **botbook** | Documentation | - | mdBook | +| **bottest** | Integration tests | - | tokio-test | +| **botdevice** | IoT/Device support | - | Rust | +| **botplugin** | Browser extension | - | JS | -In development you may use `cargo run` or `./target/debug/botserver` with `botserver/.env`. In production, always use `systemctl start botserver` with `/opt/gbo/bin/.env`. +### Key Paths +- **Binary:** `target/debug/botserver` +- **Run from:** `botserver/` directory +- **Env file:** `botserver/.env` +- **UI Files:** `botui/ui/suite/` --- -## Workspace Structure +## 🏗️ System Architecture Overview -The workspace has eight crates. `botserver` is the main API server (port 8080) using Axum, Diesel, and Rhai BASIC. 
`botui` is the web UI server and proxy (port 3000) using Axum, HTML/HTMX/CSS. `botapp` is a Tauri 2 desktop wrapper. `botlib` holds shared types and errors. `botbook` is mdBook documentation. `bottest` holds integration tests. `botdevice` handles IoT/device support. `botplugin` is a JS browser extension. +### Chat Flow Architecture -Key paths: binary at `target/debug/botserver`, always run from the `botserver/` directory, env file at `botserver/.env`, UI files under `botui/ui/suite/`, bot data exclusively in Drive (MinIO/S3) under `/{botname}.gbai/` buckets. Test at `http://localhost:3000`; login at `http://localhost:3000/suite/auth/login.html`. +``` +User Message (WebSocket) +│ +▼ +┌─────────────────────────────────┐ +│ 1. WebSocket Connection │ botserver/src/websocket.rs +│ - Session established │ UserSession created +│ - Redis connection │ session_id generated +└──────────────┬──────────────────┘ +│ +▼ +┌─────────────────────────────────┐ +│ 2. start.bas Execution │ /opt/gbo/data/{bot}.gbai/... +│ - Runs ONCE per session │ {bot}.gbdialog/start.bas +│ - ADD_SUGGESTION calls │ Adds button suggestions +│ - Sets Redis flag │ prevents re-run +└──────────────┬──────────────────┘ +│ +▼ +┌─────────────────────────────────┐ +│ 3. Message Processing │ stream_response() +│ - IF message_type == 6 │ TOOL_EXEC (bypass LLM) +│ - ELSE: KB injection │ USE_KB context +│ - LLM processing │ generate_response() +└──────────────┬──────────────────┘ +│ +▼ +┌─────────────────────────────────┐ +│ 4. Tool Execution │ TOOL_EXEC (type 6) +│ - Direct .ast execution │ No LLM, no KB +│ - Rhai engine │ ScriptService::run() +│ - Immediate response │ Result in chat +└──────────────┬──────────────────┘ +│ +▼ +┌─────────────────────────────────┐ +│ 5. 
LLM Response (if not tool) │ Groq/OpenAI/etc +│ - Prompt with context │ System + KB + History +│ - Streaming response │ WebSocket chunks +│ - Tool suggestions │ LLM suggests tools +└──────────────┬──────────────────┘ +│ +▼ +┌─────────────────────────────────┐ +│ 6. Frontend Display │ botui HTMX/WebSocket +│ - Message appended │ #chat-messages +│ - Suggestion buttons │ From Redis suggestions:{bot}:{session} +│ - Tool buttons active │ MessageType 6 triggers +└─────────────────────────────────┘ +``` -Bot files in Drive follow this structure: `{botname}.gbai/{botname}.gbdialog/` contains `*.bas` scripts, `config.csv`, and the `.gbkb/` knowledge base folder. There is no local file monitoring — botserver compiles `.bas` to `.ast` in memory from Drive only. +### Message Types Reference + +| ID | Name | Purpose | LLM Used? | +|----|------|---------|-----------| +| 0 | EXTERNAL | External message | No | +| 1 | USER | User message | Yes | +| 2 | BOT_RESPONSE | Bot response | No | +| 3 | CONTINUE | Continue processing | No | +| 4 | SUGGESTION | Suggestion button | Yes | +| 5 | CONTEXT_CHANGE | Context change | No | +| 6 | **TOOL_EXEC** | **Direct tool execution** | **No - Bypasses LLM** | + +**TOOL_EXEC (Type 6)**: When frontend sends `message_type: 6`, backend executes the tool `.ast` file directly via Rhai engine. NO KB injection, NO LLM call. Result appears immediately in chat. --- -## Absolute Prohibitions +## 📝 Bot Scripts Architecture -Never search the `/target` folder. Never build in release mode or use `--release`. Never run `cargo build` — use `cargo check` for verification. Never run `cargo clean` (causes 30-minute rebuilds); use `./reset.sh` for DB issues. Never deploy manually via `scp`, SSH binary copy, or any method other than the CI/CD pipeline (push → ALM → alm-ci builds → deploys to system container). Never run the binary directly in production — use `systemctl` or `./restart.sh`. 
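The TOOL_EXEC path described above can be exercised without the UI by sending a type-6 frame over the WebSocket. This is a minimal sketch only: the JSON field names (`message_type`, `content`, `session_id`) are illustrative assumptions, not the handler's confirmed schema — check `botserver/src/websocket.rs` for the real payload shape.

```shell
# Hypothetical TOOL_EXEC (type 6) frame builder. Field names are assumptions
# for illustration; verify against botserver/src/websocket.rs before relying on them.
TOOL="detecta"
SESSION="abc-123"
FRAME=$(printf '{"message_type":6,"content":"%s","session_id":"%s"}' "$TOOL" "$SESSION")
echo "$FRAME"
# With a WebSocket client such as websocat, the frame could then be sent:
#   echo "$FRAME" | websocat ws://localhost:8080/ws
```

Because type 6 bypasses KB injection and the LLM, the tool result should stream back immediately on the same socket.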
+### start.bas - Session Entry Point -Never use `panic!()`, `todo!()`, `unimplemented!()`, `unwrap()`, or `expect()` in Rust code. Never use `Command::new()` directly — use `SafeCommand`. Never return raw error strings to HTTP clients — use `ErrorSanitizer`. Never use `#[allow()]` or lint exceptions in `Cargo.toml` — fix the code. Never use `_` prefix for unused variables — delete or use them. Never leave unused imports, dead code, or commented-out code. Never use CDN links — all assets must be local. Never create `.md` docs without checking `botbook/` first. Never hardcode credentials — use `generate_random_string()` or env vars. Never include sensitive data (IPs, tokens, keys) in docs or code; mask IPs in logs as `10.x.x.x`. Never create files with secrets anywhere except `/tmp/`. +**Execution:** +- Runs on WebSocket connect +- Runs again on first user message (blocking, once per session) +- Sets Redis key: `session:{session_id}:initialized` +- Subsequent messages skip start.bas + +**Purpose:** +- Load suggestion buttons via `ADD_SUGGESTION "text"` +- Initialize bot memory +- Set up context + +**Example:** +```basic +' start.bas +ADD SUGGESTION "Check inventory" +ADD SUGGESTION "Create report" +ADD SUGGESTION "Send email" + +TALK "Hello! I'm your assistant. How can I help?" 
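+
+' Hypothetical extension (not part of the original example): start.bas can also
+' seed bot memory on first run, per the "Initialize bot memory" purpose above.
+' The "welcome_shown" key is an illustrative name; SET BOT MEMORY is documented
+' in the keywords reference later in this file.
+SET BOT MEMORY "welcome_shown" = "yes"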
+``` + +### tables.bas - Database Schema + +**SPECIAL FILE - DO NOT CALL WITH CALL** +- Parsed automatically at compile time +- Defines tables for `sync_bot_tables()` +- Creates/updates database schema + +**Example:** +```basic +' tables.bas +BEGIN TABLE customers + id UUID PRIMARY KEY + name VARCHAR(255) + email VARCHAR(255) + created_at TIMESTAMP +END TABLE + +BEGIN TABLE orders + id UUID PRIMARY KEY + customer_id UUID REFERENCES customers + total DECIMAL(10,2) + status VARCHAR(50) +END TABLE +``` + +### {tool}.bas - Tool Scripts + +**Location:** `/opt/gbo/data/{bot}.gbai/{bot}.gbdialog/{tool}.bas` +**Compiled to:** `{tool}.ast` (in memory or `/opt/gbo/work/`) +**Execution:** Via `CALL "tool"` or TOOL_EXEC (type 6) + +**Example:** +```basic +' detecta.bas - Inventory checker + +items = GET FROM inventory WHERE quantity < 10 + +IF COUNT(items) = 0 THEN + TALK "All items well stocked!" +ELSE + response = "Low stock items:\n" + FOR EACH item IN items + response = response + "- " + item.name + ": " + item.quantity + "\n" + NEXT + TALK response +END IF +``` + +### CALL Keyword + +```basic +' Call in-memory procedure or .bas script +CALL "script_name" +CALL "procedure_name" + +' If not in memory, looks for {name}.bas in bot's gbdialog folder +``` + +### DETECT Keyword + +```basic +' Analyze table for anomalies +' Requires table defined in tables.bas +result = DETECT "folha_salarios" + +' Calls BotModels API at /api/anomaly/detect +``` --- -## Build Pattern — Fix Fast Loop +## 💬 Common BASIC Keywords Reference -When checking botserver, run `cargo check -p botserver > /tmp/check.log 2>&1 &`, capture the PID, then loop watching line count and kill the process once it exceeds 20 lines. After killing, check for errors with `strings /tmp/check.log | grep "^error" | head -20`. Fix errors immediately, then repeat. Never use `--all-features` (pulls docs/slides dependencies). This saves 10+ minutes per error cycle since full compilation takes 2–3 minutes. 
The key rule: kill at 20 lines, fix immediately, loop until clean. +### TALK - Bot Response -If the process is killed by OOM, run `pkill -9 cargo; pkill -9 rustc; pkill -9 botserver` then retry with `CARGO_BUILD_JOBS=1 cargo check -p botserver 2>&1 | tail -200`. +```basic +TALK "Hello, user!" +TALK "Result: " + result + +' Multi-line with \n +TALK "Line 1\nLine 2\nLine 3" +``` + +### HEAR - Listen for Input + +```basic +HEAR "What's your name?" AS name +HEAR "Enter amount:" AS amount + +' Used in voice/chat triggered tools +HEAR "check inventory" AS request +``` + +### USE KB - Knowledge Base Context + +```basic +' Inject KB content into LLM context +USE KB "manual" +USE KB "faq" +USE KB "cartas" + +' Clear KB context +CLEAR KB + +' Multiple KBs +USE KB "kb1" +USE KB "kb2" +``` + +**Flow:** +``` +USE KB "manual" +↓ +Bot searches .gbkb/ folder for documents +↓ +Chunks text, creates embeddings +↓ +Queries Qdrant for relevant chunks +↓ +Injects into LLM prompt as context +↓ +User question answered with KB context +``` + +### USE WEBSITE - Web Scraping Context + +```basic +' Scrape website and inject into context +USE WEBSITE "https://example.com/docs" +USE WEBSITE "https://api.example.com/swagger" + +' Combined with USE KB +USE KB "manual" +USE WEBSITE "https://company.com/updates" +TALK "How can I help with our product?" 
+``` + +### ADD SUGGESTION - Suggestion Buttons + +```basic +' In start.bas - shown as quick reply buttons +ADD SUGGESTION "Check status" +ADD SUGGESTION "Create ticket" +ADD SUGGESTION "Contact support" + +' Deduplicated with Redis SADD +' Key: suggestions:{bot_id}:{session_id} +' Read with SMEMBERS +``` + +### Database Operations + +```basic +' GET - Query records +customers = GET FROM customers WHERE status = "active" +order = GET FROM orders WHERE id = "123" + +' SAVE - Insert/update +SAVE customer TO customers +SAVE order TO orders + +' FIND - Search +result = FIND "term" IN products + +' Array functions +first = FIRST(customers) +last = LAST(customers) +count = COUNT(customers) +``` + +### File Operations + +```basic +' Create file in .gbdrive/ +CREATE FILE "reports/sales.txt" WITH report_content + +' Read file +content = READ FILE "data/config.txt" + +' Write file +WRITE FILE "logs/activity.log" WITH log_entry + +' Upload to MinIO +UPLOAD data TO "exports/data.json" +``` + +### HTTP Operations + +```basic +' GET request +response = GET HTTP "https://api.example.com/data" + +' POST request +result = POST HTTP "https://api.example.com/webhook" WITH json_data + +' Webhook +WEBHOOK "https://callback.example.com" WITH payload +``` + +### Task & Scheduling + +```basic +' Create task +CREATE_TASK "Review report", "john", "2024-01-15", project_id + +' Wait +WAIT 5 ' seconds + +' Event handlers +ON EMAIL FROM "@company.com" DO CALL "process_email" +ON CHANGE customers DO CALL "notify_admin" +``` + +### Memory & Context + +```basic +' Bot-level memory (persists across sessions) +SET BOT MEMORY "company_name" = "Acme Corp" +name = GET BOT MEMORY "company_name" + +' Session-level memory +REMEMBER "user_preference" = "dark_mode" +pref = RECALL "user_preference" + +' Context variables +SET CONTEXT "current_order" = order_id +``` + +### Multi-Bot Operations + +```basic +' Add sub-bot +ADD BOT "sales" WITH TRIGGER "talk to sales" + +' Delegate +DELEGATE TO "sales" + +' Send 
message to another bot +SEND TO BOT "sales" MESSAGE "New lead available" + +' Broadcast +BROADCAST MESSAGE "System maintenance in 5 minutes" +``` + +### Control Flow + +```basic +' IF/THEN/ELSE +IF condition THEN + ' true branch +ELSE + ' false branch +END IF + +' FOR EACH loop +FOR EACH customer IN customers + SEND MAIL TO customer.email WITH subj, body + WAIT 1 +NEXT + +' SWITCH/CASE +SWITCH status + CASE "active" + TALK "Account active" + CASE "inactive" + TALK "Account inactive" + DEFAULT + TALK "Unknown status" +END SWITCH +``` + +### Built-in Variables + +| Variable | Description | Example | +|----------|-------------|---------| +| `TODAY` | Current date | `IF created_at == TODAY THEN` | +| `NOW` | Current datetime | `SET last_seen = NOW` | +| `USER` | Current user object | `USER.email`, `USER.id` | +| `SESSION` | Current session object | `SESSION.id` | +| `BOT` | Current bot object | `BOT.name`, `BOT.id` | --- -## Security Directives — Mandatory +## 🧭 LLM Navigation Guide -For error handling, never use `unwrap()`, `expect()`, `panic!()`, or `todo!()`. Use `value?`, `value.ok_or_else(|| Error::NotFound)?`, `value.unwrap_or_default()`, or `if let Some(v) = value { ... }`. - -For command execution, never use `Command::new("cmd").arg(user_input).output()`. Use `SafeCommand::new("allowed_command")?.arg("safe_arg")?.execute()` from `crate::security::command_guard`. - -For error responses, never return `Json(json!({ "error": e.to_string() }))`. Use `log_and_sanitize(&e, "context", None)` from `crate::security::error_sanitizer` and return `(StatusCode::INTERNAL_SERVER_ERROR, sanitized)`. - -For SQL, never use `format!("SELECT * FROM {}", user_table)`. Use `sanitize_identifier` and `validate_table_name` from `crate::security::sql_guard`. - -Rate limits: general 100 req/s, auth 10 req/s, API 50 req/s per token, WebSocket 10 msgs/s. Use the `governor` crate with per-IP and per-user tracking. 
All state-changing endpoints (POST/PUT/DELETE/PATCH) must require CSRF tokens via `tower_csrf` bound to the user session; Bearer Token endpoints are exempt. Every response must include these security headers: `Content-Security-Policy`, `Strict-Transport-Security`, `X-Frame-Options: DENY`, `X-Content-Type-Options: nosniff`, `Referrer-Policy: strict-origin-when-cross-origin`, and `Permissions-Policy: geolocation=(), microphone=(), camera=()`. - -For dependencies, app crates track `Cargo.lock`; lib crates do not. Critical deps use exact versions (`=1.0.1`); regular deps use caret (`1.0`). Run `cargo audit` weekly and update only via PR with testing. +### Reading This Workspace +/opt/gbo/data is a place also for bots. +**For LLMs analyzing this codebase:** +0. Bots are in /opt/gbo/data primary +1. Start with **[Component Dependency Graph](../README.md#-component-dependency-graph)** in README to understand relationships +2. Review **[Module Responsibility Matrix](../README.md#-module-responsibility-matrix)** for what each module does +3. Study **[Data Flow Patterns](../README.md#-data-flow-patterns)** to understand execution flow +4. Reference **[Common Architectural Patterns](../README.md#-common-architectural-patterns)** before making changes +5. Check **[Security Rules](#-security-directives---mandatory)** below - violations are blocking issues +6. Follow **[Code Patterns](#-mandatory-code-patterns)** below - consistency is mandatory --- -## Mandatory Code Patterns +## 🔄 Reset Process Notes -Use `Self` not the type name in `impl` blocks. Always derive both `PartialEq` and `Eq` together. Use inline format args: `format!("Hello {name}")` not `format!("Hello {}", name)`. Combine identical match arms: `A | B => do_thing()`. Maximum 450 lines per file — split proactively at 350 lines into `types.rs`, `handlers.rs`, `operations.rs`, `utils.rs`, and `mod.rs`, re-exporting all public items in `mod.rs`. 
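When reset.sh stalls, the bootstrap-completion check can be scripted instead of watched by hand. A minimal sketch, assuming the `botserver.log` path and the "Bootstrap process completed!" marker noted in this section; the 300-second timeout is an assumption based on the 3-5 minute bootstrap window:

```shell
# Poll a log file for a marker with a hard timeout, instead of letting
# reset.sh wait indefinitely for "Bootstrap process completed!".
wait_for_marker() { # usage: wait_for_marker LOGFILE MARKER TIMEOUT_SECS
  deadline=$(( $(date +%s) + $3 ))
  until grep -q "$2" "$1" 2>/dev/null; do
    # Give up once the deadline passes so a stalled bootstrap is detected.
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep 1
  done
}

# Real usage: wait_for_marker botserver.log "Bootstrap process completed!" 300
# Self-contained demo against a temp file:
echo "Bootstrap process completed!" > /tmp/demo-bootstrap.log
wait_for_marker /tmp/demo-bootstrap.log "Bootstrap process completed!" 10 \
  && echo "bootstrap complete"
```

On timeout the function returns non-zero, so it can gate a follow-up `./restart.sh` or a log inspection step.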
+### reset.sh Behavior +- **Purpose**: Cleans and restarts the development environment +- **Timeouts**: The script can timeout during "Step 3/4: Waiting for BotServer to bootstrap" +- **Bootstrap Process**: Takes 3-5 minutes to install all components (Vault, PostgreSQL, Valkey, MinIO, Zitadel, LLM) + +### Common Issues +1. **Script Timeout**: reset.sh waits for "Bootstrap complete: admin user" message + - If Zitadel isn't ready within 60s, admin user creation fails + - Script continues waiting indefinitely + - **Solution**: Check botserver.log for "Bootstrap process completed!" message + +2. **Zitadel Not Ready**: "Bootstrap check failed (Zitadel may not be ready)" + - Directory service may need more than 60 seconds to start + - Admin user creation deferred + - Services still start successfully + +3. **Services Exit After Start**: + - botserver/botui may exit after initial startup + - Check logs for "dispatch failure" errors + - Check Vault certificate errors: "tls: failed to verify certificate: x509" + +### Manual Service Management +```bash +# If reset.sh times out, manually verify services: +ps aux | grep -E "(botserver|botui)" | grep -v grep +curl http://localhost:8080/health +tail -f botserver.log botui.log + +# Restart services manually: +./restart.sh +``` + +### Reset Verification +After reset completes, verify: +- ✅ PostgreSQL running (port 5432) +- ✅ Valkey cache running (port 6379) +- ✅ BotServer listening on port 8080 +- ✅ BotUI listening on port 3000 +- ✅ No errors in botserver.log --- -## Error Fixing Workflow +## 🔐 Security Directives - MANDATORY -Read the entire error list first. Group errors by file. For each file: view it, fix all errors, then write once. Only verify with `cargo check` after all fixes are applied — never compile after each individual fix. `cargo clippy --workspace` must pass with zero warnings. +### 1. 
Error Handling - NO PANICS IN PRODUCTION + +```rust +// ❌ FORBIDDEN +value.unwrap() +value.expect("message") +panic!("error") +todo!() +unimplemented!() + +// ✅ REQUIRED +value? +value.ok_or_else(|| Error::NotFound)? +value.unwrap_or_default() +value.unwrap_or_else(|e| { log::error!("{}", e); default }) +if let Some(v) = value { ... } +match value { Ok(v) => v, Err(e) => return Err(e.into()) } +``` + +### 2. Command Execution - USE SafeCommand + +```rust +// ❌ FORBIDDEN +Command::new("some_command").arg(user_input).output() + +// ✅ REQUIRED +use crate::security::command_guard::SafeCommand; +SafeCommand::new("allowed_command")? + .arg("safe_arg")? + .execute() +``` + +### 3. Error Responses - USE ErrorSanitizer + +```rust +// ❌ FORBIDDEN +Json(json!({ "error": e.to_string() })) +format!("Database error: {}", e) + +// ✅ REQUIRED +use crate::security::error_sanitizer::log_and_sanitize; +let sanitized = log_and_sanitize(&e, "context", None); +(StatusCode::INTERNAL_SERVER_ERROR, sanitized) +``` + +### 4. SQL - USE sql_guard + +```rust +// ❌ FORBIDDEN +format!("SELECT * FROM {}", user_table) + +// ✅ REQUIRED +use crate::security::sql_guard::{sanitize_identifier, validate_table_name}; +let safe_table = sanitize_identifier(&user_table); +validate_table_name(&safe_table)?; +``` + +### 5. Rate Limiting Strategy (IMP-07) + +- **Default Limits:** + - General: 100 req/s (global) + - Auth: 10 req/s (login endpoints) + - API: 50 req/s (per token) +- **Implementation:** + - MUST use `governor` crate + - MUST implement per-IP and per-User tracking + - WebSocket connections MUST have message rate limits (e.g., 10 msgs/s) + +### 6. CSRF Protection (IMP-08) + +- **Requirement:** ALL state-changing endpoints (POST, PUT, DELETE, PATCH) MUST require a CSRF token. 
+- **Implementation:** + - Use `tower_csrf` or similar middleware + - Token MUST be bound to user session + - Double-Submit Cookie pattern or Header-based token verification + - **Exemptions:** API endpoints using Bearer Token authentication (stateless) + +### 7. Security Headers (IMP-09) + +- **Mandatory Headers on ALL Responses:** + - `Content-Security-Policy`: "default-src 'self'; script-src 'self'; object-src 'none';" + - `Strict-Transport-Security`: "max-age=63072000; includeSubDomains; preload" + - `X-Frame-Options`: "DENY" or "SAMEORIGIN" + - `X-Content-Type-Options`: "nosniff" + - `Referrer-Policy`: "strict-origin-when-cross-origin" + - `Permissions-Policy`: "geolocation=(), microphone=(), camera=()" + +### 8. Dependency Management (IMP-10) + +- **Pinning:** + - Application crates (`botserver`, `botui`) MUST track `Cargo.lock` + - Library crates (`botlib`) MUST NOT track `Cargo.lock` +- **Versions:** + - Critical dependencies (crypto, security) MUST use exact versions (e.g., `=1.0.1`) + - Regular dependencies MAY use caret (e.g., `1.0`) +- **Auditing:** + - Run `cargo audit` weekly + - Update dependencies only via PR with testing --- -## Execution Modes +## ✅ Mandatory Code Patterns -In local standalone mode (no incus), botserver manages all services itself. Run `cargo run -- --install` once to download and extract PostgreSQL, Valkey, MinIO, and Vault binaries into `botserver-stack/bin/`, initialize data directories, and download the LLM model. Then `cargo run` starts everything and serves at `http://localhost:8080`. Use `./reset.sh` to wipe and restart the local environment. +### Use Self in Impl Blocks +```rust +impl MyStruct { + fn new() -> Self { Self { } } // ✅ Not MyStruct +} +``` -In container (Incus) production mode, services run in separate named containers. Start them all with `sudo incus start system tables vault directory drive cache llm vector_db`. Access the system container with `sudo incus exec system -- bash`. 
View botserver logs with `sudo incus exec system -- journalctl -u botserver -f`. The container layout is: `system` runs BotServer on 8080; `tables` runs PostgreSQL on 5432; `vault` runs Vault on 8200; `directory` runs Zitadel on 8080 internally (external port 9000 via iptables NAT); `drive` runs MinIO on 9100; `cache` runs Valkey on 6379; `llm` runs llama.cpp on 8081; `vector_db` runs Qdrant on 6333. +### Derive Eq with PartialEq +```rust +#[derive(PartialEq, Eq)] // ✅ Always both +struct MyStruct { } +``` -Use the `LOAD_ONLY` variable in `/opt/gbo/bin/.env` to filter which bots are loaded and monitored by DriveMonitor, for example `LOAD_ONLY=default,salesianos`. +### Inline Format Args +```rust +format!("Hello {name}") // ✅ Not format!("{}", name) +``` + +### Combine Match Arms +```rust +match x { + A | B => do_thing(), // ✅ Combine identical arms + C => other(), +} +``` --- -## Debugging & Testing +## ❌ Absolute Prohibitions +- NEVER search /target folder! It is binary compiled. +- ❌ **NEVER** hardcode passwords, tokens, API keys, or any credentials in source code — ALWAYS use `generate_random_string()` or environment variables +- ❌ **NEVER** build in release mode - ONLY debug builds allowed +- ❌ **NEVER** use `--release` flag on ANY cargo command +- ❌ **NEVER** run `cargo build` - use `cargo check` for syntax verification +- ❌ **NEVER** compile directly for production - ALWAYS use push + CI/CD pipeline +- ❌ **NEVER** use `scp` or manual transfer to deploy - ONLY CI/CD ensures correct deployment +- ❌ **NEVER** manually copy binaries to production system container - ALWAYS push to ALM and let CI/CD build and deploy +- ❌ **NEVER** SSH into system container to deploy binaries - CI workflow handles build, transfer, and restart via alm-ci SSH +- ✅ **ALWAYS** push code to ALM → CI builds on alm-ci → CI deploys to system container via SSH from alm-ci +- ✅ **CI deploy path**: alm-ci builds at `/opt/gbo/data/botserver/target/debug/botserver` → tar+gzip via SSH → 
`/opt/gbo/bin/botserver` on system container → restart
-To watch for errors live: `tail -f botserver.log | grep -i "error\|tool"`. To debug a specific tool: grep `Tool error` in logs, fix the `.bas` file in MinIO at `/{bot}.gbai/{bot}.gbdialog/{tool}.bas`, then wait for DriveMonitor to recompile (automatic on file change, in-memory only, no local `.ast` cache). Test in browser at `http://localhost:3000/{botname}`.
-
-Common BASIC errors: `=== is not a valid operator` means you used JavaScript-style `===` — replace with `==` or use `--` for string separators. `Syntax error` means bad BASIC syntax — check parentheses and commas. `Tool execution failed` means a runtime error — check logs for stack trace.
-
-For Playwright testing, navigate to `http://localhost:3000/`, snapshot to verify welcome message and suggestion buttons including Portuguese accents, click a suggestion, wait 3–5 seconds, snapshot, fill data, submit, then verify DB records and backend logs. If the browser hangs, run `pkill -9 -f brave; pkill -9 -f chrome; pkill -9 -f chromium`, wait 3 seconds, and navigate again. The chat window may overlap other apps — click the middle (restore) button to minimize it or navigate directly via URL.
-
-WhatsApp routing is global — one number serves all bots, with routing determined by the `whatsapp-id` key in each bot's `config.csv`. The bot name is sent as the first message to route correctly.
+**Current Status:** ✅ **0 clippy warnings** (down from 61 - PERFECT SCORE in YOLO mode) +- ❌ **NEVER** use `panic!()`, `todo!()`, `unimplemented!()` +- ❌ **NEVER** use `Command::new()` directly - use `SafeCommand` +- ❌ **NEVER** return raw error strings to HTTP clients +- ❌ **NEVER** use `#[allow()]` in source code - FIX the code instead +- ❌ **NEVER** add lint exceptions to `Cargo.toml` - FIX the code instead +- ❌ **NEVER** use `_` prefix for unused variables - DELETE or USE them +- ❌ **NEVER** leave unused imports or dead code +- ❌ **NEVER** use CDN links - all assets must be local +- ❌ **NEVER** create `.md` documentation files without checking `botbook/` first +- ❌ **NEVER** comment out code - FIX it or DELETE it entirely --- -## Bot Scripts Architecture +## 📏 File Size Limits - MANDATORY -`start.bas` is the entry point executed on WebSocket connect and on the first user message (once per session). It loads suggestion buttons via `ADD_SUGGESTION_TOOL` and marks the session in Redis to prevent re-runs. `{tool}.bas` files implement individual tools (e.g. `detecta.bas`). `tables.bas` is a special file — never call it with `CALL`; it is parsed automatically at compile time by `process_table_definitions()` and its table definitions are synced to the database via `sync_bot_tables()`. `init_folha.bas` handles initialization for specific features. +### Maximum 450 Lines Per File -The `CALL` keyword can invoke in-memory procedures or `.bas` scripts by name. If the target is not in memory, botserver looks for `{name}.bas` in the bot's gbdialog folder in Drive. The `DETECT` keyword analyzes a database table for anomalies: it requires the table to exist (defined in `tables.bas`) and calls the BotModels API at `/api/anomaly/detect`. +When a file grows beyond this limit: -Tool buttons use `MessageType::TOOL_EXEC` (id 6). 
When the frontend sends `message_type: 6` via WebSocket, the backend executes the named tool directly in `stream_response()`, bypassing KB injection and LLM entirely. The result appears in chat without any "/tool" prefix text. Other message types are: 0 EXTERNAL, 1 USER, 2 BOT_RESPONSE, 3 CONTINUE, 4 SUGGESTION, 5 CONTEXT_CHANGE. +1. **Identify logical groups** - Find related functions +2. **Create subdirectory module** - e.g., `handlers/` +3. **Split by responsibility:** + - `types.rs` - Structs, enums, type definitions + - `handlers.rs` - HTTP handlers and routes + - `operations.rs` - Core business logic + - `utils.rs` - Helper functions + - `mod.rs` - Re-exports and configuration +4. **Keep files focused** - Single responsibility +5. **Update mod.rs** - Re-export all public items + +**NEVER let a single file exceed 450 lines - split proactively at 350 lines** --- -## Submodule Push Rule — Mandatory +## 🔥 Error Fixing Workflow -Every time you push the main repo, you must also push all submodules. CI builds based on submodule commits — if a submodule is not pushed, CI deploys old code. Always push botserver, botui, and botlib to both `origin` and `alm` remotes before or alongside the main repo push. +### Mode 1: OFFLINE Batch Fix (PREFERRED) -The deploy workflow is: push to ALM → CI triggers on alm-ci → builds inside system container via SSH (to match glibc 2.36 on Debian 12 Bookworm, not the CI runner's glibc 2.41) → deploys binary → service auto-restarts. Verify by checking service status and logs about 10 minutes after pushing. +When given error output: + +1. **Read ENTIRE error list first** +2. **Group errors by file** +3. **For EACH file with errors:** + a. View file → understand context + b. Fix ALL errors in that file + c. Write once with all fixes +4. **Move to next file** +5. **REPEAT until ALL errors addressed** +6. 
**ONLY THEN → verify with build/diagnostics** + +**NEVER run cargo build/check/clippy DURING fixing** +**Fix ALL errors OFFLINE first, verify ONCE at the end** + +### Mode 2: Interactive Loop + +``` +LOOP UNTIL (0 warnings AND 0 errors): + 1. Run diagnostics → pick file with issues + 2. Read entire file + 3. Fix ALL issues in that file + 4. Write file once with all fixes + 5. Verify with diagnostics + 6. CONTINUE LOOP +END LOOP +``` + +### ⚡ Streaming Build Rule + +**Do NOT wait for `cargo` to finish.** As soon as the first errors appear in output, cancel/interrupt the build, fix those errors immediately, then re-run. This avoids wasting time on a full compile when errors are already visible. --- -## Zitadel Setup (Directory Service) +## 🧠 Memory Management -Zitadel runs in the `directory` container on port 8080 internally. External port 9000 is forwarded to it via iptables NAT on the system container. The database is `PROD-DIRECTORY` on the `tables` container. The PAT file is at `/opt/gbo/conf/directory/admin-pat.txt` on the directory container. Admin credentials are username `admin`, password `Admin123!`. Current version is Zitadel v4.13.1. **Known bug**: Web console UI will return 404 for environment.json when accessed via reverse proxy public domain. Use http://:9000/ui/console for administrative interface instead. +When compilation fails due to memory issues (process "Killed"): -To reinstall: drop and recreate `PROD-DIRECTORY` on the tables container, write the init YAML to `/opt/gbo/conf/directory/zitadel-init-steps.yaml` (defining org name, admin user, and PAT expiry), then start Zitadel with env vars for the PostgreSQL host/port/database/credentials, `ZITADEL_EXTERNALSECURE=false`, `ZITADEL_EXTERNALDOMAIN=`, `ZITADEL_EXTERNALPORT=9000`, and `ZITADEL_TLS_ENABLED=false`. Pass `--masterkey MasterkeyNeedsToHave32Characters`, `--tlsMode disabled`, and `--steps `. Bootstrap takes about 90 seconds; verify with `curl -sf http://localhost:8080/debug/healthz`. 
- -Key API endpoints: Use **v2 API endpoints** for all operations: `POST /v2/organizations/{org_id}/domains` to add domains, `POST /v2/users/new` to create users, `POST /oauth/v2/token` for access tokens, `GET /debug/healthz` for health. When calling externally via port 9000, include `Host: ` header. The v1 Management API is deprecated and not functional in this version. - - -## Frontend Standards & Performance - -HTMX-first: the server returns HTML fragments, not JSON. Use `hx-get`, `hx-post`, `hx-target`, `hx-swap`, and WebSocket via htmx-ws. All assets must be local — no CDN links. - -Release profile must use `opt-level = "z"`, `lto = true`, `codegen-units = 1`, `strip = true`, and `panic = "abort"`. Use `default-features = false` and opt into only needed features. Run `cargo tree --duplicates`, `cargo machete`, and `cargo audit` weekly. - -Testing: unit tests live in per-crate `tests/` folders or `#[cfg(test)]` modules, run with `cargo test -p `. Integration tests live in `bottest/`, run with `cargo test -p bottest`. Aim for 80%+ coverage on critical paths; all error paths and security guards must be tested. +```bash +pkill -9 cargo; pkill -9 rustc; pkill -9 botserver +CARGO_BUILD_JOBS=1 cargo check -p botserver 2>&1 | tail -200 +``` --- -## Core Directives Summary +## 🎭 Playwright Browser Testing - YOLO Mode + +**When user requests to start YOLO mode with Playwright:** + +1. **Start the browser** - Use `mcp__playwright__browser_navigate` to open http://localhost:3000/{botname} +2. **Take snapshot** - Use `mcp__playwright__browser_snapshot` to see current page state +3. **Test user flows** - Use click, type, fill_form, etc. +4. **Verify results** - Check for expected content, errors in console, network requests +5. **Validate backend** - Check database and services to confirm process completion +6. 
**Report findings** - Always include screenshot evidence with `browser_take_screenshot`
+
+**⚠️ IMPORTANT - Desktop UI Navigation:**
+- The desktop may have a maximized chat window covering other apps
+- To access CRM/sidebar icons, click the **middle button** (restore/down arrow) in the chat window header to minimize it
+- Or navigate directly via URL: http://localhost:3000/suite/crm (after login)
+
+**Bot-Specific Testing URL Pattern:**
+`http://localhost:3000/`
+
+**Backend Validation Checks:**
+After UI interactions, validate backend state via `psql` or `tail` logs.
+
+---
+
+## ➕ Adding New Features Workflow
+
+### Step 1: Plan the Feature
+
+**Understand requirements:**
+1. What problem does this solve?
+2. Which module owns this functionality? (Check [Module Responsibility Matrix](../README.md#-module-responsibility-matrix))
+3. What data structures are needed?
+4. What are the security implications?
+
+**Design checklist:**
+- [ ] Does it fit existing architecture patterns?
+- [ ] Will it require database migrations?
+- [ ] Does it need new API endpoints?
+- [ ] Will it affect existing features?
+- [ ] What are the error cases?
+
+### Step 2: Implement the Feature
+
+**Follow the pattern:**
+```rust
+// 1. Add types to botlib if shared across crates
+// botlib/src/models.rs
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct NewFeature {
+    pub id: Uuid,
+    pub name: String,
+}
+
+// 2. Add database schema if needed
+// botserver/migrations/YYYY-MM-DD-HHMMSS_feature_name/up.sql
+CREATE TABLE new_features (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    name VARCHAR(255) NOT NULL,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+// 3. Add Diesel model
+// botserver/src/core/shared/models/core.rs
+#[derive(Queryable, Insertable)]
+#[diesel(table_name = new_features)]
+pub struct NewFeatureDb {
+    pub id: Uuid,
+    pub name: String,
+    pub created_at: DateTime<Utc>,
+}
+
+// 4. 
Add business logic
+// botserver/src/features/new_feature.rs
+pub async fn create_feature(
+    state: &AppState,
+    name: String,
+) -> Result<NewFeature, String> {
+    // Implementation
+}
+
+// 5. Add API endpoint
+// botserver/src/api/routes.rs
+async fn create_feature_handler(
+    Extension(state): Extension<Arc<AppState>>,
+    Json(payload): Json<NewFeature>,
+) -> Result<Json<NewFeature>, (StatusCode, String)> {
+    // Handler implementation
+}
+```
+
+**Security checklist:**
+- [ ] Input validation (use `sanitize_identifier` for SQL)
+- [ ] Authentication required?
+- [ ] Authorization checks?
+- [ ] Rate limiting needed?
+- [ ] Error messages sanitized? (use `log_and_sanitize`)
+- [ ] No `unwrap()` or `expect()` in production code
+
+### Step 3: Add BASIC Keywords (if applicable)
+
+**For features accessible from .bas scripts:**
+```rust
+// botserver/src/basic/keywords/new_feature.rs
+pub fn new_feature_keyword(
+    state: Arc<AppState>,
+    user_session: UserSession,
+    engine: &mut Engine,
+) {
+    let state_clone = state.clone();
+    let session_clone = user_session.clone();
+
+    engine
+        .register_custom_syntax(
+            ["NEW_FEATURE", "$expr$"],
+            true,
+            move |context, inputs| {
+                let param = context.eval_expression_tree(&inputs[0])?.to_string();
+
+                // Clone per invocation: the spawned thread takes ownership
+                // of its own Arc, keeping the closure re-callable (Fn)
+                let state_clone = state_clone.clone();
+
+                // Call async function from sync context using separate thread
+                let (tx, rx) = std::sync::mpsc::channel();
+                std::thread::spawn(move || {
+                    let rt = tokio::runtime::Builder::new_current_thread()
+                        .enable_all().build().ok();
+                    let result = if let Some(rt) = rt {
+                        rt.block_on(async {
+                            create_feature(&state_clone, param).await
+                        })
+                    } else {
+                        Err("Failed to create runtime".into())
+                    };
+                    let _ = tx.send(result);
+                });
+                let result = rx.recv().unwrap_or(Err("Channel error".into()));
+
+                match result {
+                    Ok(feature) => Ok(Dynamic::from(feature.name)),
+                    Err(e) => Err(format!("Failed: {}", e).into()),
+                }
+            },
+        )
+        .expect("valid syntax registration");
+}
+```
+
+### Step 4: Test the Feature
+
+**Local testing:**
+```bash
+# 1. Run migrations
+diesel migration run
+
+# 2. 
Build and restart
+./restart.sh
+
+# 3. Test via API
+curl -X POST http://localhost:8080/api/features \
+  -H "Content-Type: application/json" \
+  -d '{"name": "test"}'
+
+# 4. Test via BASIC script
+# Create test.bas in /opt/gbo/data/testbot.gbai/testbot.gbdialog/
+# NEW_FEATURE "test"
+
+# 5. Check logs
+tail -f botserver.log | grep -i "new_feature"
+```
+
+**Integration test:**
+```rust
+// bottest/tests/new_feature_test.rs
+#[tokio::test]
+async fn test_create_feature() {
+    let state = setup_test_state().await;
+    let result = create_feature(&state, "test".to_string()).await;
+    assert!(result.is_ok());
+}
+```
+
+### Step 5: Document the Feature
+
+**Update documentation:**
+- Add to `botbook/src/features/` if user-facing
+- Add to module README.md if developer-facing
+- Add inline code comments for complex logic
+- Update API documentation
+
+**Example documentation:**
+````markdown
+## NEW_FEATURE Keyword
+
+Creates a new feature with the given name.
+
+**Syntax:**
+```basic
+NEW_FEATURE "feature_name"
+```
+
+**Example:**
+```basic
+NEW_FEATURE "My Feature"
+TALK "Feature created!"
+```
+
+**Returns:** Feature name as string
+````
+
+### Step 6: Commit & Deploy
+
+**Commit pattern:**
+```bash
+git add . 
+git commit -m "feat: Add NEW_FEATURE keyword
+
+- Adds new_features table with migrations
+- Implements create_feature business logic
+- Adds NEW_FEATURE BASIC keyword
+- Includes API endpoint at POST /api/features
+- Tests: Unit tests for business logic, integration test for API"
+
+git push alm main
+git push origin main
+```
+
+---
+
+## 🧪 Testing Strategy
+
+### Unit Tests
+- **Location**: Each crate has `tests/` directory or inline `#[cfg(test)]` modules
+- **Naming**: Test functions use `test_` prefix or describe what they test
+- **Running**: `cargo test -p ` or `cargo test` for all
+
+### Integration Tests
+- **Location**: `bottest/` crate contains integration tests
+- **Scope**: Tests full workflows across multiple crates
+- **Running**: `cargo test -p bottest`
+
+### Coverage Goals
+- **Critical paths**: 80%+ coverage required
+- **Error handling**: ALL error paths must have tests
+- **Security**: All security guards must have tests
+
+### WhatsApp Integration Testing
+
+#### Prerequisites
+1. **Enable WhatsApp Feature**: Build botserver with whatsapp feature enabled:
+   ```bash
+   cargo build -p botserver --bin botserver --features whatsapp
+   ```
+2. **Bot Configuration**: Ensure the bot has WhatsApp credentials configured in `config.csv`:
+   - `whatsapp-api-key` - API key from Meta Business Suite
+   - `whatsapp-verify-token` - Custom token for webhook verification
+   - `whatsapp-phone-number-id` - Phone Number ID from Meta
+   - `whatsapp-business-account-id` - Business Account ID from Meta
+
+#### Using Localtunnel (lt) as Reverse Proxy
+
+Expose the local webhook endpoint with `lt`, then set the Meta webhook URL to the generated tunnel address. After sending a test message:
+
+```bash
+# Check database for message storage
+psql -h localhost -U postgres -d botserver -c "SELECT * FROM messages WHERE bot_id = '' ORDER BY created_at DESC LIMIT 5;"
+```
+
+---
+
+## 🐛 Debugging Rules
+
+### 🚨 CRITICAL ERROR HANDLING RULE
+
+**STOP EVERYTHING WHEN ERRORS APPEAR**
+
+When ANY error appears in logs during startup or operation:
+1. **IMMEDIATELY STOP** - Do not continue with other tasks
+2. 
**IDENTIFY THE ERROR** - Read the full error message and context +3. **FIX THE ERROR** - Address the root cause, not symptoms +4. **VERIFY THE FIX** - Ensure error is completely resolved +5. **ONLY THEN CONTINUE** - Never ignore or work around errors + +**NEVER restart servers to "fix" errors - FIX THE ACTUAL PROBLEM** + +### Log Locations + +| Component | Log File | What's Logged | +|-----------|----------|---------------| +| **botserver** | `botserver.log` | API requests, errors, script execution, **client navigation events** | +| **botui** | `botui.log` | UI rendering, WebSocket connections | +| **drive_monitor** | In botserver logs with `[drive_monitor]` prefix | File sync, compilation | +| **client errors** | In botserver logs with `CLIENT:` prefix | JavaScript errors, navigation events | + +--- + +## 🔧 Bug Fixing Workflow + +### Step 1: Reproduce & Diagnose + +**Identify the symptom:** +```bash +# Check recent errors +grep -E " E | W " botserver.log | tail -20 + +# Check specific component +grep "component_name" botserver.log | tail -50 + +# Monitor live +tail -f botserver.log | grep -E "ERROR|WARN" +``` + +**Trace the data flow:** +1. Find where the bug manifests (UI, API, database, cache) +2. Work backwards through the call chain +3. Check logs at each layer + +**Example: "Suggestions not showing"** +```bash +# 1. Check if frontend is requesting suggestions +grep "GET /api/suggestions" botserver.log | tail -5 + +# 2. Check if suggestions exist in cache +/opt/gbo/bin/botserver-stack/bin/cache/bin/valkey-cli --scan --pattern "suggestions:*" + +# 3. Check if suggestions are being generated +grep "ADD_SUGGESTION" botserver.log | tail -10 + +# 4. 
Verify the Redis key format +grep "Adding suggestion to Redis key" botserver.log | tail -5 +``` + +### Step 2: Find the Code + +**Use code search tools:** +```bash +# Find function/keyword implementation +cd botserver/src && grep -r "ADD_SUGGESTION_TOOL" --include="*.rs" + +# Find where Redis keys are constructed +grep -r "suggestions:" --include="*.rs" | grep format + +# Find struct definition +grep -r "pub struct UserSession" --include="*.rs" +``` + +**Check module responsibility:** +- Refer to [Module Responsibility Matrix](../README.md#-module-responsibility-matrix) +- Check `mod.rs` files for module structure +- Look for related functions in same file + +### Step 3: Fix the Bug + +**Identify root cause:** +- Wrong variable used? (e.g., `user_id` instead of `bot_id`) +- Missing validation? +- Race condition? +- Configuration issue? + +**Make minimal changes:** +```rust +// ❌ BAD: Rewrite entire function +fn add_suggestion(...) { + // 100 lines of new code +} + +// ✅ GOOD: Fix only the bug +fn add_suggestion(...) { + // Change line 318: + - let key = format!("suggestions:{}:{}", user_session.user_id, session_id); + + let key = format!("suggestions:{}:{}", user_session.bot_id, session_id); +} +``` + +**Search for similar bugs:** +```bash +# If you fixed user_id -> bot_id in one place, check all occurrences +grep -n "user_session.user_id" botserver/src/basic/keywords/add_suggestion.rs +``` + +### Step 4: Test Locally + +**Verify the fix:** +```bash +# 1. Build +cargo check -p botserver + +# 2. Restart +./restart.sh + +# 3. Test the specific feature +# - Open browser to http://localhost:3000/ +# - Trigger the bug scenario +# - Verify it's fixed + +# 4. 
Check logs for errors +tail -20 botserver.log | grep -E "ERROR|WARN" +``` + +### Step 5: Commit & Deploy + +**Commit with clear message:** +```bash +cd botserver +git add src/path/to/file.rs +git commit -m "Fix: Use bot_id instead of user_id in suggestion keys + +- Root cause: Wrong field used in Redis key format +- Impact: Suggestions stored under wrong key, frontend couldn't retrieve +- Files: src/basic/keywords/add_suggestion.rs (5 occurrences) +- Testing: Verified suggestions now appear in UI" +``` + +**Push to remotes:** +```bash +# Push submodule +git push alm main +git push origin main + +# Update root repository +cd .. +git add botserver +git commit -m "Update botserver: Fix suggestion key bug" +git push alm main +git push origin main +``` + +**Production deployment:** +- ALM push triggers CI/CD pipeline +- Wait ~10 minutes for build + deploy +- Service auto-restarts on binary update +- Test in production after deployment + +### Step 6: Document + +**Add to AGENTS-PROD.md if production-relevant:** +- Common symptom +- Diagnosis commands +- Fix procedure +- Prevention tips + +**Update code comments if needed:** +```rust +// Redis key format: suggestions:bot_id:session_id +// Note: Must use bot_id (not user_id) to match frontend queries +let key = format!("suggestions:{}:{}", user_session.bot_id, session_id); +``` + +--- + +## 🎨 Frontend Standards + +### HTMX-First Approach +- Use HTMX to minimize JavaScript +- Server returns HTML fragments, not JSON +- Use `hx-get`, `hx-post`, `hx-target`, `hx-swap` +- WebSocket via htmx-ws extension + +### Local Assets Only - NO CDN +```html + + + + + +``` + +--- + +## 🚀 Performance & Size Standards + +### Binary Size Optimization +- **Release Profile**: Always maintain `opt-level = "z"`, `lto = true`, `codegen-units = 1`, `strip = true`, `panic = "abort"`. 
+- **Dependencies**: + - Run `cargo tree --duplicates` weekly + - Run `cargo machete` to remove unused dependencies + - Use `default-features = false` and explicitly opt-in to needed features + +### Linting & Code Quality +- **Clippy**: Code MUST pass `cargo clippy --workspace` with **0 warnings**. +- **No Allow**: NEVER use `#[allow(clippy::...)]` in source code - FIX the code instead. + +--- + +## 🔧 Technical Debt + +### Critical Issues to Address +- Error handling debt: instances of `unwrap()`/`expect()` in production code +- Performance debt: excessive `clone()`/`to_string()` calls +- File size debt: files exceeding 450 lines + +### Weekly Maintenance Tasks +```bash +cargo tree --duplicates # Find duplicate dependencies +cargo machete # Remove unused dependencies +cargo build --release && ls -lh target/release/botserver # Check binary size +cargo audit # Security audit +``` + +--- + +## 📋 Continuation Prompt + +When starting a new session or continuing work: + +``` +Continue on gb/ workspace. Follow AGENTS.md strictly: + +1. Check current state with build/diagnostics +2. Fix ALL warnings and errors - NO #[allow()] attributes +3. Delete unused code, don't suppress warnings +4. Remove unused parameters, don't prefix with _ +5. Replace ALL unwrap()/expect() with proper error handling +6. Verify after each fix batch +7. Loop until 0 warnings, 0 errors +8. 
Refactor files >450 lines +``` + +--- + +## 🔑 Memory & Main Directives + +**LOOP AND COMPACT UNTIL 0 WARNINGS - MAXIMUM PRECISION** + +- 0 warnings +- 0 errors +- Trust project diagnostics +- Respect all rules +- No `#[allow()]` in source code +- Real code fixes only + +**Remember:** +- **OFFLINE FIRST** - Fix all errors from list before compiling +- **BATCH BY FILE** - Fix ALL errors in a file at once +- **WRITE ONCE** - Single edit per file with all fixes +- **VERIFY LAST** - Only compile/diagnostics after ALL fixes +- **DELETE DEAD CODE** - Don't keep unused code around +- **GIT WORKFLOW** - ALWAYS push to ALL repositories (github, pragmatismo) + +--- + +## Deploy in Prod Workflow + +### CI/CD Pipeline (Primary Method) + +1. **Push to ALM** — triggers CI/CD automatically: + ```bash + cd botserver + git push alm main + git push origin main + cd .. + git add botserver + git commit -m "Update botserver: " + git push alm main + git push origin main + ``` + +2. **Wait for CI programmatically** — poll Forgejo API until build completes: + ```bash + # ALM is at http://:4747 (port 4747, NOT 3000) + # The runner is in container alm-ci, registered with token from DB + + # Method 1: Poll API for latest workflow run status + ALM_URL="http://:4747" + REPO="GeneralBots/BotServer" + MAX_WAIT=600 # 10 minutes + ELAPSED=0 + + while [ $ELAPSED -lt $MAX_WAIT ]; do + STATUS=$(curl -sf "$ALM_URL/api/v1/repos/$REPO/actions/runs?per_page=1" | python3 -c "import sys,json; runs=json.load(sys.stdin); print(runs[0]['status'] if runs else 'unknown')") + if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failure" ] || [ "$STATUS" = "cancelled" ]; then + echo "CI finished with status: $STATUS" + break + fi + echo "CI status: $STATUS (waiting ${ELAPSED}s...)" + sleep 15 + ELAPSED=$((ELAPSED + 15)) + done + + # Method 2: Check runner logs directly + ssh "sudo incus exec alm-ci -- tail -20 /opt/gbo/logs/forgejo-runner.log" + + # Method 3: Check binary timestamp after CI completes + sleep 240 + ssh 
-o StrictHostKeyChecking=no -o ConnectTimeout=10 \ + "sudo incus exec system -- stat -c '%y' /opt/gbo/bin/botserver" + ``` + +3. **Restart in prod** — after binary updates: + ```bash + ssh -o StrictHostKeyChecking=no -o ConnectTimeout=10 \ + "sudo incus exec system -- pkill -f botserver || true" + sleep 2 + ssh -o StrictHostKeyChecking=no -o ConnectTimeout=10 \ + "sudo incus exec system -- bash -c 'cd /opt/gbo/bin && RUST_LOG=info nohup ./botserver --noconsole > /opt/gbo/logs/stdout.log 2>&1 &'" + ``` + +4. **Verify deployment**: + ```bash + # Wait for bootstrap (~2 min) + sleep 120 + # Check health + ssh -o StrictHostKeyChecking=no -o ConnectTimeout=10 \ + "sudo incus exec system -- curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/health" + # Check logs + ssh -o StrictHostKeyChecking=no -o ConnectTimeout=10 \ + "sudo incus exec system -- tail -30 /opt/gbo/logs/stdout.log" + ``` + +### Production Container Architecture + +| Container | Service | Port | Notes | +|-----------|---------|------|-------| +| system | BotServer | 8080 | Main API server | +| vault | Vault | 8200 | Secrets management (isolated) | +| tables | PostgreSQL | 5432 | Database | +| cache | Valkey | 6379 | Cache | +| drive | MinIO | 9100 | Object storage | +| directory | Zitadel | 9000 | Identity provider | +| meet | LiveKit | 7880 | Video conferencing | +| vectordb | Qdrant | 6333 | Vector database | +| llm | llama.cpp | 8081 | Local LLM | +| email | Stalwart | 25/587 | Mail server | +| alm | Forgejo | 4747 | Git server (NOT 3000!) | +| alm-ci | Forgejo Runner | - | CI runner | +| proxy | Caddy | 80/443 | Reverse proxy | + +**Important:** ALM (Forgejo) listens on port **4747**, not 3000. The runner token is stored in the `action_runner_token` table in the `PROD-ALM` database. 
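+
+**Runner token lookup** — the registration token can be read straight from the database on the `tables` container, e.g. via `sudo incus exec tables -- sudo -u postgres psql -At -d PROD-ALM`. A sketch of the query (the `token`/`is_active` column names follow Forgejo's schema and should be verified against the live table):
+
+```sql
+-- Latest still-active registration token for forgejo-runner
+SELECT token FROM action_runner_token
+WHERE is_active = true
+ORDER BY id DESC
+LIMIT 1;
+```
+
+Feed the resulting value to `forgejo-runner register` when re-registering the alm-ci runner.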
+ +### CI Runner Troubleshooting + +| Symptom | Cause | Fix | +|---------|-------|-----| +| Runner not connecting | Wrong ALM port (3000 vs 4747) | Use port 4747 in runner registration | +| `registration file not found` | `.runner` file missing or wrong format | Re-register: `forgejo-runner register --instance http://:4747 --token --name gbo --labels ubuntu-latest:docker://node:20-bookworm --no-interactive` | +| `unsupported protocol scheme` | `.runner` file has wrong JSON format | Delete `.runner` and re-register | +| `connection refused` to ALM | iptables blocking or ALM not running | Check `sudo incus exec alm -- ss -tlnp \| grep 4747` | +| CI not picking up jobs | Runner not registered or labels mismatch | Check runner labels match workflow `runs-on` field | + +--- + +## 🖥️ Production Operations Guide + +### ⚠️ CRITICAL SAFETY RULES +1. **NEVER modify iptables rules without explicit confirmation** — always confirm the exact rules, source IPs, ports, and destinations before applying +2. **NEVER touch the PROD project without asking first** — no changes to production services, configs, or containers without user approval +3. **ALWAYS backup files to `/tmp` before editing** — e.g. 
`cp /path/to/file /tmp/$(basename /path/to/file).bak-$(date +%Y%m%d%H%M%S)` + +### Infrastructure Overview +- **Host OS:** Ubuntu LTS +- **Container engine:** Incus (LXC-based) +- **Base path:** `/opt/gbo/` (General Bots Operations) +- **Data path:** `/opt/gbo/data` — shared data, configs, bot definitions +- **Bin path:** `/opt/gbo/bin` — compiled binaries +- **Conf path:** `/opt/gbo/conf` — service configurations +- **Log path:** `/opt/gbo/logs` — application logs + +### Container Architecture + +| Role | Service | Typical Port | Notes | +|------|---------|-------------|-------| +| **dns** | CoreDNS | 53 | DNS resolution, zone files in `/opt/gbo/data` | +| **proxy** | Caddy | 80/443 | Reverse proxy, TLS termination | +| **tables** | PostgreSQL | 5432 | Primary database | +| **email** | Stalwart | 993/465/587 | Mail server (IMAPS, SMTPS, Submission) | +| **system** | BotServer + Valkey | 8080/6379 | Main API + cache | +| **webmail** | Roundcube | behind proxy | PHP-FPM webmail frontend | +| **alm** | Forgejo | 4747 | Git/ALM server (NOT 3000!) 
| +| **alm-ci** | Forgejo Runner | - | CI/CD runner | +| **drive** | MinIO | 9000/9100 | Object storage | +| **table-editor** | NocoDB | behind proxy | Database UI, connects to tables | +| **vault** | Vault | 8200 | Secrets management | +| **directory** | Zitadel | 9000 | Identity provider | +| **meet** | LiveKit | 7880 | Video conferencing | +| **vectordb** | Qdrant | 6333 | Vector database | +| **llm** | llama.cpp | 8081 | Local LLM inference | + +### Container Management + +```bash +# List all containers +sudo incus list + +# Start/Stop/Restart +sudo incus start +sudo incus stop +sudo incus restart + +# Exec into container +sudo incus exec -- bash + +# View container logs +sudo incus log +sudo incus log --show-log + +# File operations +sudo incus file pull /path/to/file /local/dest +sudo incus file push /local/src /path/to/dest + +# Create snapshot before changes +sudo incus snapshot create pre-change-$(date +%Y%m%d%H%M%S) +``` + +### Service Management (inside container) + +```bash +# Check if process is running +sudo incus exec -- pgrep -a + +# Restart service (systemd) +sudo incus exec -- systemctl restart + +# Follow logs +sudo incus exec -- journalctl -u -f + +# Check listening ports +sudo incus exec -- ss -tlnp +``` + +### Quick Health Check + +```bash +# Check all containers status +sudo incus list --format csv + +# Quick service check across containers +for c in dns proxy tables system email webmail alm alm-ci drive table-editor; do + echo -n "$c: " + sudo incus exec $c -- pgrep -a $(case $c in + dns) echo "coredns";; + proxy) echo "caddy";; + tables) echo "postgres";; + system) echo "botserver";; + email) echo "stalwart";; + webmail) echo "php-fpm";; + alm) echo "forgejo";; + alm-ci) echo "runner";; + drive) echo "minio";; + table-editor) echo "nocodb";; + esac) >/dev/null && echo OK || echo FAIL +done +``` + +### Network & NAT + +#### Port Forwarding Pattern +External ports on the host are DNAT'd to container IPs via iptables. 
NAT rules live in `/etc/iptables.rules`. + +**Critical rule pattern** — always use the external interface (`-i `) to avoid loopback issues: +``` +-A PREROUTING -i -p tcp --dport -j DNAT --to-destination : +``` + +#### Typical Port Map + +| External | Service | Notes | +|----------|---------|-------| +| 53 | DNS | Public DNS resolution | +| 80/443 | HTTP/HTTPS | Via Caddy proxy | +| 5432 | PostgreSQL | Restricted access only | +| 993 | IMAPS | Secure email retrieval | +| 465 | SMTPS | Secure email sending | +| 587 | SMTP Submission | STARTTLS | +| 25 | SMTP | Often blocked by ISPs | +| 4747 | Forgejo | Behind proxy | +| 9000 | MinIO API | Internal only | +| 8200 | Vault | Isolated | + +#### Network Diagnostics + +```bash +# Check NAT rules +sudo iptables -t nat -L -n | grep DNAT + +# Test connectivity from container +sudo incus exec -- ping -c 3 8.8.8.8 + +# Test DNS resolution +sudo incus exec -- dig + +# Test port connectivity +nc -zv +``` + +### Key Service Operations + +#### DNS (CoreDNS) +- **Config:** `/opt/gbo/conf/Corefile` +- **Zones:** `/opt/gbo/data/.zone` +- **Test:** `dig @ ` + +#### Database (PostgreSQL) +- **Data:** `/opt/gbo/data` +- **Backup:** `pg_dump -U postgres -F c -f /tmp/backup.dump ` +- **Restore:** `pg_restore -U postgres -d /tmp/backup.dump` + +#### Email (Stalwart) +- **Config:** `/opt/gbo/conf/config.toml` +- **DKIM:** Check TXT records for `selector._domainkey.` +- **Webmail:** Behind proxy +- **Admin:** Accessible via configured admin port + +**Recovery from crash:** +```bash +# Check if service starts with config validation +sudo incus exec email -- /opt/gbo/bin/stalwart -c /opt/gbo/conf/config.toml --help + +# Check error logs +sudo incus exec email -- cat /opt/gbo/logs/stderr.log + +# Restore from snapshot if config corrupted +sudo incus snapshot list email +sudo incus copy email/ email-temp +sudo incus start email-temp +sudo incus file pull email-temp/opt/gbo/conf/config.toml /tmp/config.toml +sudo incus file push /tmp/config.toml 
email/opt/gbo/conf/config.toml +``` + +#### Proxy (Caddy) +- **Config:** `/opt/gbo/conf/config` +- **Backup before edit:** `cp /opt/gbo/conf/config /opt/gbo/conf/config.bak-$(date +%Y%m%d)` +- **Validate:** `caddy validate --config /opt/gbo/conf/config` +- **Reload:** `caddy reload --config /opt/gbo/conf/config` + +#### Storage (MinIO) +- **Console:** Behind proxy +- **Internal API:** http://:9000 +- **Data:** `/opt/gbo/data` + +#### Bot System (system) +- **Service:** BotServer + Valkey (Redis-compatible) +- **Binary:** `/opt/gbo/bin/botserver` +- **Valkey:** port 6379 + +#### Git/ALM (Forgejo) +- **Port:** 4747 (NOT 3000!) +- **Behind proxy:** Access via configured hostname +- **CI Runner:** Separate container, registered with token from DB + +#### CI/CD (Forgejo Runner) +- **Config:** `/opt/gbo/bin/config.yaml` +- **Init:** `/etc/systemd/system/alm-ci-runner.service` (runs as `gbuser`, NOT root) +- **Logs:** `/opt/gbo/logs/out.log`, `/opt/gbo/logs/err.log` +- **Auto-start:** Via systemd (enabled) +- **Runner user:** `gbuser` (uid 1000) — all `/opt/gbo/` files owned by `gbuser:gbuser` +- **sccache:** Installed at `/usr/local/bin/sccache`, configured via `RUSTC_WRAPPER=sccache` in workflow +- **Workspace:** `/opt/gbo/data/` (NOT `/opt/gbo/ci/`) +- **Cargo cache:** `/home/gbuser/.cargo/` (registry + git db) +- **Rustup:** `/home/gbuser/.rustup/` +- **SSH keys:** `/home/gbuser/.ssh/id_ed25519` (for deploy to system container) +- **Deploy mechanism:** CI builds binary → tar+gzip via SSH → `/opt/gbo/bin/botserver` on system container + +### Backup & Recovery + +#### Snapshot Recovery +```bash +# List snapshots +sudo incus snapshot list + +# Restore from snapshot +sudo incus copy / -restored +sudo incus start -restored + +# Get files from snapshot without starting +sudo incus file pull //path/to/file . 
+```
+
+#### Backup Scripts
+- Host config backup: `/opt/gbo/bin/backup-local-host.sh`
+- Remote backup to S3: `/opt/gbo/bin/backup-remote.sh`
+
+### Troubleshooting
+
+#### Container Won't Start
+```bash
+# Check status
+sudo incus list
+sudo incus info
+
+# Check logs
+sudo incus log --show-log
+
+# Try starting with verbose
+sudo incus start -v
+```
+
+#### Service Not Running
+```bash
+# Find process
+sudo incus exec -- pgrep -a
+
+# Check listening ports
+sudo incus exec -- ss -tlnp | grep
+
+# Check application logs
+sudo incus exec -- tail -50 /opt/gbo/logs/stderr.log
+```
+
+#### Email Delivery Issues
+```bash
+# Check mail server is running
+sudo incus exec email -- pgrep -a stalwart
+
+# Check IMAP/SMTP ports
+nc -zv 993
+nc -zv 465
+nc -zv 587
+
+# Check DKIM DNS records
+dig TXT ._domainkey.
+
+# Check mail logs
+sudo incus exec email -- tail -100 /opt/gbo/logs/email.log
+```
+
+### Maintenance
+
+#### Update Container
+```bash
+# Create snapshot backup before changing anything
+sudo incus snapshot create pre-update-$(date +%Y%m%d)
+
+# Update packages (run the whole pipeline inside the container,
+# not on the host)
+sudo incus exec -- bash -c "apt update && apt upgrade -y"
+
+# Restart to pick up upgraded libraries
+sudo incus restart
+```
+
+#### Disk Space Management
+```bash
+# Check host disk usage
+df -h /
+
+# Check btrfs pool (if applicable)
+sudo btrfs filesystem df /var/lib/incus
+
+# Clean old logs in container
+sudo incus exec -- find /opt/gbo/logs -name "*.log.*" -mtime +7 -delete
+```
+
+### Container Tricks & Optimizations
+
+#### Resource Limits
+```bash
+# Set CPU limit
+sudo incus config set limits.cpu 2
+
+# Set memory limit
+sudo incus config set limits.memory 4GiB
+
+# Set disk limit
+sudo incus config device set root size 20GiB
+```
+
+#### Profile Management
+```bash
+# List profiles
+sudo incus profile list
+
+# Apply profile to container
+sudo incus profile add
+
+# Clone container for testing
+sudo incus copy --ephemeral
+```
+
+#### Network Optimization
+```bash
+# Add static DHCP-like assignment
+sudo incus config 
device add eth0 nic nictype=bridged parent= + +# Set custom DNS for container +sudo incus config set raw.lxc "lxc.net.0.ipv4.address=" +``` + +#### Quick Container Cloning for Testing +```bash +# Snapshot and clone for safe testing +sudo incus snapshot create test-base +sudo incus copy /test-base -test +sudo incus start -test +# ... test safely ... +sudo incus stop -test +sudo incus delete -test +``` + +--- + +## AutoTask & BASIC Keywords Reference + +### AutoTask System Overview + +AutoTask is an AI-driven task execution system that: + +1. **Analyzes user intent** - "Send email to all customers", "Create weekly report" +2. **Plans execution steps** - Break down into actionable tasks +3. **Generates BASIC scripts** - Using available keywords to accomplish the task +4. **Executes scripts** - Run immediately or schedule for later + +### File Locations + +``` +.gbdrive/ +├── reports/ # Generated reports +├── documents/ # Created documents +├── exports/ # Data exports +└── apps/{appname}/ # HTMX apps (synced to SITES_ROOT) + +.gbdialog/ +├── schedulers/ # Scheduled jobs (cron-based) +├── tools/ # Voice/chat triggered tools +└── handlers/ # Event handlers +``` + +### Complete BASIC Keywords Reference + +#### Data Operations + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `GET` | `GET FROM {table} WHERE {condition}` | Query database records | +| `SET` | `SET {variable} = {value}` | Set variable value | +| `SAVE` | `SAVE {data} TO {table}` | Insert/update database record | +| `FIND` | `FIND {value} IN {table}` | Search for specific value | +| `FIRST` | `FIRST({array})` | Get first element | +| `LAST` | `LAST({array})` | Get last element | +| `FORMAT` | `FORMAT "{template}", var1, var2` | Format string with variables | + +#### Communication + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `SEND MAIL` | `SEND MAIL TO "{email}" WITH subject, body` | Send email | +| `SEND TEMPLATE` | `SEND TEMPLATE "{name}" TO 
"{email}"` | Send email template | +| `SEND SMS` | `SEND SMS TO "{phone}" MESSAGE "{text}"` | Send SMS | +| `TALK` | `TALK "{message}"` | Respond to user | +| `HEAR` | `HEAR "{phrase}" AS {variable}` | Listen for user input | + +#### File Operations + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `CREATE FILE` | `CREATE FILE "{path}" WITH {content}` | Create file in .gbdrive | +| `READ FILE` | `READ FILE "{path}"` | Read file content | +| `WRITE FILE` | `WRITE FILE "{path}" WITH {content}` | Write to file | +| `DELETE FILE` | `DELETE FILE "{path}"` | Delete file | +| `COPY FILE` | `COPY FILE "{source}" TO "{dest}"` | Copy file | +| `MOVE FILE` | `MOVE FILE "{source}" TO "{dest}"` | Move/rename file | +| `LIST FILES` | `LIST FILES "{path}"` | List directory contents | +| `UPLOAD` | `UPLOAD {data} TO "{path}"` | Upload file | +| `DOWNLOAD` | `DOWNLOAD "{url}" TO "{path}"` | Download file | + +#### HTTP Operations + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `GET HTTP` | `GET HTTP "{url}"` | HTTP GET request | +| `POST HTTP` | `POST HTTP "{url}" WITH {data}` | HTTP POST request | +| `PUT HTTP` | `PUT HTTP "{url}" WITH {data}` | HTTP PUT request | +| `DELETE HTTP` | `DELETE HTTP "{url}"` | HTTP DELETE request | +| `WEBHOOK` | `WEBHOOK "{url}" WITH {data}` | Send webhook | + +#### AI/LLM Operations + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `LLM` | `LLM "{prompt}"` | Call LLM with prompt | +| `USE KB` | `USE KB "{knowledge_base}"` | Use knowledge base for context | +| `CLEAR KB` | `CLEAR KB` | Clear knowledge base context | +| `USE TOOL` | `USE TOOL "{tool_name}"` | Enable external tool | +| `CLEAR TOOLS` | `CLEAR TOOLS` | Disable all tools | +| `USE WEBSITE` | `USE WEBSITE "{url}"` | Scrape website for context | + +#### Task & Scheduling + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `CREATE_TASK` | `CREATE_TASK "{title}", "{assignee}", "{due}", 
{project}` | Create task | +| `WAIT` | `WAIT {seconds}` | Pause execution | +| `ON` | `ON "{event}" DO {action}` | Event handler | +| `ON EMAIL` | `ON EMAIL FROM "{filter}" DO {action}` | Email trigger | +| `ON CHANGE` | `ON CHANGE {table} DO {action}` | Database change trigger | + +#### Bot & Memory + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `SET BOT MEMORY` | `SET BOT MEMORY "{key}" = {value}` | Store bot-level data | +| `GET BOT MEMORY` | `GET BOT MEMORY "{key}"` | Retrieve bot-level data | +| `REMEMBER` | `REMEMBER "{key}" = {value}` | Store session data | +| `SET CONTEXT` | `SET CONTEXT "{key}" = {value}` | Set conversation context | +| `ADD SUGGESTION` | `ADD SUGGESTION "{text}"` | Add response suggestion | +| `CLEAR SUGGESTIONS` | `CLEAR SUGGESTIONS` | Clear suggestions | + +#### User & Session + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `SET USER` | `SET USER "{property}" = {value}` | Update user property | +| `TRANSFER TO HUMAN` | `TRANSFER TO HUMAN` | Escalate to human agent | +| `ADD_MEMBER` | `ADD_MEMBER "{group}", "{email}", "{role}"` | Add user to group | + +#### Documents & Content + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `CREATE DRAFT` | `CREATE DRAFT "{title}" WITH {content}` | Create document draft | +| `CREATE SITE` | `CREATE SITE "{name}" WITH {config}` | Create website | +| `SAVE FROM UNSTRUCTURED` | `SAVE FROM UNSTRUCTURED {data} TO {table}` | Parse and save data | + +#### Multi-Bot Operations + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `ADD BOT` | `ADD BOT "{name}" WITH TRIGGER "{phrase}"` | Add sub-bot | +| `REMOVE BOT` | `REMOVE BOT "{name}"` | Remove sub-bot | +| `LIST BOTS` | `LIST BOTS` | List active bots | +| `DELEGATE TO` | `DELEGATE TO "{bot}"` | Delegate to another bot | +| `SEND TO BOT` | `SEND TO BOT "{name}" MESSAGE "{msg}"` | Inter-bot message | +| `BROADCAST MESSAGE` | `BROADCAST MESSAGE 
"{msg}"` | Broadcast to all bots | + +#### Social Media + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `POST TO SOCIAL` | `POST TO SOCIAL "{platform}" MESSAGE "{text}"` | Social media post | +| `GET SOCIAL FEED` | `GET SOCIAL FEED "{platform}"` | Get social feed | + +#### Control Flow + +| Keyword | Syntax | Description | +|---------|--------|-------------| +| `IF/THEN/ELSE/END IF` | `IF condition THEN ... ELSE ... END IF` | Conditional | +| `FOR EACH/NEXT` | `FOR EACH item IN collection ... NEXT` | Loop | +| `SWITCH/CASE/END SWITCH` | `SWITCH var CASE val ... END SWITCH` | Switch statement | +| `PRINT` | `PRINT {value}` | Debug output | + +#### Built-in Variables + +| Variable | Description | +|----------|-------------| +| `TODAY` | Current date | +| `NOW` | Current datetime | +| `USER` | Current user object | +| `SESSION` | Current session object | +| `BOT` | Current bot object | -Fix offline first — read all errors before compiling again. Batch by file — fix all errors in a file at once and write once. Verify last — only run `cargo check` after all fixes are applied. Delete dead code — never keep unused code. Git workflow — always push to all repositories (origin and alm). Target zero warnings and zero errors — loop until clean. \ No newline at end of file diff --git a/PROD.md b/PROD.md index 01a96ad..ce6ed8c 100644 --- a/PROD.md +++ b/PROD.md @@ -1,8 +1,9 @@ -# Production Environment Guide (Compact) +# Production Environment Guide ## CRITICAL RULES — READ FIRST NEVER INCLUDE HERE CREDENTIALS OR COMPANY INFORMATION, THIS IS COMPANY AGNOSTIC. + Always manage services with `systemctl` inside the `system` Incus container. Never run `/opt/gbo/bin/botserver` or `/opt/gbo/bin/botui` directly — they will fail because they won't load the `.env` file containing Vault credentials and paths. The correct commands are `sudo incus exec system -- systemctl start|stop|restart|status botserver` and the same for `ui`. 
Systemctl handles environment loading, auto-restart, logging, and dependencies.
 
 Never push secrets (API keys, passwords, tokens) to git. Never commit `init.json` (it contains Vault unseal keys). All secrets must come from Vault — only `VAULT_*` variables are allowed in `.env`. Never deploy manually via scp or ssh; always use CI/CD. Always push all submodules (botserver, botui, botlib) before or alongside the main repo. Always ask before pushing to ALM.
@@ -11,11 +12,294 @@ Never push secrets (API keys, passwords, tokens) to git. Never commit `init.json
 
 ## Infrastructure Overview
 
-The host machine is ``, accessed via `ssh user@`, running Incus (an LXD fork) as hypervisor. All services run inside named Incus containers. You enter containers with `sudo incus exec -- ` and list them with `sudo incus list`.
+The host machine is accessed via `ssh user@<host>`, running Incus (an LXD fork) as hypervisor. All services run inside named Incus containers. You enter containers with `sudo incus exec <container> -- <command>` and list them with `sudo incus list`.
 
-The containers and their roles are: `system` runs botserver on port 5858 and botui on port 5859; `alm-ci` runs the Forgejo Actions CI runner; `alm` hosts the Forgejo git server; `tables` runs PostgreSQL on port 5432; `cache` runs Valkey/Redis on port 6379; `drive` runs MinIO object storage on port 9100; `vault` runs HashiCorp Vault on port 8200; `vector` runs Qdrant on port 6333.
+### Container Architecture
 
-Externally, botserver is reachable at `https://` and botui at `https://`. Internally, botui's `BOTSERVER_URL` must be `http://localhost:5858` — never the external HTTPS URL, because the Rust proxy runs server-side and needs direct localhost access.
+| Container | Service | Technology | Binary Path | Logs Path | Data Path | Notes |
+|-----------|---------|------------|-------------|-----------|-----------|-------|
+| **system** | BotServer + BotUI | Rust/Axum | `/opt/gbo/bin/botserver`<br>`/opt/gbo/bin/botui` | `/opt/gbo/logs/out.log`<br>`/opt/gbo/logs/err.log` | `/opt/gbo/data/`<br>`/opt/gbo/work/` | Main API + UI proxy |
+| **tables** | PostgreSQL | PostgreSQL 15+ | `/usr/lib/postgresql/*/bin/postgres` | `/opt/gbo/logs/postgresql/` | `/opt/gbo/data/pgdata/` | Primary database |
+| **vault** | HashiCorp Vault | Vault | `/opt/gbo/bin/vault` | `/opt/gbo/logs/vault/` | `/opt/gbo/data/vault/` | Secrets management |
+| **cache** | Valkey | Valkey (Redis fork) | `/opt/gbo/bin/valkey-server` | `/opt/gbo/logs/valkey/` | `/opt/gbo/data/valkey/` | Distributed cache |
+| **drive** | MinIO | MinIO | `/opt/gbo/bin/minio` | `/opt/gbo/logs/minio/` | `/opt/gbo/data/minio/` | Object storage (S3 API) |
+| **directory** | Zitadel | Zitadel (Go) | `/opt/gbo/bin/zitadel` | `/opt/gbo/logs/zitadel.log` | `PROD-DIRECTORY` DB | Identity provider |
+| **llm** | llama.cpp | C++/CUDA | `/opt/gbo/bin/llama-server` | `/opt/gbo/logs/llm/` | `/opt/gbo/models/` | Local LLM inference |
+| **vectordb** | Qdrant | Qdrant (Rust) | `/opt/gbo/bin/qdrant` | `/opt/gbo/logs/qdrant/` | `/opt/gbo/data/qdrant/` | Vector database |
+| **alm** | Forgejo | Forgejo (Go) | `/opt/gbo/bin/forgejo` | `/opt/gbo/logs/forgejo/` | `/opt/gbo/data/forgejo/` | Git server (port 4747) |
+| **alm-ci** | Forgejo Runner | Docker/runner | `/opt/gbo/bin/forgejo-runner` | `/opt/gbo/logs/forgejo-runner.log` | `/opt/gbo/data/ci/` | CI/CD runner |
+| **proxy** | Caddy | Caddy | `/opt/gbo/bin/caddy` | `/opt/gbo/logs/caddy/` | `/opt/gbo/conf/` | Reverse proxy |
+| **email** | Stalwart | Stalwart (Rust) | `/opt/gbo/bin/stalwart` | `/opt/gbo/logs/email/` | `/opt/gbo/data/email/` | Mail server |
+| **webmail** | Roundcube | PHP | `/usr/share/roundcube/` | `/var/log/php/` | `/var/lib/roundcube/` | Webmail frontend |
+| **dns** | CoreDNS | CoreDNS (Go) | `/opt/gbo/bin/coredns` | `/opt/gbo/logs/dns/` | `/opt/gbo/conf/Corefile` | DNS resolution |
+| **meet** | LiveKit | LiveKit (Go) | `/opt/gbo/bin/livekit-server` | `/opt/gbo/logs/meet/` | `/opt/gbo/data/meet/` | Video conferencing |
+| **table-editor** | NocoDB | NocoDB | `/opt/gbo/bin/nocodb` | `/opt/gbo/logs/nocodb/` | `/opt/gbo/data/nocodb/` | Database UI |
+
+### Network Access
+
+Externally, services are exposed via reverse proxy (Caddy). Internally, containers communicate via private IPs:
+
+| Service | External URL | Internal Address |
+|---------|--------------|------------------|
+| BotServer | `https://<domain>` | `http://<system-ip>:8080` |
+| BotUI | `https://<domain>` | `http://<system-ip>:3000` |
+| Zitadel | `https://<domain>` | `http://<directory-ip>:8080` |
+| Forgejo | `https://<domain>` | `http://<alm-ip>:4747` |
+| Webmail | `https://<domain>` | `http://<webmail-ip>:80` |
+| Roundcube | `https://<domain>` | `http://<webmail-ip>:80` |
+
+**Note:** BotUI's `BOTSERVER_URL` must be `http://<system-ip>:8080` internally, NOT the external HTTPS URL.
+
+---
+
+## Daily Operations
+
+### Daily Health Check (5 minutes)
+
+Run this every morning or after any deploy:
+
+```bash
+# 1. Container status
+sudo incus list
+
+# 2. Service health - all should show "active (running)"
+sudo incus exec system -- systemctl is-active botserver
+sudo incus exec system -- systemctl is-active ui
+sudo incus exec directory -- systemctl is-active directory 2>/dev/null || echo "Directory check failed"
+sudo incus exec drive -- pgrep -f minio > /dev/null && echo "MinIO OK" || echo "MinIO DOWN"
+sudo incus exec tables -- pgrep -f postgres > /dev/null && echo "PostgreSQL OK" || echo "PostgreSQL DOWN"
+
+# 3. IPv4 connectivity check - all containers should have IPv4
+sudo incus list -c n4 | grep -E "(system|tables|vault|directory|drive|cache|llm|vector_db)" | grep -v "10\." && echo "WARNING: Missing IPv4" || echo "IPv4 OK"
+
+# 4. Application health endpoint
+curl -sf https://<domain>/api/health && echo "Health OK" || echo "Health FAILED"
+
+# 5. Recent errors (last 10 lines)
+sudo incus exec system -- tail -10 /opt/gbo/logs/err.log | grep -i "error\|panic\|failed" | head -5
+```
+
+**Expected Result:** All services "active", all containers have IPv4, health endpoint returns 200, no critical errors.
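The step-3 IPv4 check can be factored into a reusable filter. A minimal sketch, assuming CSV output in the shape of `incus list -f csv -c n4` (name and IPv4 columns); the `sample` data is a hypothetical stand-in so the filter can be tested offline:

```shell
# Stand-in for: sudo incus list -f csv -c n4   (name,ipv4)
sample='system,10.0.0.10 (eth0)
tables,10.0.0.11 (eth0)
email,'

# Print every container whose IPv4 column is empty
missing=$(printf '%s\n' "$sample" | awk -F',' '$2 == "" { print $1 ": MISSING IPv4" }')
echo "${missing:-IPv4 OK}"
```

Wire it into the daily check by replacing `sample` with the live `incus list` call; a non-empty `missing` is the same warning condition as step 3 above.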
+
+### Weekly Deep Check (15 minutes)
+
+Run every Monday morning:
+
+```bash
+# 1. Disk space on all containers
+for c in system tables vault directory drive cache llm vector_db; do
+  echo "=== $c ==="
+  sudo incus exec $c -- df -h / 2>/dev/null | tail -1
+done
+
+# 2. Database connection pool status
+sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
+
+# 3. Vault status (should be unsealed)
+sudo incus exec vault -- curl -ksf https://localhost:8200/v1/sys/health | grep -q '"sealed":false' && echo "Vault unsealed" || echo "Vault SEALED - CRITICAL"
+
+# 4. CI runner status
+sudo incus exec alm-ci -- pgrep -f forgejo > /dev/null && echo "CI runner OK" || echo "CI runner DOWN"
+
+# 5. MinIO buckets health
+sudo incus exec drive -- bash -c 'export PATH=/opt/gbo/bin:$PATH && mc admin info local' 2>&1 | head -10
+
+# 6. Backup verification - check latest snapshot exists
+sudo incus snapshot list system | head -5
+```
+
+### Quick Status Dashboard
+
+One-line status of everything:
+
+```bash
+echo "=== GBO Status Dashboard $(date) ==="
+echo "Containers:"
+sudo incus list -c n4,s | grep -E "(system|tables|vault|directory|drive|cache|llm|vector_db|alm-ci)" | awk '{print $1 ": " $3 " " $4}'
+echo ""
+echo "Services:"
+for svc in botserver ui; do
+  sudo incus exec system -- systemctl is-active $svc 2>/dev/null && echo "  $svc: ACTIVE" || echo "  $svc: DOWN"
+done
+echo ""
+echo "Health:"
+curl -s -o /dev/null -w "%{http_code}" https://<domain>/api/health 2>/dev/null | grep -q "200" && echo "  API: OK" || echo "  API: FAIL"
+```
+
+---
+
+## Alert Response Playbook
+
+### Alert: "No IPv4 on container"
+
+**Symptoms:** Container shows empty IPV4 column in `incus list`
+
+**Quick Fix:**
+```bash
+# Identify container
+CONTAINER=<name>
+IP=<address>       # e.g., 10.x.x.x
+GATEWAY=<gateway>
+
+# Set static IP
+sudo incus config device set $CONTAINER eth0 ipv4.address $IP
+
+# Configure network inside
+sudo incus exec $CONTAINER -- bash -c "cat > /etc/network/interfaces << 'EOF'
+auto lo
+iface lo inet loopback
+
+auto eth0
+iface eth0 inet static
+address $IP
+netmask 255.255.255.0
+gateway $GATEWAY
+dns-nameservers 8.8.8.8 8.8.4.4
+EOF"
+
+# Restart
+sudo incus restart $CONTAINER
+
+# Verify
+sudo incus exec $CONTAINER -- ip addr show eth0
+```
+
+**Prevention:** Always configure static IP when creating new containers.
+
+---
+
+### Alert: "ALM botserver problem" / CI Build Failed
+
+**Symptoms:** Deploy not working, CI status shows failure
+
+**Quick Diagnostics:**
+```bash
+# Check CI database for recent runs
+sudo incus exec tables -- bash -c 'export PGPASSWORD=<password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, created FROM action_run ORDER BY id DESC LIMIT 5;"'
+# Status codes: 0=pending, 1=success, 2=failure, 3=cancelled, 6=running
+```
+
+**Quick Fixes:**
+
+1. **If stuck at status 6 (running):**
+```bash
+RUN_ID=<run-id>
+sudo incus exec tables -- bash -c "export PGPASSWORD=<password>; psql -h localhost -U postgres -d PROD-ALM -c \"UPDATE action_task SET status = 0 WHERE id = $RUN_ID; UPDATE action_run_job SET status = 0 WHERE run_id = $RUN_ID; UPDATE action_run SET status = 0 WHERE id = $RUN_ID;\""
+```
+
+2. **If /tmp permission denied:**
+```bash
+sudo incus exec alm-ci -- chmod 1777 /tmp
+sudo incus exec alm-ci -- bash -c 'touch /tmp/build.log && chmod 666 /tmp/build.log'
+```
+
+3. **If CI runner down:**
+```bash
+sudo incus exec alm-ci -- pkill -9 forgejo
+sleep 2
+sudo incus exec alm-ci -- bash -c 'cd /opt/gbo/bin && nohup ./forgejo-runner daemon --config config.yaml >> /opt/gbo/logs/forgejo-runner.log 2>&1 &'
+```
+
+**After fix:** Push a trivial change to re-trigger CI.
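The status-code comment above maps raw integers to CI states; a small decoder keeps ad-hoc SQL sessions readable (a sketch — the codes are taken from that comment, not from Forgejo source):

```shell
# Decode Forgejo action_run/action_task status integers
# (0=pending, 1=success, 2=failure, 3=cancelled, 6=running)
ci_status() {
  case "$1" in
    0) echo "pending" ;;
    1) echo "success" ;;
    2) echo "failure" ;;
    3) echo "cancelled" ;;
    6) echo "running" ;;
    *) echo "unknown($1)" ;;
  esac
}

ci_status 6   # prints: running
```

Pipe the `status` column from the `psql` query above through it when eyeballing stuck runs.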
+
+---
+
+### Alert: "Email container cannot reach the Internet"
+
+**Symptoms:** Email notifications failing, container cannot resolve external domains
+
+**Quick Diagnostics:**
+```bash
+# Test DNS from email container
+sudo incus exec email -- nslookup google.com
+
+# Check network config
+sudo incus exec email -- cat /etc/resolv.conf
+sudo incus exec email -- ip route
+```
+
+**Quick Fixes:**
+
+1. **If IPv6-only (no IPv4):** Follow "No IPv4 on container" playbook above.
+
+2. **If DNS not working:**
+```bash
+# Force Google DNS
+sudo incus exec email -- bash -c 'echo "nameserver 8.8.8.8" > /etc/resolv.conf'
+
+# Or configure via interfaces file
+sudo incus exec email -- bash -c "cat > /etc/network/interfaces << 'EOF'
+auto lo
+iface lo inet loopback
+
+auto eth0
+iface eth0 inet static
+address <ip>
+netmask 255.255.255.0
+gateway <gateway>
+dns-nameservers 8.8.8.8 8.8.4.4
+EOF"
+sudo incus restart email
+```
+
+3. **If firewall blocking:** Check iptables rules on host for email container IP.
+
+---
+
+### Alert: "Vault sealed"
+
+**Symptoms:** All services failing, Vault health shows "sealed": true
+
+**Quick Fix:**
+```bash
+# Get unseal keys from secure location (not in git!)
+KEY1=<unseal-key-1>
+KEY2=<unseal-key-2>
+KEY3=<unseal-key-3>
+
+sudo incus exec vault -- vault operator unseal $KEY1
+sudo incus exec vault -- vault operator unseal $KEY2
+sudo incus exec vault -- vault operator unseal $KEY3
+
+# Verify
+sudo incus exec vault -- vault status
+```
+
+---
+
+### Alert: "Botserver not responding"
+
+**Quick Diagnostics:**
+```bash
+# Check process
+sudo incus exec system -- pgrep -f botserver || echo "NOT RUNNING"
+
+# Check systemd status
+sudo incus exec system -- systemctl status botserver --no-pager
+
+# Check recent logs
+sudo incus exec system -- tail -20 /opt/gbo/logs/err.log
+
+# Check for GLIBC errors
+sudo incus exec system -- ldd /opt/gbo/bin/botserver | grep "not found"
+```
+
+**Quick Fixes:**
+
+1.
**If systemd failed:** +```bash +sudo incus exec system -- systemctl restart botserver +sudo incus exec system -- systemctl restart ui +``` + +2. **If GLIBC mismatch:** Binary compiled with wrong glibc. Must rebuild inside system container (Debian 12, glibc 2.36). + +3. **If port conflict:** +```bash +sudo incus exec system -- lsof -i :8080 +sudo incus exec system -- killall botserver +sudo incus exec system -- systemctl start botserver +``` --- @@ -29,10 +313,12 @@ Internally, Zitadel listens on port 8080 within the directory container. For ext - Via public domain (HTTPS): `https://` (configured through proxy container) - Via host IP (HTTP): `http://:9000` (direct container port forwarding) - Via container IP (HTTP): `http://:9000` (direct container access) + Access the Zitadel console at `https:///ui/console` with admin credentials. Zitadel implements v1 Management API (deprecated) and v2 Organization/User services. Always use the v2 endpoints under `/v2/organizations` and `/v2/users` for all operations. The botserver bootstrap also manages: Vault (secrets), PostgreSQL (database), Valkey (cache, password auth), MinIO (object storage), Zitadel (identity provider), and llama.cpp (LLM). -To obtain a PAT for Zitadel API access, check /opt/gbo/conf/directory/admin-pat.txt in the directory container. Use it with curl by setting the Authorization header: `Authorization: Bearer $(cat /opt/gbo/conf/directory/admin-pat.txt)` and include `-H "Host: "` for correct host resolution (replace with your directory container IP). + +To obtain a PAT for Zitadel API access, check /opt/gbo/conf/directory/admin-pat.txt in the directory container. Use it with curl by setting the Authorization header: `Authorization: Bearer $(cat /opt/gbo/conf/directory/admin-pat.txt)` and include `-H "Host: "` for correct host resolution. 
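Ad-hoc calls often fail because one of the three required headers is missing. A sketch of a wrapper that bundles them — `zitadel_api`, `DIRECTORY_IP`, and the `DRY_RUN` switch are hypothetical, and the IP/token below are placeholder values; with `DRY_RUN=1` it prints the assembled command instead of contacting any server:

```shell
# Hypothetical curl wrapper adding the headers Zitadel v2 calls require:
# Content-Type, Authorization (PAT), and Host (for gRPC-gateway routing).
zitadel_api() {
  method="$1"; path="$2"; body="${3:-}"
  set -- curl -sS -X "$method" "http://${DIRECTORY_IP}:8080${path}" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${PAT}" \
    -H "Host: ${DIRECTORY_IP}"
  if [ -n "$body" ]; then set -- "$@" -d "$body"; fi
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Offline example with placeholder IP/token: list one user
DIRECTORY_IP="203.0.113.10"; PAT="example-token"; DRY_RUN=1
cmd=$(zitadel_api POST /v2/users '{"query":{"offset":0,"limit":1}}')
echo "$cmd"
```

In real use, set `DIRECTORY_IP` to the directory container IP and `PAT` from `admin-pat.txt`, and drop `DRY_RUN`.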
--- @@ -50,42 +336,42 @@ PAT=$(ssh administrator@ "sudo incus exec directory -- cat /opt/gbo/co **Create a Human User:** ```bash curl -X POST "http://:8080/v2/users/human" \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer $PAT" \ - -H "Host: " \ - -d '{ - "username": "testuser", - "profile": {"givenName": "Test", "familyName": "User"}, - "email": {"email": "test@example.com", "isVerified": true}, - "password": {"password": "SecurePass123!", "changeRequired": false} - }' +-H "Content-Type: application/json" \ +-H "Authorization: Bearer $PAT" \ +-H "Host: " \ +-d '{ + "username": "testuser", + "profile": {"givenName": "Test", "familyName": "User"}, + "email": {"email": "test@example.com", "isVerified": true}, + "password": {"password": "", "changeRequired": false} +}' ``` **List Users:** ```bash curl -X POST "http://:8080/v2/users" \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer $PAT" \ - -H "Host: " \ - -d '{"query": {"offset": 0, "limit": 100}}' +-H "Content-Type: application/json" \ +-H "Authorization: Bearer $PAT" \ +-H "Host: " \ +-d '{"query": {"offset": 0, "limit": 100}}' ``` **Update User Password:** ```bash curl -X POST "http://:8080/v2/users//password" \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer $PAT" \ - -H "Host: " \ - -d '{ - "newPassword": {"password": "NewPass123!", "changeRequired": false} - }' +-H "Content-Type: application/json" \ +-H "Authorization: Bearer $PAT" \ +-H "Host: " \ +-d '{ + "newPassword": {"password": "", "changeRequired": false} +}' ``` **Delete User:** ```bash curl -X DELETE "http://:8080/v2/users/" \ - -H "Authorization: Bearer $PAT" \ - -H "Host: " +-H "Authorization: Bearer $PAT" \ +-H "Host: " ``` ### Directory Quick Reference @@ -99,368 +385,6 @@ curl -X DELETE "http://:8080/v2/users/" \ | List users | `POST /v2/users` | | Update password | `POST /v2/users/{id}/password` | -### Zitadel API v2 Usage with PAT - -**Important:** Zitadel API v2 requires a valid 
Personal Access Token (PAT) for authentication. The PAT must have the appropriate scopes for the operations you want to perform. - -**Using PAT with curl:** - -```bash -# Set your PAT as an environment variable -PAT="" - -# Include the required headers in all API calls -curl -X POST "http://:8080/v2/organizations" \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer $PAT" \ - -H "Host: " \ - -d '{ - "name": "pragmatismo" - }' -``` - -**Critical Headers:** -- `Authorization: Bearer $PAT` - Your PAT token -- `Host: ` - Required for gRPC-gateway routing -- `Content-Type: application/json` - For POST/PUT/PATCH requests - -**Common API v2 Endpoints:** - -Create Organization: -```bash -curl -X POST "http://10.157.134.240:8080/v2/organizations" \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer $PAT" \ - -H "Host: 10.157.134.240" \ - -d '{ - "name": "organization-name" - }' -``` - -List Organizations (requires body with query): -```bash -curl -X POST "http://10.157.134.240:8080/v2/organizations" \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer $PAT" \ - -H "Host: 10.157.134.240" \ - -d '{ - "query": { - "offset": 0, - "limit": 100 - } - }' -``` - -Create Human User: -```bash -curl -X POST "http://10.157.134.240:8080/v2/users/human" \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer $PAT" \ - -H "Host: 10.157.134.240" \ - -d '{ - "username": "johndoe", - "profile": { - "givenName": "John", - "familyName": "Doe" - }, - "email": { - "email": "john@example.com", - "isVerified": true - }, - "password": { - "password": "SecurePass123!", - "changeRequired": false - } - }' -``` - -**Testing PAT Validity:** -```bash -# Test if PAT is valid by calling users endpoint -curl -X POST "http://10.157.134.240:8080/v2/users" \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer $PAT" \ - -H "Host: 10.157.134.240" \ - -d '{"query": {"offset": 0, "limit": 1}}' - -# If you get 
{"code":16,"message":"Errors.Token.Invalid (AUTH-7fs1e)"}, the PAT is invalid -``` - -**Generating a New PAT via Web Console:** -1. Access: `http://:9000/ui/console` -2. Login with admin credentials -3. Navigate to your profile (top right corner) -4. Go to "Personal Access Tokens" -5. Click "Create" -6. Name the token and select expiration -7. Copy the token (you won't see it again!) -8. Update `/opt/gbo/conf/directory/admin-pat.txt` with the new token - -### Production Credentials - -**Admin Account:** -- Username: `admin` -- Password: `Admin123!` -- Access: `https:///ui/console` - -**Test User Account (created via API):** -- Username: `rodriguez` -- Password: `SecurePass2026!` -- User ID: `368981346720188144` -- Access: Use with any bot login page - ---- - -### Zitadel Setup & Initialization - -**Database Configuration:** -Zitadel connects to PostgreSQL with these credentials (set in `directory.service`): -- Database: `PROD-DIRECTORY` -- Host: `10.157.134.174` (tables container) -- Port: `5432` -- User: `postgres` -- Password: `67a690df` (from Vault: `secret/gbo/tables`) - -**Current Production Settings:** -- Container IP: `10.157.134.240` -- Internal port: `8080` -- External port: `9000` -- Masterkey: `MasterkeyNeedsToHave32Characters` (CHANGE THIS IN PRODUCTION!) -- TLS mode: `disabled` -- External domain: `10.157.134.240` - -**Initialization File:** -Location: `/opt/gbo/conf/directory/zitadel-init-steps.yaml` -```yaml -FirstInstance: - InstanceName: "BotServer" - DefaultLanguage: "en" - PatPath: "/opt/gbo/conf/directory/admin-pat.txt" - Org: - Name: "BotServer" - Machine: - Machine: - Username: "admin-sa" - Name: "Admin Service Account" - Pat: - ExpirationDate: "2099-01-01T00:00:00Z" - Human: - UserName: "admin" - FirstName: "Admin" - LastName: "User" - Email: - Address: "admin@localhost" - Verified: true - Password: "Admin123!" - PasswordChangeRequired: false -``` - -**To Reinitialize Zitadel (if database is empty or corrupted):** -```bash -# 1. 
Stop the service -sudo incus exec directory -- systemctl stop directory - -# 2. Drop and recreate the database -sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "DROP DATABASE IF EXISTS \"PROD-DIRECTORY\";" -sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "CREATE DATABASE \"PROD-DIRECTORY\";" - -# 3. Run initialization -sudo incus exec directory -- bash -c ' - export ZITADEL_DATABASE_POSTGRES_HOST=10.157.134.174 - export ZITADEL_DATABASE_POSTGRES_PORT=5432 - export ZITADEL_DATABASE_POSTGRES_DATABASE=PROD-DIRECTORY - export ZITADEL_DATABASE_POSTGRES_USER_USERNAME=postgres - export ZITADEL_DATABASE_POSTGRES_USER_PASSWORD=67a690df - export ZITADEL_DATABASE_POSTGRES_USER_SSL_MODE=disable - /opt/gbo/bin/zitadel setup init \ - --config /opt/gbo/conf/directory/zitadel-init-steps.yaml \ - --masterkey MasterkeyNeedsToHave32Characters \ - --tlsMode disabled -' - -# 4. Start the service -sudo incus exec directory -- systemctl start directory - -# 5. Verify health -curl -sf http://10.157.134.240:8080/debug/healthz -``` - -**Zitadel Database Schema:** -The database uses multiple schemas: -- `system` - System tables and configuration -- `projections` - Read-only projection tables (orgs, users, sessions, etc.) 
-- `eventstore` - Event sourcing tables -- `adminapi`, `auth`, `logstore`, `cache`, `queue` - Specialized schemas - -To query organizations: -```bash -sudo incus exec tables -- psql -h localhost -U postgres -d PROD-DIRECTORY -c \ - "SELECT id, name FROM projections.orgs1;" -``` - ---- - -### Zitadel Troubleshooting - -**Database Connection Errors:** -If logs show `failed SASL auth: FATAL: password authentication failed for user "postgres"`: -```bash -# Check systemd unit has correct credentials -sudo incus exec directory -- cat /etc/systemd/system/directory.service - -# Verify Vault has the correct credentials -TOKEN="${VAULT_TOKEN}" -sudo incus exec system -- curl -s --cacert /opt/gbo/conf/system/certificates/ca/ca.crt \ - -H "X-Vault-Token: $TOKEN" \ - https://10.157.134.250:8200/v1/secret/data/gbo/tables - -# If credentials changed, update systemd unit and restart -sudo incus exec directory -- systemctl daemon-reload -sudo incus exec directory -- systemctl restart directory -``` - -**Empty Database (No Organizations):** -If the database was initialized but tables are missing: -```bash -# Check if tables exist -sudo incus exec tables -- psql -h localhost -U postgres -d PROD-DIRECTORY -c \ - "SELECT tablename FROM pg_tables WHERE schemaname = 'projections' LIMIT 5;" - -# If no tables, reinitialize using the steps above -``` - -**PAT Token Invalid:** -If API calls return `Errors.Token.Invalid (AUTH-7fs1e)`: -```bash -# Check if PAT file exists -sudo incus exec directory -- cat /opt/gbo/conf/directory/admin-pat.txt - -# If missing or expired, regenerate via console or API: -# 1. Login to console: http://:9000/ui/console -# 2. Go to Profile → Personal Access Tokens → Create -# 3. 
Save the new token to admin-pat.txt -``` - -**Health Check Fails:** -```bash -# Check service status -sudo incus exec directory -- systemctl status directory - -# Check logs -sudo incus exec directory -- tail -50 /opt/gbo/logs/stderr.log -sudo incus exec directory -- tail -50 /opt/gbo/logs/stdout.log - -# Verify database connectivity -sudo incus exec directory -- pg_isready -h 10.157.134.174 -p 5432 -U postgres -``` - -**Migration Errors:** -If migrations fail or database is in bad state: -```bash -# Stop service -sudo incus exec directory -- systemctl stop directory - -# Drop and recreate database -sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "DROP DATABASE IF EXISTS \"PROD-DIRECTORY\";" -sudo incus exec tables -- psql -h localhost -U postgres -d postgres -c "CREATE DATABASE \"PROD-DIRECTORY\";" - -# Reinitialize (see initialization steps above) -``` - -**Systemd Unit Configuration:** -The `directory.service` unit contains all environment variables: -```ini -[Unit] -Description=Directory (Zitadel) -After=network.target - -[Service] -User=root -Group=root -WorkingDirectory=/opt/gbo -Environment=ZITADEL_DATABASE_POSTGRES_HOST=10.157.134.174 -Environment=ZITADEL_DATABASE_POSTGRES_PORT=5432 -Environment=ZITADEL_DATABASE_POSTGRES_DATABASE=PROD-DIRECTORY -Environment=ZITADEL_DATABASE_POSTGRES_USER_USERNAME=postgres -Environment=ZITADEL_DATABASE_POSTGRES_USER_PASSWORD=67a690df -Environment=ZITADEL_DATABASE_POSTGRES_USER_SSL_MODE=disable -Environment=ZITADEL_EXTERNALSECURE=false -Environment=ZITADEL_EXTERNALDOMAIN=10.157.134.240 -Environment=ZITADEL_EXTERNALPORT=9000 -Environment=ZITADEL_TLS_ENABLED=false -ExecStart=/opt/gbo/bin/zitadel start --masterkey MasterkeyNeedsToHave32Characters --tlsMode disabled --externalDomain 10.157.134.240 --externalPort 9000 -Restart=always -RestartSec=5 -StandardOutput=append:/opt/gbo/logs/stdout.log -StandardError=append:/opt/gbo/logs/stderr.log - -[Install] -WantedBy=multi-user.target -``` - ---- - -## Common 
Operations - -**Check status:** `sudo incus exec system -- systemctl status botserver --no-pager` (same for `ui`). To check process existence: `sudo incus exec system -- pgrep -f botserver`. - -**View logs:** For systemd journal: `sudo incus exec system -- journalctl -u botserver --no-pager -n 50`. For application logs: `sudo incus exec system -- tail -50 /opt/gbo/logs/out.log` or `err.log`. For live tail: `sudo incus exec system -- tail -f /opt/gbo/logs/out.log`. - -**Restart:** `sudo incus exec system -- systemctl restart botserver` and same for `ui`. Never run the binary directly. - -**Emergency manual deploy:** Kill the old process with `sudo incus exec system -- killall botserver`, copy the new binary from `/opt/gbo/ci/botserver/target/debug/botserver` to `/opt/gbo/bin/botserver`, set permissions with `chmod +x` and `chown gbuser:gbuser`, then start with `systemctl start botserver`. - -**Transfer bot files:** Archive locally with `tar czf /tmp/bots.tar.gz -C /opt/gbo/data .gbai`, copy to host with `scp`, then extract inside container with `sudo incus exec system -- bash -c 'tar xzf /tmp/bots.tar.gz -C /opt/gbo/data/'`. Clear compiled cache with `find /opt/gbo/data -name "*.ast" -delete` and same for `/opt/gbo/work`. - -**Snapshots:** `sudo incus snapshot list system` to list, `sudo incus snapshot restore system ` to restore. - ---- - -## CI/CD Pipeline - -Repositories exist on both GitHub and the internal ALM (Forgejo). The four repos are `gb` (main workspace), `botserver`, `botui`, and `botlib`. Always push submodules first (`cd botserver && git push alm main && git push origin main`), then update submodule references in the root repo and push that too. - -The CI runner container (`alm-ci`) runs Debian 12 Bookworm with glibc 2.36, same as the `system` container. Binaries compiled on the CI runner are compatible with the system container. 
The CI workflow (`botserver/.forgejo/workflows/botserver.yaml`) builds in alm-ci (which has the Rust toolchain) and deploys the binary to the system container. The workflow triggers on pushes to `main`, clones the repos, builds in alm-ci, transfers the binary via scp, and verifies botserver is running.

### ALM/CI Debugging & Monitoring

**Access ALM/CI containers:**
```bash
ssh administrator@<host>
sudo incus exec alm-ci -- bash   # CI runner container
sudo incus exec tables -- bash   # PostgreSQL (ALM database)
sudo incus exec system -- bash   # botserver container
```

**Check CI runner status:**
```bash
# Runner process
sudo incus exec alm-ci -- ps aux | grep forgejo

# Runner logs
sudo incus exec alm-ci -- cat /opt/gbo/logs/forgejo-runner.log

# If the runner is down, restart it (inside the container):
sudo incus exec alm-ci -- pkill -9 forgejo
sleep 2
sudo incus exec alm-ci -- bash -c 'cd /opt/gbo/bin && nohup ./forgejo-runner daemon --config config.yaml >> /opt/gbo/logs/forgejo-runner.log 2>&1 &'
```

**Monitor CI runs in the database:**
```bash
# Status codes: 0=pending, 1=success, 2=failure, 3=cancelled, 6=running
sudo incus exec tables -- bash -c 'export PGPASSWORD=<password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, commit_sha, created FROM action_run ORDER BY id DESC LIMIT 5;"'

# Check specific run jobs
sudo incus exec tables -- bash -c 'export PGPASSWORD=<password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, name FROM action_run_job WHERE run_id = <run_id>;"'

# Check tasks
sudo incus exec tables -- bash -c 'export PGPASSWORD=<password>; psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status FROM action_task WHERE repo_id = 3 ORDER BY id DESC LIMIT 3;"'

# Reset a stuck run to re-trigger it
sudo incus exec tables -- bash -c 'export PGPASSWORD=<password>; psql -h localhost -U postgres -d PROD-ALM -c "UPDATE action_task SET status = 0 WHERE id = <task_id>; UPDATE action_run_job SET status = 0 WHERE id = <job_id>; UPDATE action_run SET status = 0 WHERE id = <run_id>;"'
```

**Fix common CI issues:**
```bash
# /tmp permission denied for build.log
sudo incus exec alm-ci -- chmod 1777 /tmp
sudo incus exec alm-ci -- touch /tmp/build.log && chmod 666 /tmp/build.log

# Clean old CI runs (keep recent)
sudo incus exec tables -- bash -c 'export PGPASSWORD=<password>; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run WHERE id < <id>;"'
sudo incus exec tables -- bash -c 'export PGPASSWORD=<password>; psql -h localhost -U postgres -d PROD-ALM -c "DELETE FROM action_run_job WHERE run_id < <id>;"'

# deploy.log missing error: fix the workflow step.
# The "Save deploy log" step expects /tmp/deploy.log, which the workflow doesn't create.
# Fix: ensure the deploy step writes its output to /tmp/deploy.log.
```

**Watch CI in real time:**
```bash
# Runner log
sudo incus exec alm-ci -- tail -f /opt/gbo/logs/forgejo-runner.log

# Check if new builds appear
watch -n 5 'sudo incus exec tables -- bash -c "export PGPASSWORD=<password>; psql -h localhost -U postgres -d PROD-ALM -c \"SELECT id, status, created FROM action_run ORDER BY id DESC LIMIT 3;\""'

# Verify botserver deployed correctly
sudo incus exec system -- /opt/gbo/bin/botserver --version 2>&1 | head -3
sudo incus exec system -- tail -5 /opt/gbo/logs/err.log
```

**CI Workflow Structure:**
1. Setup Git (disable SSL verify, add safe directories)
2. Setup Workspace (clone/merge gb workspace Cargo.toml)
3. Install system dependencies
4. Clean up workspaces
5. Build BotServer (output to /tmp/build.log)
6. Save build log
7. Deploy via ssh tar gzip
8. Verify botserver started
9. Save deploy log

---

## DriveMonitor & Bot Configuration

The `config.csv` format is a plain CSV with no header: each line is `key,value`.

All bot files live in MinIO buckets. Use the `mc` CLI at `/opt/gbo/bin/mc` from inside the `drive` container. The bucket structure per bot is: `{bot}.gbai/` as root, `{bot}.gbai/{bot}.gbdialog/` for BASIC scripts, `{bot}.gbai/{bot}.gbot/` for config.csv, and `{bot}.gbai/{bot}.gbkb/` for knowledge base folders.

Common mc commands: `mc ls local/` lists all buckets; `mc ls local/botname.gbai/` lists a bucket; `mc cat local/.../start.bas` prints a file; `mc cp local/.../file /tmp/file` downloads; `mc cp /tmp/file local/.../file` uploads (this triggers a DriveMonitor recompile); `mc stat local/.../config.csv` shows ETag and metadata; `mc mb local/newbot.gbai` creates a bucket; `mc rb local/oldbot.gbai` removes an empty bucket.

If mc is not found, use the full path `/opt/gbo/bin/mc`. If the alias `local` is not configured, check with `mc config host list`. If MinIO is not running, check with `sudo incus exec drive -- systemctl status minio`.

HashiCorp Vault is the single source of truth for all secrets.

**Vault troubleshooting — cannot connect:** Check that the vault container's systemd unit is running, verify the token in `.env` is not expired with `vault token lookup`, confirm the CA cert path in `.env` matches the actual file location, and test network connectivity from the system container to the vault container.
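Vault's KV v2 API nests secrets one level deeper than v1: a read returns the secret map under `data.data`, with versioning metadata alongside it. A minimal sketch of pulling credentials out of such a response (the payload below is illustrative, not real data):

```python
import json

# Illustrative KV v2 read response: the secret map sits under data.data,
# next to versioning metadata.
sample_response = json.dumps({
    "data": {
        "data": {
            "host": "tables",
            "port": "5432",
            "username": "gbuser",
            "password": "<redacted>",
        },
        "metadata": {"version": 3},
    }
})

def extract_kv2_secret(raw: str) -> dict:
    """Return the inner secret map from a Vault KV v2 read response."""
    return json.loads(raw)["data"]["data"]

creds = extract_kv2_secret(sample_response)
print(creds["username"])  # gbuser
```

Forgetting the extra `data` level is a common cause of scripts reading `None` where they expected credentials.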
To generate a new token: `vault token create -policy="botserver" -ttl="8760h" -format=json`, then update `.env` and restart botserver.

**Get database credentials from the Vault v2 API:**
```bash
ssh user@<host> "sudo incus exec system -- curl -s --cacert /opt/gbo/conf/system/certificates/ca/ca.crt -H 'X-Vault-Token: <token>' https://<vault-ip>:8200/v1/secret/data/gbo/tables 2>/dev/null"
```

**Vault troubleshooting — secrets missing:** Run `vault kv get secret/gbo/tables` (and other paths) to check whether the secrets exist. If a path returns NOT FOUND, add the secrets with `vault kv put secret/gbo/tables host=<host> port=5432 database=botserver username=gbuser password=<password>`, and similarly for the other paths.

---

## Incus Container Network Configuration

### Static IPv4 Address Assignment

When creating new containers, they may not receive IPv4 addresses automatically. To assign permanent static IPs:

**Step 1: Set a static IP on the container device**
```bash
# Choose an unused IP in the 10.157.134.x range
sudo incus config device set <container> eth0 ipv4.address 10.157.134.<x>
```

**Step 2: Configure the network inside the container**
```bash
sudo incus exec <container> -- bash -c 'cat > /etc/network/interfaces << EOF
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 10.157.134.<x>
    netmask 255.255.255.0
    gateway 10.157.134.1
    dns-nameservers 8.8.8.8 8.8.4.4
EOF'
```

**Step 3: Restart the container**
```bash
sudo incus restart <container>
```

**Step 4: Verify the IPv4 assignment**
```bash
sudo incus list -c n4
sudo incus exec <container> -- ip addr show eth0
```

### Common Network Issues

| Problem | Symptom | Fix |
|---------|---------|-----|
| No IPv4 | Container shows empty IPV4 column | Set static IP via `incus config device set` |
| IP conflict | "IP address already defined on another NIC" | Choose different IP, check `incus list` |
| Can't reach internet | DNS fails inside container | Configure DNS in `/etc/network/interfaces` |
| IPv6 only | Has IPv6 but no IPv4 | Add static IPv4 config as above |
| DHCP not working | dhclient fails or returns 169.254.x.x | Use static IP assignment instead |

### Container IP Reference

Standard IP assignments (10.157.134.x range):
- `system`: 10.157.134.196
- `tables`: 10.157.134.174
- `vault`: 10.157.134.250
- `cache`: 10.157.134.230
- `drive`: 10.157.134.206
- `directory`: 10.157.134.240
- `llm`: 10.157.134.205
- `vectordb`: 10.157.134.210
- `models`: 10.157.134.251 (reserved)
- `dns`: 10.157.134.214
- `proxy`: 10.157.134.241
- `email`: 10.157.134.40
- `meet`: 10.157.134.220

### Creating a New Container with a Static IP

```bash
# Create container
sudo incus launch images:debian/12 <container>

# Set static IP (before first boot is best)
sudo incus config device set <container> eth0 ipv4.address 10.157.134.<x>

# Configure networking inside container
sudo incus exec <container> -- bash -c 'cat > /etc/network/interfaces << EOF
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 10.157.134.<x>
    netmask 255.255.255.0
    gateway 10.157.134.1
    dns-nameservers 8.8.8.8
EOF'

# Restart to apply
sudo incus restart <container>

# Verify
sudo incus list
```

---

## Troubleshooting Quick Reference

**GLIBC mismatch (`GLIBC_2.39 not found`):** The binary was compiled on the CI runner (glibc 2.41), not inside the system container (glibc 2.36). The CI workflow must SSH into the system container to build. Check `botserver.yaml` to confirm this.

All `mc` commands run inside the `drive` container with `PATH` set: `sudo incus exec drive -- bash -c 'export PATH=/opt/gbo/bin:$PATH && mc <command>'`.

**Full upload workflow example — updating config.csv:**
```bash
# 1. Download current config from Drive
ssh user@host "sudo incus exec drive -- bash -c 'export PATH=/opt/gbo/bin:\$PATH && mc cat local/botname.gbai/botname.gbot/config.csv'" > /tmp/config.csv

# 2. Edit locally (change model, keys, etc.)
sed -i 's/llm-model,old-model/llm-model,new-model/' /tmp/config.csv

# 3. Push edited file back to Drive
scp /tmp/config.csv user@host:/tmp/config.csv
ssh user@host "sudo incus file push /tmp/config.csv drive/tmp/config.csv"
ssh user@host "sudo incus exec drive -- bash -c 'export PATH=/opt/gbo/bin:\$PATH && mc put /tmp/config.csv local/botname.gbai/botname.gbot/config.csv'"

# 4. Wait ~15 seconds, then verify DriveMonitor picked up the change
ssh user@host "sudo incus exec system -- bash -c 'grep -i \"Model:\" /opt/gbo/logs/err.log | tail -3'"
```

**Application logs** (searchable, timestamped, most useful): `sudo incus exec system -- tail -f /opt/gbo/logs/err.log` (errors and debug) or `/opt/gbo/logs/out.log` (stdout). The systemd journal only captures process lifecycle events, not application output.

**Search logs for specific bot activity:** `grep -i "botname\|llm\|Model:\|KB\|USE_KB\|drive_monitor" /opt/gbo/logs/err.log | tail -30`

**Check which LLM model a bot is using:** `grep "Model:" /opt/gbo/logs/err.log | tail -5`

**Check KB/vector operations:** `grep -i "gbkb\|qdrant\|embedding\|index" /opt/gbo/logs/err.log | tail -20`

**Live tail with filter:** `sudo incus exec system -- bash -c 'tail -f /opt/gbo/logs/err.log | grep --line-buffered -i "botname\|error\|KB"'`

---

| Binary | Container | Path | Notes |
|--------|-----------|------|-------|
| vault | vault | `/opt/gbo/bin/vault` | Needs `VAULT_ADDR`, `VAULT_TOKEN`, `VAULT_CACERT` |
| zitadel | directory | `/opt/gbo/bin/zitadel` | Runs as root on port 8080 internally |

**Quick psql query — bot config:** `sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT config_key, config_value FROM bot_configuration WHERE bot_id = (SELECT id FROM bots WHERE name = 'botname') ORDER BY config_key;"`

**Quick psql query — active KBs for session:** `sudo incus exec tables -- psql -h localhost -U postgres -d botserver -c "SELECT * FROM session_kb_associations WHERE session_id = '<session-id>' AND is_active = true;"`

The `.ast` file has all transforms applied: `USE KB "cartas"` becomes `USE_KB("cartas")`.

**Tools (TOOL_EXEC) load `.ast` only** — there is no `.bas` fallback. If an `.ast` file is missing, the tool fails with "Failed to read tool .ast file". DriveMonitor must have compiled it first.

**Suggestion deduplication** uses Redis `SADD` (set) instead of `RPUSH` (list). This prevents duplicate suggestion buttons when `start.bas` runs multiple times per session. The key format is `suggestions:{bot_id}:{session_id}` and `get_suggestions` uses `SMEMBERS` to read it.

---

## Container Quick Reference

| Container | Critical | Check Command | Restart Command |
|-----------|----------|---------------|-----------------|
| system | YES | `systemctl is-active botserver` | `systemctl restart botserver` |
| tables | YES | `pgrep -f postgres` | `systemctl restart postgresql` |
| vault | YES | `curl -ksf https://localhost:8200/v1/sys/health` | `systemctl restart vault` |
| drive | YES | `pgrep -f minio` | `systemctl restart minio` |
| cache | HIGH | `pgrep -f valkey` | `systemctl restart valkey` |
| directory | HIGH | `curl -sf http://localhost:8080/debug/healthz` | `systemctl restart directory` |
| alm-ci | MED | `pgrep -f forgejo` | manual restart |
| llm | MED | `curl -sf http://localhost:8081/health` | `systemctl restart llm` |
| vector_db | LOW | `curl -sf http://localhost:6333/healthz` | `systemctl restart qdrant` |

---

## Log Tailing Commands

```bash
# Live error monitoring
sudo incus exec system -- tail -f /opt/gbo/logs/err.log | grep -i "error\|panic\|failed"

# Bot-specific activity
sudo incus exec system -- tail -f /opt/gbo/logs/err.log | grep -i "<botname>"

# DriveMonitor activity
sudo incus exec system -- tail -f /opt/gbo/logs/err.log | grep -i "drive\|config"

# LLM calls
sudo incus exec system -- tail -f /opt/gbo/logs/err.log | grep -i "model\|llm\|groq"

# CI runner
sudo incus exec alm-ci -- tail -f /opt/gbo/logs/forgejo-runner.log
```

---

## Health Endpoint Monitoring

Set up a simple cron job to alert if the health check fails:

```bash
# Add to host crontab (crontab -e)
*/5 * * * * curl -sf https://<domain>/api/health || echo "ALERT: Health check failed at $(date)" >> /var/log/gbo-health.log
```

---

## Troubleshooting Quick Reference

### Container Won't Start (No IPv4)

**Symptom:** Container shows an empty IPV4 column in `sudo incus list`.

**Diagnose:**
```bash
sudo incus list -c n4
sudo incus exec <container> -- ip addr show eth0
```

**Fix:**
```bash
# 1. Stop container
sudo incus stop <container>

# 2. Set static IP
sudo incus config device set <container> eth0 ipv4.address <ip>

# 3. Configure network inside
sudo incus exec <container> -- bash -c 'cat > /etc/network/interfaces << EOF
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address <ip>
netmask 255.255.255.0
gateway <gateway-ip>
dns-nameservers 8.8.8.8 8.8.4.4
EOF'

# 4. Restart
sudo incus restart <container>

# 5. Verify
sudo incus exec <container> -- ip addr show eth0
```

---

### CI/ALM Permission Errors

**Symptom:** `/tmp permission denied` during CI build.

**Fix:**
```bash
# On alm-ci container
sudo incus exec alm-ci -- chmod 1777 /tmp
sudo incus exec alm-ci -- touch /tmp/build.log && chmod 666 /tmp/build.log

# Check runner user
sudo incus exec alm-ci -- ls -la /opt/gbo/

# Fix ownership
sudo incus exec alm-ci -- chown -R gbuser:gbuser /opt/gbo/bin/
sudo incus exec alm-ci -- chown -R gbuser:gbuser /opt/gbo/data/
```

**CI Runner Down:**
```bash
sudo incus exec alm-ci -- pkill -9 forgejo
sleep 2
sudo incus exec alm-ci -- bash -c 'cd /opt/gbo/bin && nohup ./forgejo-runner daemon --config config.yaml >> /opt/gbo/logs/forgejo-runner.log 2>&1 &'
```

---

### MinIO (Drive) Operations with `mc`

**Setup:**
```bash
# Access drive container
sudo incus exec drive -- bash

# Set PATH
export PATH=/opt/gbo/bin:$PATH

# Verify mc works
mc --version
```

**Common Commands:**
```bash
# List all buckets
mc ls local/

# List bot bucket
mc ls local/<bot>.gbai/

# Read start.bas
mc cat local/<bot>.gbai/<bot>.gbdialog/start.bas

# Download config.csv (note: config lives under .gbot, not .gbdialog)
mc cp local/<bot>.gbai/<bot>.gbot/config.csv /tmp/config.csv

# Upload file (triggers DriveMonitor)
mc cp /tmp/config.csv local/<bot>.gbai/<bot>.gbot/config.csv

# Force re-sync (change ETag)
mc cp local/<bot>.gbai/<bot>.gbot/config.csv local/<bot>.gbai/<bot>.gbot/config.csv

# Create new bucket
mc mb local/newbot.gbai

# Check MinIO health
mc admin info local
```

**If the `local` alias is missing:**
```bash
# Create alias
mc alias set local http://localhost:9000 <access-key> <secret-key>
```

---

### Forgejo ALM Database Operations

**Access the ALM database (PROD-ALM):**
```bash
# On tables container
sudo incus exec tables -- psql -h localhost -U postgres -d PROD-ALM
```

**Common Queries:**
```sql
-- Check CI runs
SELECT id, status, commit_sha, created FROM action_run ORDER BY id DESC LIMIT 10;

-- Status codes: 0=pending, 1=success, 2=failure, 3=cancelled, 6=running

-- Check specific run jobs
SELECT id, status, name FROM action_run_job WHERE run_id = <run_id>;

-- Reset stuck run
UPDATE action_task SET status = 0 WHERE id = <task_id>;
UPDATE action_run_job SET status = 0 WHERE run_id = <run_id>;
UPDATE action_run SET status = 0 WHERE id = <run_id>;

-- Check runner token
SELECT * FROM action_runner_token;

-- List runners
SELECT * FROM action_runner;
```

**Check CI from the host:**
```bash
export PGPASSWORD=<password>
sudo incus exec tables -- psql -h localhost -U postgres -d PROD-ALM -c "SELECT id, status, created FROM action_run ORDER BY id DESC LIMIT 5;"
```

---

### Zitadel API v2 Operations

**Important:** Always use the **v2 API**; v1 is deprecated and non-functional.

**Get the PAT:**
```bash
PAT=$(sudo incus exec directory -- cat /opt/gbo/conf/directory/admin-pat.txt)
```

**Common Operations:**

**Create User (v2):**
```bash
curl -X POST "http://<directory-ip>:8080/v2/users/human" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAT" \
  -H "Host: <external-domain>" \
  -d '{
    "username": "newuser",
    "profile": {"givenName": "New", "familyName": "User"},
    "email": {"email": "user@example.com", "isVerified": true},
    "password": {"password": "<password>", "changeRequired": false}
  }'
```

**List Users (v2):**
```bash
curl -X POST "http://<directory-ip>:8080/v2/users" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAT" \
  -H "Host: <external-domain>" \
  -d '{"query": {"offset": 0, "limit": 100}}'
```

**Create Organization (v2):**
```bash
curl -X POST "http://<directory-ip>:8080/v2/organizations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAT" \
  -H "Host: <external-domain>" \
  -d '{"name": "organization-name"}'
```

**Add Domain to Org (v2):**
```bash
curl -X POST "http://<directory-ip>:8080/v2/organizations/<org-id>/domains" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAT" \
  -H "Host: <external-domain>" \
  -d '{"domainName": "example.com"}'
```

**⚠️ Critical:** Always include the `-H "Host: <external-domain>"` header, or the API returns 404.
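Because the missing `Host` header is the most common cause of 404s against this API, scripts that call it can build their headers through one helper so the header can never be forgotten. This helper is a hypothetical convenience, not part of the codebase:

```python
def zitadel_headers(pat: str, external_domain: str) -> dict:
    """Build the headers every Zitadel v2 call needs. Omitting the Host
    header makes the API respond with 404."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {pat}",
        "Host": external_domain,
    }

headers = zitadel_headers("<pat>", "auth.example.internal")
print(headers["Host"])  # auth.example.internal
```

Pass the resulting dict to whatever HTTP client the script uses for the `/v2/...` calls shown above.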

---

### Common Errors & Quick Fixes

| Error | Cause | Fix |
|-------|-------|-----|
| No IPv4 on container | DHCP failed | Set static IP (see above) |
| `/tmp permission denied` | Wrong permissions | `chmod 1777 /tmp` |
| `Errors.Token.Invalid (AUTH-7fs1e)` | Zitadel PAT expired | Regenerate via console |
| `failed SASL auth` | Wrong DB password | Check Vault credentials |
| `GLIBC_2.39 not found` | Wrong build environment | Rebuild in system container |
| `connection refused` | Service down | `systemctl restart <service>` |
| `exec format error` | Architecture mismatch | Recompile for target arch |
| `address already in use` | Port conflict | `lsof -i :<port>` |
| `certificate verify failed` | Wrong CA cert | Copy from vault container |
| `DNS lookup failed` | No IPv4 connectivity | Check network config |

---

## Contact Escalation

If the quick fixes don't work:

1. Capture logs: `sudo incus exec system -- tar czf /tmp/debug-$(date +%Y%m%d).tar.gz /opt/gbo/logs/`
2. Check AGENTS.md for development troubleshooting
3. Review recent commits for breaking changes
4. Consider snapshot rollback (last resort)
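For on-call scripting, an error table like the one in this guide can be turned into a first-pass log triage step that maps known error substrings to their quick fixes. This is an illustrative sketch (the patterns and fix strings are examples, not an exhaustive copy of the table):

```python
# Known error substrings mapped to their quick fixes; checked in order.
FIXES = {
    "GLIBC_": "rebuild inside the system container",
    "permission denied": "chmod 1777 /tmp on alm-ci",
    "connection refused": "restart the service via systemctl",
    "address already in use": "find the conflicting process with lsof",
}

def suggest_fix(log_line: str) -> str:
    """Return the quick fix for the first known pattern in the line,
    or fall back to the escalation procedure."""
    for pattern, fix in FIXES.items():
        if pattern in log_line:
            return fix
    return "escalate: capture logs and review recent commits"

print(suggest_fix("version GLIBC_2.39 not found"))
```

Such a helper keeps the table and the logs in one loop during an incident; anything it cannot classify falls through to the escalation steps above.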