- Add vector_db_health_check() function to verify Qdrant availability
- Add wait loop for vector_db startup in bootstrap (15 seconds)
- Fall back to local LLM when an external URL is configured but no API key is provided
- Prevent external LLM (api.z.ai) usage without authentication key
This fixes the following production issues and ensures vector_db is started and ready before use:
- Qdrant vector database not available at https://localhost:6333
- External LLM being used instead of local when no key is configured
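The fallback rule can be sketched as a small pure function. This is only an illustration: the function name, the "localhost" test for a local endpoint, and the reuse of http://localhost:8081 as the local address are assumptions, not the code in this commit.

```rust
/// Assumed address of the local LLM server (illustrative default).
const LOCAL_LLM_URL: &str = "http://localhost:8081";

/// Hypothetical helper: picks the LLM endpoint to use.
/// An external URL is accepted only when a non-empty API key is present;
/// otherwise the local endpoint is returned.
pub fn resolve_llm_url(configured_url: &str, api_key: Option<&str>) -> String {
    let url = configured_url.trim();
    // Treat an empty URL as "use the local default".
    if url.is_empty() {
        return LOCAL_LLM_URL.to_string();
    }
    // Assumption: a URL that mentions localhost or 127.0.0.1 is a local server.
    let is_local = url.contains("localhost") || url.contains("127.0.0.1");
    let has_key = api_key.map(|k| !k.trim().is_empty()).unwrap_or(false);

    if is_local || has_key {
        url.to_string()
    } else {
        // External URL without a key: never call it unauthenticated.
        LOCAL_LLM_URL.to_string()
    }
}
```

With this shape, an external URL such as https://api.z.ai is only ever returned together with a key, which matches the intent described above.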
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The config_manager.get_config() call can return Ok("") for empty config values.
Because unwrap_or_else() only substitutes the default on Err, the empty string passed through unchanged.
Added checks after config retrieval to use defaults when config values
are empty strings:
- gpu_layers: "20" (default for GPU layers)
- n_moe: "4" (default for MoE)
- parallel: "1" (default for parallel)
- n_predict: "50" (default for predict)
- n_ctx_size: "32000" (default for context size)
This fixes the error: "error while handling argument --n-gpu-layers: stoi"
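A minimal sketch of the empty-string check, assuming the config lookup yields a Result<String, E>. The helper name value_or_default is hypothetical and not taken from the codebase.

```rust
/// Hypothetical helper: returns `default` when the lookup failed OR when it
/// succeeded with an empty (or whitespace-only) string.
pub fn value_or_default<E>(lookup: Result<String, E>, default: &str) -> String {
    match lookup {
        // Only a non-empty value is accepted as-is.
        Ok(value) if !value.trim().is_empty() => value,
        // Err(_) and Ok("") both fall back to the default.
        _ => default.to_string(),
    }
}
```

By contrast, `Ok(String::new()).unwrap_or_else(|_| "20".to_string())` yields the empty string, which is presumably what ended up as the value of --n-gpu-layers.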
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When no default.gbai/config.csv exists, the system now:
- Sets default llm_server_path to ./botserver-stack/bin/llm/build/bin
- Uses correct relative paths to model files: ../../../../data/llm/
- Uses actual model filenames from 3rdparty.toml
This fixes the issue where LLM/embedding servers couldn't find model files
because the paths were constructed incorrectly.
Model filenames:
- LLM: DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf
- Embedding: bge-small-en-v1.5-f32.gguf
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed the default LLM model from glm-4 to deepseek-small to match
the model defined in 3rdparty.toml ([models.deepseek_small]).
This ensures that when no default.gbai/config.csv exists, the system
uses the correct default local model.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When no default.gbai/config.csv exists or when llm-model/embedding-model
config is empty, the system now uses default local models instead of
skipping server startup.
Changes:
- Default LLM model: glm-4
- Default Embedding model: bge-small-en-v1.5
- Logs when using defaults
This fixes the issue where the "default" bot would fail to load LLM
and Embedding services when no config.csv was present, causing the
error: "not loading embedding neither llm local for default bot"
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The init_redis() function was using synchronous blocking calls
(redis::Client::get_connection()) inside an async function, which
blocked the entire tokio runtime and caused botserver to freeze.
Changes:
- Wrap Redis connection calls in tokio::task::spawn_blocking()
- Runs blocking operations in separate thread pool
- Prevents tokio runtime from freezing during cache connection
This fixes the issue where botserver would hang indefinitely
when connecting to Valkey/Redis cache.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous /dev/tcp test was giving false positives, reporting that
Valkey was running when it was actually down. This caused bootstrap to
skip starting Valkey, leading to botserver hanging on cache connection.
Changes:
- Use nc (netcat) with -z flag for reliable port checking
- Final fallback: /dev/tcp with actual PING/PONG verification
- Only returns true if port is open AND responds correctly
This ensures cache_health_check() accurately reports Valkey status.
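The "port is open AND responds correctly" rule can be sketched in Rust with only the standard library. This is an illustration of the idea, not the shell-based check in this commit; the function name is hypothetical, and it assumes the server accepts the inline form of the PING command without authentication.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: true only if a TCP connection succeeds and the peer
/// answers PING with a reply starting with "+PONG".
/// `timeout` must be non-zero (the std timeout setters reject zero).
pub fn cache_responds_to_ping(addr: SocketAddr, timeout: Duration) -> bool {
    // Step 1: the port must accept a connection.
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(stream) => stream,
        Err(_) => return false,
    };
    if stream.set_read_timeout(Some(timeout)).is_err()
        || stream.set_write_timeout(Some(timeout)).is_err()
    {
        return false;
    }
    // Step 2: an open port is not enough; send an inline PING command.
    if stream.write_all(b"PING\r\n").is_err() {
        return false;
    }
    // Step 3: only a +PONG reply counts as healthy. A single read is assumed
    // to be enough for such a short reply.
    let mut buf = [0u8; 16];
    match stream.read(&mut buf) {
        Ok(n) => buf[..n].starts_with(b"+PONG"),
        Err(_) => false,
    }
}
```

A listener that accepts the connection but never answers makes this return false, which is the false-positive case described above.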
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Try valkey-cli first (preferred for Valkey installations)
- Fall back to redis-cli (for Redis installations)
- Fall back to TCP connection test (works for both)
This fixes environments that only have Valkey installed without
Redis symlinks or redis-cli.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Fix component name mismatch: "redis" -> "cache" in bootstrap_manager
- Add cache_health_check() function to verify Valkey is responding
- Add health check loop after starting cache (12s wait with PING test)
- Ensures cache is ready before proceeding with bootstrap
This fixes the issue where botserver would hang waiting for cache
connection because the cache component was never started.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Binaries at cache/bin/valkey-server (correct production path)
- Use --strip-components=1 for extraction
- Matches /opt/gbo/bin/botserver-stack/bin/cache/bin/
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Use --strip-components=2 to flatten tarball structure
- Binaries go to cache/valkey-server (not cache/bin/valkey-server)
- Matches production path: /opt/gbo/bin/botserver-stack/bin/cache/
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Binary is at cache/bin/bin/valkey-server after extraction
- Update exec_cmd and check_cmd to use bin/ subdirectory
- Create symlinks at parent level for convenience
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Use valkey-server and valkey-cli directly
- No redis compatibility symlinks needed
- Simplifies installation
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- noble version requires GLIBC 2.38 (Ubuntu 24.04)
- jammy version works with GLIBC 2.36 (Ubuntu 22.04)
- System has GLIBC 2.36, needs compatible binary
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Use valkey-8.1.5-noble-x86_64.tar.gz instead of 9.0.2-jammy
- More stable version for production use
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Change retry interval from 1s to 5s between attempts
- Reduce attempts from 30 to 12 (total wait time goes from 30s to 60s)
- Gives Valkey more time to stabilize between connection attempts
- Helps with slow-to-start services during bootstrap
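The retry policy can be sketched as a generic loop. The function name and the closure-based probe are assumptions for illustration; the real code presumably calls the cache connection directly.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `probe` up to `attempts` times and sleeps
/// `interval` between failed tries. Returns the 1-based number of the
/// attempt that succeeded, or None if every attempt failed.
pub fn wait_until_ready<F>(attempts: u32, interval: Duration, mut probe: F) -> Option<u32>
where
    F: FnMut() -> bool,
{
    for attempt in 1..=attempts {
        if probe() {
            return Some(attempt);
        }
        // No need to sleep after the final failed attempt.
        if attempt < attempts {
            sleep(interval);
        }
    }
    None
}
```

Under these assumptions, the settings from this commit would be `wait_until_ready(12, Duration::from_secs(5), ...)`, which waits roughly one minute in the worst case.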
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Wait up to 30 seconds for cache to be ready
- Retry every 1 second with progress logging
- Prevents race condition during service startup
- Ensures suggestions feature works when Valkey starts after botserver
Fixes issue where cache connection failed during bootstrap if
Valkey wasn't immediately ready.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add connection verification for Redis/Valkey cache with PING test
- Support CACHE_URL, REDIS_URL, and VALKEY_URL environment variables
- Add better error messages when cache is unavailable
- Add LLM_URL and LLM_MODEL environment variable support
- LLM configuration now checks env vars first, then database, then defaults
- This ensures local LLM (http://localhost:8081) is used as proper fallback
Fixes the suggestions button not working when Valkey is unavailable
and improves LLM configuration when no bot config.csv exists.
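The lookup order can be sketched with the environment injected as a closure, which keeps the logic testable. The function names are hypothetical, and the order in which the three cache variables are tried is an assumption; the commit only states that all three are supported.

```rust
/// First candidate that is present and not blank, in the order given.
fn first_non_empty<I>(candidates: I) -> Option<String>
where
    I: IntoIterator<Item = Option<String>>,
{
    candidates
        .into_iter()
        .flatten()
        .find(|value| !value.trim().is_empty())
}

/// Hypothetical helper: cache URL from CACHE_URL, REDIS_URL, then VALKEY_URL
/// (the order is assumed).
pub fn resolve_cache_url(env: impl Fn(&str) -> Option<String>) -> Option<String> {
    first_non_empty(
        ["CACHE_URL", "REDIS_URL", "VALKEY_URL"]
            .iter()
            .map(|key| env(*key)),
    )
}

/// Hypothetical helper: LLM endpoint from the env var first, then the
/// database value, then the local default.
pub fn resolve_llm_endpoint(
    env: impl Fn(&str) -> Option<String>,
    db_value: Option<String>,
) -> String {
    first_non_empty(vec![env("LLM_URL"), db_value])
        .unwrap_or_else(|| "http://localhost:8081".to_string())
}
```

In production the closure would simply be `|key| std::env::var(key).ok()`.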
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Escape format placeholders in designer_ai.rs ({{botname}})
- Remove undefined 'prefix' filter in drive_monitor
- Fix type mismatch in use_tool.rs (str vs &String)
- Remove unused TextExpressionMethods import
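For reference, the placeholder escaping works like this in format!: doubled braces are emitted as literal braces, while {} is substituted. The prompt text and function name below are made up for illustration and are not the strings in designer_ai.rs.

```rust
/// Hypothetical example: `{{botname}}` survives as the literal text
/// `{botname}` for a later templating step, while `{}` is filled in now.
pub fn render_prompt(task: &str) -> String {
    format!("You are {{botname}}. Task: {}", task)
}
```

Without the doubled braces, format! would treat botname as a named argument and the build would fail unless a variable with that name were in scope.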
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add TLS certificate generation to Vault post-install commands (CA, server, client, PostgreSQL)
- Add initialize_vault_local() function to handle Vault initialization for local installs
- Add ensure_env_file_exists() function to create .env when Vault already initialized
- Modify start() method to call Vault initialization after successful start (local mode)
- Fix Vault CLI flags: use -tls-skip-verify (not -skip-verify or -skip-tls-verify)
This restores the behavior where .env is automatically created with VAULT_ADDR,
VAULT_TOKEN, and VAULT_CACERT during local bootstrap, matching the LXC container
deployment behavior in facade.rs.
Fixes issue where .env file was only created for LXC deployments but not for
local installations.
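A rough sketch of the "create .env only if it is missing" step, using the three variable names listed above. The helper name, its signature, and the file layout are assumptions and may differ from the actual ensure_env_file_exists() implementation.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: writes a .env file with the three Vault variables
/// unless the file already exists. Returns Ok(true) when a file was created.
pub fn write_env_if_missing(
    path: &Path,
    vault_addr: &str,
    vault_token: &str,
    vault_cacert: &str,
) -> io::Result<bool> {
    // Never overwrite an existing .env; it may carry a valid token.
    if path.exists() {
        return Ok(false);
    }
    let body = format!(
        "VAULT_ADDR={}\nVAULT_TOKEN={}\nVAULT_CACERT={}\n",
        vault_addr, vault_token, vault_cacert
    );
    fs::write(path, body)?;
    Ok(true)
}
```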
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Enhanced conversational input handling for batismo tool
- Improved keyword extraction and format recognition
- Fixed field extraction from informal user messages
- Better natural language understanding
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add Redis-based tracking to prevent start.bas from running repeatedly
  when clicking suggestion buttons. start.bas now executes only once per
  session with a 24-hour expiration on the tracking key.
- Add generic tool executor (ToolExecutor) for parsing and executing
  tool calls from any LLM provider. Works with Claude, OpenAI, and
  other providers that use standard tool calling formats.
- Update both start.bas execution paths (WebSocket handler and LLM
  message handler) to check Redis before executing.
- Fix suggestion duplication by clearing suggestions from Redis after
  fetching them.
- Add rate limiter for LLM API calls.
- Improve error handling and logging throughout.
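The once-per-session rule with a 24-hour expiry can be sketched with an in-memory map standing in for Redis. The struct, the key format, and the explicit `now` parameter are all assumptions for illustration; in the real code the marker lives in Redis with a key expiration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// 24-hour lifetime of the tracking key, as described above.
const START_BAS_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// In-memory stand-in for the Redis tracking keys (illustrative only).
#[derive(Default)]
pub struct SessionTracker {
    expires_at: HashMap<String, Instant>,
}

impl SessionTracker {
    /// Returns true if start.bas should run for this session: either it has
    /// never run, or the previous marker has expired. Marks the session as run.
    pub fn should_run_start(&mut self, session_id: &str, now: Instant) -> bool {
        // Hypothetical key format; the real key name is not given in this log.
        let key = format!("start_bas_executed:{}", session_id);
        let still_marked = self
            .expires_at
            .get(&key)
            .map_or(false, |expiry| *expiry > now);
        if still_marked {
            return false;
        }
        self.expires_at.insert(key, now + START_BAS_TTL);
        true
    }
}
```

Both execution paths (WebSocket handler and LLM message handler) would call the same check, so whichever path runs first wins and the other skips.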
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Check BOTSERVER_PORT env var before database config
- Override server_port from database if BOTSERVER_PORT is set
- Apply to both from_database() and from_env() config paths
- Allows easy port configuration via environment variable
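A minimal sketch of the override, assuming the port is a u16 and that an unparsable BOTSERVER_PORT value is ignored rather than treated as an error (the actual behavior on invalid input is not stated here). The function name is hypothetical.

```rust
/// Hypothetical helper: the env value wins when it parses as a port,
/// otherwise the port from the database config is kept.
pub fn resolve_server_port(env_value: Option<&str>, db_port: u16) -> u16 {
    env_value
        .and_then(|raw| raw.trim().parse::<u16>().ok())
        .unwrap_or(db_port)
}
```

A caller could pass `std::env::var("BOTSERVER_PORT").ok().as_deref()` as the first argument.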
- Removed embed-ui from default features (botserver is backend API only)
- UI embedding is botui's responsibility, not botserver's
- Fixed rust-embed folder path in embedded_ui.rs
- Resolves CI compilation errors
Production binaries should embed UI assets by default to avoid
requiring external ui/suite folder in deployment.
This ensures botserver binary contains all HTMX, HTML, CSS, and JS
assets needed for the web interface.
- Use local pg_isready path when available (./botserver-stack/bin/tables/bin/pg_isready)
- Fall back to system pg_isready if local binary not found
- Prevents 30-second timeout during bootstrap when PostgreSQL is actually running
- Applied to both readiness checks in start_all() method
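The binary selection can be sketched as follows. The function name is hypothetical, and falling back to the bare name pg_isready assumes the system binary is on PATH.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: prefer the bundled pg_isready when it exists,
/// otherwise fall back to whatever `pg_isready` resolves to on PATH.
pub fn pg_isready_binary(local: &Path) -> PathBuf {
    if local.is_file() {
        local.to_path_buf()
    } else {
        PathBuf::from("pg_isready")
    }
}
```

Here `local` would be ./botserver-stack/bin/tables/bin/pg_isready, and the result can be handed to std::process::Command::new.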
- Removed -d 'postgres' parameter from pg_isready health checks
- Health check now only verifies server connection on port 5432
- Fixes spurious health check failures when PostgreSQL is running but a specific database has issues
- PostgreSQL logs showed 'database system is ready' but health check was failing
Add pg_isready health check to the 'already running' branch to ensure
PostgreSQL is properly detected as ready, even when running as a
non-interactive user (sudo -u gbuser).
This complements the previous fix for fresh PostgreSQL starts.
Changed pg_isready checks from '-U gbuser' to '-d postgres' to properly
detect PostgreSQL readiness during bootstrap. The gbuser database doesn't
exist yet during startup, causing pg_isready to fail and bootstrap to time out.
This fixes the issue when running botserver as a non-interactive user
(e.g., sudo -u gbuser).