When a .gbkb file is deleted from the bucket, DriveMonitor now:
- Deletes the downloaded file from work directory
- Removes the KB folder too once it is empty
- Prevents disk accumulation of orphaned knowledge base files
When a .bas file is deleted from the bucket, DriveMonitor now:
- Deletes the corresponding .ast compiled file
- Deletes .bas, .mcp.json, .tool.json files from work directory
- Removes the path from file_states tracking
This prevents stale compiled files from accumulating in production.
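A minimal sketch of the .bas cleanup described above, not the actual DriveMonitor code. The helper names (`artifacts_for`, `cleanup_deleted_bas`), the flat work-directory layout, and the use of the file name as the `file_states` key are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists the work-directory artifacts derived from one .bas file.
pub fn artifacts_for(work_dir: &Path, bas_name: &str) -> Vec<PathBuf> {
    let stem = bas_name.strip_suffix(".bas").unwrap_or(bas_name);
    ["bas", "ast", "mcp.json", "tool.json"]
        .iter()
        .map(|ext| work_dir.join(format!("{stem}.{ext}")))
        .collect()
}

/// Removes every artifact that exists and drops the tracking entry.
/// Missing files are ignored, so running the cleanup twice is harmless.
/// Returns how many files were actually deleted.
pub fn cleanup_deleted_bas(
    work_dir: &Path,
    bas_name: &str,
    file_states: &mut HashMap<String, bool>, // assumed shape: name -> indexed flag
) -> io::Result<usize> {
    let mut removed = 0;
    for path in artifacts_for(work_dir, bas_name) {
        match fs::remove_file(&path) {
            Ok(()) => removed += 1,
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
    }
    file_states.remove(bas_name);
    Ok(removed)
}
```

Treating `NotFound` as success keeps the cleanup safe to repeat when a previous cycle was interrupted halfway.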
Fixed bug where DriveMonitor would overwrite indexed=true status after
successful compilation, causing files to be recompiled on every cycle.
Changes:
- Track successful compilations in HashSet before acquiring write lock
- Set indexed=true for successfully compiled files in merge loop
- Preserve indexed status for unchanged files
- Handle compilation failures with proper fail_count tracking
This ensures new .bas files are compiled to .ast once and the indexed
status is preserved, preventing unnecessary recompilation.
- Do not mark .bas files as indexed unconditionally
- Only set indexed=true when compile_tool() completes successfully
- Reset fail_count and last_failed_at on successful compilation
- Retry failed compilations automatically on next cycle
- Fixes permanent compilation failure state for salesianos start.bas
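The indexed-status handling above can be sketched roughly as follows. This is an illustration under assumptions, not the real merge loop: the `FileState` fields, the `merge_states` name, and the use of a second set for failed paths are all hypothetical. The compilation results are collected into sets first, so the merge can run while the write lock is held without compiling anything.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq, Default)]
pub struct FileState {
    pub etag: String,
    pub indexed: bool,
    pub fail_count: u32,
}

/// Merges the freshly listed files into the previous state map.
/// `compiled` holds paths that compiled successfully this cycle,
/// `failed` holds paths whose compilation returned an error.
pub fn merge_states(
    previous: &HashMap<String, FileState>,
    listed: &HashMap<String, String>, // path -> etag
    compiled: &HashSet<String>,
    failed: &HashSet<String>,
) -> HashMap<String, FileState> {
    let mut merged = HashMap::new();
    for (path, etag) in listed {
        let prev = previous.get(path);
        let mut state = FileState {
            etag: etag.clone(),
            indexed: false,
            fail_count: prev.map_or(0, |p| p.fail_count),
        };
        if compiled.contains(path) {
            // Success: mark indexed and clear the failure counter.
            state.indexed = true;
            state.fail_count = 0;
        } else if failed.contains(path) {
            // Failure: stay unindexed so the next cycle retries.
            state.fail_count += 1;
        } else if let Some(p) = prev {
            // Unchanged file: keep whatever status it already had.
            if p.etag == *etag {
                state.indexed = p.indexed;
            }
        }
        merged.insert(path.clone(), state);
    }
    merged
}
```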
- Replace ADD SUGGESTION TOOL with ADD_SUGG_TOOL (single token)
- Replace ADD SUGGESTION TEXT with ADD_SUGG_TEXT
- Replace ADD SUGGESTION with ADD_SUGG
- Keep ADD_SUGGESTION_TOOL as legacy alias for backward compat
- Preprocessor converts ADD SUGGESTION TOOL -> ADD_SUGG_TOOL automatically
- Eliminates collision with ADD BOT, ADD MEMBER in Rhai parser
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
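As an illustration of the keyword rewriting above, a preprocessor pass might look like the sketch below. The function name `preprocess_suggestions`, the prefix-only matching, and the case-sensitive comparison are assumptions; the real preprocessor may tokenize differently. The longest phrase is tried first so that `ADD SUGGESTION TOOL` is never half-rewritten by the shorter `ADD SUGGESTION` rule.

```rust
/// Rewrites multi-word suggestion keywords at the start of a line
/// into their single-token forms. Other lines pass through unchanged.
pub fn preprocess_suggestions(line: &str) -> String {
    // Ordered longest-first; order matters for correctness.
    const RULES: [(&str, &str); 3] = [
        ("ADD SUGGESTION TOOL", "ADD_SUGG_TOOL"),
        ("ADD SUGGESTION TEXT", "ADD_SUGG_TEXT"),
        ("ADD SUGGESTION", "ADD_SUGG"),
    ];
    let trimmed = line.trim_start();
    for (from, to) in RULES {
        if let Some(rest) = trimmed.strip_prefix(from) {
            // Require a word boundary so "ADD SUGGESTIONS" is left alone.
            if rest.is_empty() || rest.starts_with(char::is_whitespace) {
                let indent = &line[..line.len() - trimmed.len()];
                return format!("{indent}{to}{rest}");
            }
        }
    }
    line.to_string()
}
```

Lines such as `ADD BOT "x"` or `ADD MEMBER "y"` fall through untouched, which is what removes the ambiguity for the parser.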
Previously, ensure_llama_servers_running() would return early
when both LLM and embedding servers were already running, without
calling set_embedding_server_ready(true). This caused DriveMonitor
to skip KB indexing with 'Embedding server not yet marked ready'.
Fix: call set_embedding_server_ready(true) before returning early
when servers are already running.
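The shape of this fix can be sketched as below. Only the names `set_embedding_server_ready` and `ensure_llama_servers_running` come from the text; the `AtomicBool` flag, the `is_embedding_server_ready` reader, and the boolean parameters standing in for real health probes are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Assumed storage for the readiness flag.
static EMBEDDING_SERVER_READY: AtomicBool = AtomicBool::new(false);

pub fn set_embedding_server_ready(ready: bool) {
    EMBEDDING_SERVER_READY.store(ready, Ordering::SeqCst);
}

pub fn is_embedding_server_ready() -> bool {
    EMBEDDING_SERVER_READY.load(Ordering::SeqCst)
}

/// Simplified stand-in for ensure_llama_servers_running().
/// `llm_up` and `embedding_up` replace the real health checks.
pub fn ensure_servers_running(llm_up: bool, embedding_up: bool) -> &'static str {
    if llm_up && embedding_up {
        // The fix: mark the flag before the early return, otherwise
        // consumers keep seeing "not yet marked ready".
        set_embedding_server_ready(true);
        return "already running";
    }
    // Starting the missing servers and waiting for them is omitted here.
    set_embedding_server_ready(true);
    "started"
}
```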
- Filter states by kb_folder_pattern (e.g. 'cartas/', 'proc/')
- Only apply backoff based on files in that specific KB folder
- Each KB folder has independent retry timing
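A rough sketch of the per-folder backoff check follows. The `Failure` record, the `folder_in_backoff` name, the substring match on the path, and the example paths in the usage note are hypothetical; thresholds are passed in as parameters rather than fixed.

```rust
use std::collections::HashMap;

/// Per-file failure record; `last_failed_secs` is a Unix timestamp in seconds.
pub struct Failure {
    pub fail_count: u32,
    pub last_failed_secs: u64,
}

/// Returns true when some file under `kb_folder_pattern` has failed at least
/// `max_fails` times and its last failure is less than `window_secs` old.
/// Files in other KB folders are ignored entirely.
pub fn folder_in_backoff(
    states: &HashMap<String, Failure>,
    kb_folder_pattern: &str,
    now_secs: u64,
    max_fails: u32,
    window_secs: u64,
) -> bool {
    states
        .iter()
        .filter(|(path, _)| path.contains(kb_folder_pattern))
        .any(|(_, f)| {
            f.fail_count >= max_fails
                && now_secs.saturating_sub(f.last_failed_secs) < window_secs
        })
}
```

With this shape, repeated failures under `cartas/` delay only `cartas/`, while `proc/` keeps its own retry schedule.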
The bge-small-en-v1.5-f32.gguf model has n_ctx_train=512. With batch_size=16
and ~300+ tokens per chunk, total tokens exceed 512 causing GGML_ASSERT crash.
Now with batch_size=2, embeddings are processed safely.
- Skip re-indexing files that failed 3+ times within 1 hour
- Update file_states on indexing success (indexed=true, fail_count=0)
- Update file_states on indexing failure (fail_count++, last_failed_at=now)
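The retry bookkeeping above might be modelled as in the sketch below. The struct layout and method names are assumptions; only the field names `indexed`, `fail_count`, and `last_failed_at`, the threshold of 3 failures, and the 1 hour window come from the text.

```rust
use std::time::{Duration, SystemTime};

const MAX_FAILS: u32 = 3;
const BACKOFF_WINDOW: Duration = Duration::from_secs(3600);

#[derive(Debug, Clone, Default)]
pub struct FileState {
    pub indexed: bool,
    pub fail_count: u32,
    pub last_failed_at: Option<SystemTime>,
}

impl FileState {
    /// Indexing succeeded: mark indexed and clear the failure counter.
    pub fn record_success(&mut self) {
        self.indexed = true;
        self.fail_count = 0;
    }

    /// Indexing failed: bump the counter and remember when it happened.
    pub fn record_failure(&mut self, now: SystemTime) {
        self.indexed = false;
        self.fail_count += 1;
        self.last_failed_at = Some(now);
    }

    /// True when the file failed 3+ times and the last failure is under
    /// one hour old. A clock that moved backwards counts as "still inside
    /// the window", which errs on the side of skipping.
    pub fn should_skip(&self, now: SystemTime) -> bool {
        if self.fail_count < MAX_FAILS {
            return false;
        }
        match self.last_failed_at {
            Some(t) => now.duration_since(t).map_or(true, |d| d < BACKOFF_WINDOW),
            None => false,
        }
    }
}
```

Passing `now` in explicitly keeps the logic easy to test without sleeping.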
- Don't skip KB indexing when embedding server not marked ready yet
- Embedding server health will be detected via wait_for_server() in kb_indexer
- Remove drive_monitor bypass of embedding check - let kb_indexer handle it
- Remove hardcoded URL list for remote API detection
- Try /health first, then probe with HEAD if 404/405
- Re-enable embedding server ready check in drive_monitor
- No more embedding_key hack that skipped health checks entirely
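The "/health first, then HEAD on 404/405" order can be expressed as a small decision function. This is a sketch under assumptions: the name `server_ready`, passing status codes instead of doing real HTTP calls, and treating any HEAD response below 500 as "reachable" are illustrative choices, not the behaviour confirmed by the commit.

```rust
/// Decides readiness from the /health status and, only if needed,
/// a lazily evaluated HEAD probe. `None` means the request itself failed.
pub fn server_ready(
    health_status: Option<u16>,
    head_probe: impl FnOnce() -> Option<u16>,
) -> bool {
    match health_status {
        // A 2xx from /health is enough.
        Some(200..=299) => true,
        // Endpoint missing or method not allowed: fall back to HEAD.
        // Assumption: any answer below 500 proves the server is up.
        Some(404) | Some(405) => matches!(head_probe(), Some(code) if code < 500),
        // Connection errors and other statuses count as not ready.
        _ => false,
    }
}
```

Taking the HEAD probe as a closure means the second request is only issued when the first one was inconclusive.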
- Bug 1: check_gbkb_changes now preserves indexed=true from previous
state when etag matches, preventing redundant re-indexing every cycle
- Bug 2: USE KB fallback uses bot_id_short (8 chars) instead of random
UUID, matching the collection name convention used by DriveMonitor
- Bug 3: handle_gbkb_change now upserts into kb_collections table after
successful indexing, so USE KB can find the collection at runtime
- Changed ON CONFLICT DO NOTHING to DO UPDATE for kb_collections inserts
- Changed process_gbkb_folder return type to Result<IndexingResult>
File states were stored under /opt/gbo/work/{UUID}/file_states.json
but should be under /opt/gbo/work/{bucket_name}/file_states.json
like other bot data (e.g. /opt/gbo/work/salesianos.gbai/)
Also fixed file_states_static signature to use bucket_name consistently.
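For illustration, the corrected location can be built like this. The function name `file_states_path` and its parameters are hypothetical; only the directory layout comes from the text.

```rust
use std::path::{Path, PathBuf};

/// Builds {work_root}/{bucket_name}/file_states.json, keyed by bucket
/// name rather than by bot UUID.
pub fn file_states_path(work_root: &Path, bucket_name: &str) -> PathBuf {
    work_root.join(bucket_name).join("file_states.json")
}
```

For `work_root = /opt/gbo/work` and `bucket_name = salesianos.gbai` this yields `/opt/gbo/work/salesianos.gbai/file_states.json`.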
- get_work_path_default/get_stack_path no longer rely on a CWD-relative
botserver-stack check, which caused the wrong output path in production
when CI left that directory
- DriveMonitor now marks .bas file states as indexed=true after list+compile cycle
- Added compile_tool logging for work_dir path
1. Fix model.starts_with('') always true - was limiting ALL models to 768 tokens
(local llama limit), truncating system prompts and KB context. Now only
applies when model=='local' or empty string, default is 32k tokens.
2. Fix create_bot_from_drive missing NOT NULL columns (llm_provider,
context_provider) - bots auto-created from S3 buckets failed to persist.
3. Fix S3 endpoint URL construction missing port 9100.
4. Fix Vault seed: vectordb.url was empty string, now defaults to
http://localhost:6333.
5. Fix Vault credential regeneration on recovery - added vault_seeds_exist().
6. Fix CA cert path for Vault TLS (botserver-stack vs botserver-stack).
7. Add bot verification after insert in create_bot_from_drive.
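Item 1 is easy to reproduce in isolation: in Rust, every string starts with the empty string, so a `starts_with("")` guard matches all models. The sketch below contrasts the buggy and fixed checks. The function names and the exact value used for "32k" are assumptions.

```rust
const LOCAL_LIMIT: usize = 768;
// The text says "32k"; the exact number here is an assumption.
const DEFAULT_LIMIT: usize = 32_000;

/// Buggy variant: `starts_with("")` is true for every string,
/// so every model is capped at the local limit.
pub fn buggy_limit(model: &str) -> usize {
    if model.starts_with("") {
        LOCAL_LIMIT
    } else {
        DEFAULT_LIMIT
    }
}

/// Fixed variant: only an empty name or "local" gets the small limit.
pub fn fixed_limit(model: &str) -> usize {
    if model.is_empty() || model == "local" {
        LOCAL_LIMIT
    } else {
        DEFAULT_LIMIT
    }
}
```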
- SSH to system container and clean unused workspaces
- Keep only botserver/target and active CI directories
- Clean alm-ci workspaces not used by botserver
- Free up disk space before compilation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Restore botlib repository with --depth 1 fetch
- Restore gb-ws workspace from /opt/gbo/data/gb
- Use --depth 1 for all clone operations (faster)
- Build with --features chat flag
Co-Authored-By: Claude <noreply@anthropic.com>
- If /opt/gbo/data/botserver/.git exists, pull instead of clone
- Prevents 'destination already exists' errors from persistent directories
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove only /opt/gbo/data/botserver/.git to preserve workspace
- Avoids 'destination already exists' error on git clone
Co-Authored-By: Claude <noreply@anthropic.com>
- Proper workspace setup with botlib and botserver repos
- Incremental git pull for sccache optimization
- Production deployment via SSH tarball
- Workspace: /opt/gbo/data
Co-Authored-By: Claude <noreply@anthropic.com>
- Add ZITADEL_DATABASE_* environment variables to directory component env_vars
- Remove inline env vars from exec_cmd (now applied via spawn_with_envs)
- Use $DB_PASSWORD reference to fetch from Vault at runtime
- This ensures Zitadel gets database credentials on every boot, not just during install
Co-Authored-By: Claude <noreply@anthropic.com>