Commit graph

4277 commits

Author SHA1 Message Date
694fb91efe Add comment about batch_size reduction for llama-server stability
All checks were successful
BotServer CI/CD / build (push) Successful in 2m9s
2026-04-12 09:59:49 -03:00
d3673e1f34 Add KB fail state migration: fail_count and last_failed_at columns
All checks were successful
BotServer CI/CD / build (push) Successful in 48s
New migration 6.3.0-01-kb-fail-state to add columns to kb_documents
for intelligent backoff retry logic.
2026-04-12 09:43:47 -03:00
73f1898b62 Add fail_count and last_failed_at to kb_documents
All checks were successful
BotServer CI/CD / build (push) Successful in 3m7s
Simplified KB indexing state tracking - added columns directly
to kb_documents instead of separate table. This enables per-file
backoff retry logic.
2026-04-12 09:36:39 -03:00
256d55fc93 Add smart sleep based on fail_count to prevent excessive monitoring cycles
All checks were successful
BotServer CI/CD / build (push) Successful in 3m9s
- fail_count >= 3: sleep 1 hour
- fail_count >= 2: sleep 15 min
- fail_count >= 1: sleep 5 min
- fail_count = 0: sleep 10 sec (default)
2026-04-12 09:20:17 -03:00
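
The sleep schedule listed in 256d55fc93 can be read as a simple lookup from failure count to delay. The sketch below is an illustration only: the function name `sleep_for_fail_count` is hypothetical and is not taken from the repository. The four intervals come from the commit message.

```rust
use std::time::Duration;

/// Maps a failure count to the monitoring sleep interval
/// described in the commit message. Name is hypothetical.
pub fn sleep_for_fail_count(fail_count: u32) -> Duration {
    match fail_count {
        0 => Duration::from_secs(10),      // default cycle
        1 => Duration::from_secs(5 * 60),  // 5 minutes
        2 => Duration::from_secs(15 * 60), // 15 minutes
        _ => Duration::from_secs(60 * 60), // 3 or more failures: 1 hour
    }
}
```

Matching on the exact count with a catch-all arm gives the same result as the descending `>=` checks in the message.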
789789e313 Fix backoff logic to be per KB folder instead of global
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Filter states by kb_folder_pattern (e.g. 'cartas/', 'proc/')
- Only apply backoff based on files in that specific KB folder
- Each KB folder has independent retry timing
2026-04-12 09:15:32 -03:00
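
A rough sketch of the per-folder filtering described in 789789e313 follows. Several details are assumptions: the commit does not say how the pattern is matched (substring matching is used here), how several failing files in one folder are combined (the maximum is taken here), or what the state type looks like (a slice of `(path, fail_count)` pairs stands in for it). The function name is hypothetical.

```rust
/// Returns the fail count that drives backoff for one KB folder.
/// Only files whose path contains `kb_folder_pattern` are considered,
/// so failures in 'proc/' do not delay retries in 'cartas/'.
/// Substring matching and taking the maximum are assumptions.
pub fn folder_fail_count(states: &[(&str, u32)], kb_folder_pattern: &str) -> u32 {
    states
        .iter()
        .filter(|(path, _)| path.contains(kb_folder_pattern))
        .map(|(_, fail_count)| *fail_count)
        .max()
        .unwrap_or(0) // no matching files: no backoff
}
```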
ee273256fb Add backoff logic to KB indexing to prevent excessive retries
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- fail_count 1: wait 5 minutes before retry
- fail_count 2: wait 15 minutes before retry
- fail_count 3+: wait 1 hour before retry

This prevents the 'already being indexed, skipping duplicate task' loop.
2026-04-12 09:13:33 -03:00
cd0e049e81 Reduce embedding batch_size from 16 to 2 to prevent llama-server crash
All checks were successful
BotServer CI/CD / build (push) Successful in 2m3s
The bge-small-en-v1.5-f32.gguf model has n_ctx_train=512. With batch_size=16
and ~300+ tokens per chunk, total tokens exceed 512 causing GGML_ASSERT crash.

Now with batch_size=2, embeddings are processed safely.
2026-04-12 08:21:39 -03:00
f48fa6d5f0 Add fail_count/last_failed_at to FileState for indexing retries
All checks were successful
BotServer CI/CD / build (push) Successful in 3m21s
- Skip re-indexing files that failed 3+ times within 1 hour
- Update file_states on indexing success (indexed=true, fail_count=0)
- Update file_states on indexing failure (fail_count++, last_failed_at=now)
- Don't skip KB indexing when embedding server not marked ready yet
- Embedding server health will be detected via wait_for_server() in kb_indexer
- Remove drive_monitor bypass of embedding check - let kb_indexer handle it
2026-04-12 07:47:13 -03:00
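
The first three bullets of f48fa6d5f0 describe a small state machine per file. The sketch below shows one way it could look. The field names `indexed`, `fail_count` and `last_failed_at` come from the commit message. The method names, the use of `SystemTime`, and the handling of a clock that moves backwards are assumptions, and the real `FileState` likely has more fields.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, Default, Clone, PartialEq)]
pub struct FileState {
    pub indexed: bool,
    pub fail_count: u32,
    pub last_failed_at: Option<SystemTime>,
}

impl FileState {
    /// On indexing success: indexed=true, fail_count=0.
    pub fn record_success(&mut self) {
        self.indexed = true;
        self.fail_count = 0;
    }

    /// On indexing failure: fail_count++, last_failed_at=now.
    pub fn record_failure(&mut self, now: SystemTime) {
        self.fail_count += 1;
        self.last_failed_at = Some(now);
    }

    /// Skip files that failed 3+ times within the last hour.
    pub fn should_skip(&self, now: SystemTime) -> bool {
        const MAX_FAILS: u32 = 3;
        const WINDOW: Duration = Duration::from_secs(60 * 60);
        match self.last_failed_at {
            Some(failed_at) if self.fail_count >= MAX_FAILS => now
                .duration_since(failed_at)
                .map(|elapsed| elapsed < WINDOW)
                // Assumption: if the clock went backwards, keep skipping.
                .unwrap_or(true),
            _ => false,
        }
    }
}
```

Passing `now` in as a parameter, rather than calling `SystemTime::now()` inside the methods, keeps the logic testable without waiting for real time to pass.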
cdab04e999 Fix embedding health check: behavior-based instead of URL whitelist
All checks were successful
BotServer CI/CD / build (push) Successful in 3m32s
- Remove hardcoded URL list for remote API detection
- Try /health first, then probe with HEAD if 404/405
- Re-enable embedding server ready check in drive_monitor
- No more embedding_key hack that skipped health checks entirely
2026-04-12 07:15:54 -03:00
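
The "try /health first, then probe with HEAD if 404/405" rule in cdab04e999 amounts to a decision on the status code of the first request. The sketch below covers only that decision, with no HTTP client involved. The enum, the function name, and the treatment of any 2xx as ready and any other status as not ready are assumptions, not the repository's code.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum HealthStep {
    /// /health answered successfully: server is usable.
    Ready,
    /// /health is not implemented (e.g. a remote API): fall back to HEAD.
    ProbeWithHead,
    /// Anything else: treat the server as not ready yet.
    NotReady,
}

/// Decides what to do after GET /health returned `status`.
pub fn after_health_response(status: u16) -> HealthStep {
    match status {
        200..=299 => HealthStep::Ready,
        404 | 405 => HealthStep::ProbeWithHead,
        _ => HealthStep::NotReady,
    }
}
```

Deciding from the observed response is what lets the check work without a hardcoded list of remote API URLs.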
2bafd57046 Temp fix: Skip embedding server ready check in DriveMonitor KB indexing
All checks were successful
BotServer CI/CD / build (push) Successful in 3m19s
2026-04-12 06:58:55 -03:00
be3e4c4e54 Fix: Handle 'reasoning' field from NVIDIA kimi-k2.5 model
All checks were successful
BotServer CI/CD / build (push) Successful in 3m6s
2026-04-11 22:58:50 -03:00
47cb470c8e Fix: Handle reasoning_content from NVIDIA reasoning models (gpt-oss-120b)
All checks were successful
BotServer CI/CD / build (push) Successful in 3m16s
2026-04-11 22:30:39 -03:00
7a1ec157f1 Fix KB indexing: upsert kb_collections, consistent collection names, preserve indexed flag
All checks were successful
BotServer CI/CD / build (push) Successful in 3m23s
- Bug 1: check_gbkb_changes now preserves indexed=true from previous
  state when etag matches, preventing redundant re-indexing every cycle
- Bug 2: USE KB fallback uses bot_id_short (8 chars) instead of random
  UUID, matching the collection name convention used by DriveMonitor
- Bug 3: handle_gbkb_change now upserts into kb_collections table after
  successful indexing, so USE KB can find the collection at runtime
- Changed ON CONFLICT DO NOTHING to DO UPDATE for kb_collections inserts
- Changed process_gbkb_folder return type to Result<IndexingResult>
2026-04-11 21:26:02 -03:00
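
Bug 1 in 7a1ec157f1 is about carrying the `indexed` flag across monitoring cycles when a file has not changed. A minimal sketch of that merge step, under the assumption that each state holds an etag and an indexed flag, is shown below. The type `KbFileState` and the function `merge_state` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct KbFileState {
    pub etag: String,
    pub indexed: bool,
}

/// Builds the state for the current cycle. If the etag is unchanged,
/// the previous `indexed` value is preserved, so an already indexed
/// file is not queued again. A new or changed file starts unindexed.
pub fn merge_state(previous: Option<&KbFileState>, current_etag: &str) -> KbFileState {
    let indexed = previous
        .map(|prev| prev.etag == current_etag && prev.indexed)
        .unwrap_or(false);
    KbFileState {
        etag: current_etag.to_string(),
        indexed,
    }
}
```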
e81aee6221 fix: use bucket_name instead of bot_id (UUID) for file_states.json path
All checks were successful
BotServer CI/CD / build (push) Successful in 3m22s
File states were stored under /opt/gbo/work/{UUID}/file_states.json
but should be under /opt/gbo/work/{bucket_name}/file_states.json
like other bot data (e.g. /opt/gbo/work/salesianos.gbai/)

Also fixed file_states_static signature to use bucket_name consistently.
2026-04-11 20:40:23 -03:00
cf4a00e16e fix: work path uses production /opt/gbo when env exists or path exists; mark .bas files indexed=true after compilation
All checks were successful
BotServer CI/CD / build (push) Successful in 3m20s
- get_work_path_default/get_stack_path no longer rely on CWD-relative botserver-stack check which caused wrong output path in production when CI left that directory
- DriveMonitor now marks .bas file states as indexed=true after list+compile cycle
- Added compile_tool logging for work_dir path
2026-04-11 20:16:22 -03:00
5fdb3be5b4 fix: save file_states after prompt etag update to stop PROMPT.md download loop
All checks were successful
BotServer CI/CD / build (push) Successful in 3m41s
2026-04-11 19:21:26 -03:00
f4c99030aa fix: use get_work_path() instead of get_stack_path()+data/system for work dir, add etag check for PROMPT.md downloads
All checks were successful
BotServer CI/CD / build (push) Successful in 3m37s
2026-04-11 18:42:09 -03:00
a4a3837c4c fix: critical bugs - LLM context truncation, bot creation, S3 endpoint, vectordb seed
All checks were successful
BotServer CI/CD / build (push) Successful in 3m28s
1. Fix model.starts_with('') always true - was limiting ALL models to 768 tokens
   (local llama limit), truncating system prompts and KB context. Now only
   applies when model=='local' or empty string, default is 32k tokens.

2. Fix create_bot_from_drive missing NOT NULL columns (llm_provider,
   context_provider) - bots auto-created from S3 buckets failed to persist.

3. Fix S3 endpoint URL construction missing port 9100.

4. Fix Vault seed: vectordb.url was empty string, now defaults to
   http://localhost:6333.

5. Fix Vault credential regeneration on recovery - added vault_seeds_exist().

6. Fix CA cert path for Vault TLS (botserver-stack vs botserver-stack).

7. Add bot verification after insert in create_bot_from_drive.
2026-04-11 17:56:03 -03:00
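
Item 1 of a4a3837c4c rests on a property of `str::starts_with`: every string starts with the empty string, so the check matched all models. The sketch below reproduces the bug and the described fix. The function and constant names are hypothetical, and the value `32_000` is an assumption, since the commit only says "32k tokens".

```rust
pub const LOCAL_CONTEXT_TOKENS: usize = 768;
/// Assumed value; the commit message only says "32k".
pub const DEFAULT_CONTEXT_TOKENS: usize = 32_000;

/// The buggy check: true for every model name.
pub fn buggy_is_local(model: &str) -> bool {
    model.starts_with("")
}

/// The fix as described: the 768-token limit applies only when the
/// model is "local" or the empty string.
pub fn context_limit(model: &str) -> usize {
    if model == "local" || model.is_empty() {
        LOCAL_CONTEXT_TOKENS
    } else {
        DEFAULT_CONTEXT_TOKENS
    }
}
```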
a131120638 Fix KB indexing: bot-specific embedding config, PROMPT.md sync, single-file streaming
All checks were successful
BotServer CI/CD / build (push) Successful in 4m1s
2026-04-11 13:27:48 -03:00
12988b637d Fix KB indexing: single file streaming, dedup tracking, .ast cache
All checks were successful
BotServer CI/CD / build (push) Successful in 12m31s
2026-04-11 13:10:09 -03:00
821dd1d7ab fix: Use bot-specific embedding config in DriveMonitor KB manager
All checks were successful
BotServer CI/CD / build (push) Successful in 3m47s
2026-04-11 08:55:41 -03:00
dd4c780c4d Fix Zitadel health check and add ss command to allowed commands
All checks were successful
BotServer CI/CD / build (push) Successful in 3m20s
- Add 'ss' to ALLOWED_COMMANDS for port checking
- Fix Zitadel health check URL to include full address
2026-04-11 07:49:11 -03:00
5576378b3f Update botserver: Multiple improvements across core modules
All checks were successful
BotServer CI/CD / build (push) Successful in 10m41s
2026-04-11 07:33:32 -03:00
4cc1e3aa4b Fix CI: Clean up all workspaces on system before build
Some checks failed
BotServer CI/CD / build (push) Failing after 5s
- SSH to system container and clean unused workspaces
- Keep only botserver/target and active CI directories
- Clean alm-ci workspaces not used by botserver
- Free up disk space before compilation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 21:15:16 -03:00
5a1677bf2e Fix CI: Restore botlib and gb-ws workspaces, add --depth 1
Some checks failed
BotServer CI/CD / build (push) Failing after 11m18s
- Restore botlib repository with --depth 1 fetch
- Restore gb-ws workspace from /opt/gbo/data/gb
- Use --depth 1 for all clone operations (faster)
- Build with --features chat flag

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 21:00:21 -03:00
74d820fbde Fix CI: Check if workspace is git repo before clone
Some checks failed
BotServer CI/CD / build (push) Failing after 1s
- If /opt/gbo/data/botserver/.git exists, pull instead of clone
- Prevents 'destination already exists' errors from persistent directories

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 20:55:49 -03:00
9e8f3bc309 Fix CI: Only clean .git dir, not entire workspace
Some checks failed
BotServer CI/CD / build (push) Failing after 0s
- Remove only /opt/gbo/data/botserver/.git to preserve workspace
- Avoids 'destination already exists' error on git clone

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 20:50:58 -03:00
d0e24652c3 Fix CI: Remove non-existent botlib workspace
Some checks failed
BotServer CI/CD / build (push) Failing after 10s
- botlib repository doesn't exist in external repo
- Remove botlib setup to prevent workspace creation failures
- Keep only botserver workspace management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 20:47:10 -03:00
6b17476dfb Fix CI: Clean workspace before clone
Some checks failed
BotServer CI/CD / build (push) Failing after 2s
- Add rm -rf /opt/gbo/data/botserver before git clone
- Prevents 'destination already exists' error on re-runs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 20:45:04 -03:00
b23fa90da1 Restore production CI/CD workflow from 30 March
Some checks failed
BotServer CI/CD / build (push) Failing after 1s
- Proper workspace setup with botlib and botserver repos
- Incremental git pull for sccache optimization
- Production deployment via SSH tarball
- Workspace: /opt/gbo/data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 20:40:23 -03:00
286dc5ee15 Update Zitadel to v4.13.1 (latest release from past week)
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 20:17:08 -03:00
5e3334ae7f Simplify CI/CD clone process: Remove all complexity
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Simple clean, clone, submodule update
- /home/gbuser/target preserved separately

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 20:09:36 -03:00
6b245dd690 Optimize CI/CD clone process: Preserve compilation cache
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Always clone fresh from ALM, keeping /home/gbuser/target for incremental builds
- /home/gbuser/target is outside workspace, persists across runs
- Simplified clone logic, removed complex conditional checks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 20:02:07 -03:00
cb1998efe8 Fix Zitadel bootstrap: Pass all database env vars to ensure connection
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
- Add ZITADEL_DATABASE_* environment variables to directory component env_vars
- Remove inline env vars from exec_cmd (now applied via spawn_with_envs)
- Use $DB_PASSWORD reference to fetch from Vault at runtime
- This ensures Zitadel gets database credentials on every boot, not just during install

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 19:55:52 -03:00
e6d3f5aeaf Fix CI: clean source and fresh clone, keep target cache for incremental build
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-04-10 15:36:12 -03:00
54c317dbbc Fix CI: force submodule update with proper YAML indentation
All checks were successful
BotServer CI/CD / build (push) Successful in 5m56s
2026-04-10 15:28:39 -03:00
464d9f88ba Fix CI: force submodule update to match parent repo
2026-04-10 15:19:29 -03:00
ca04e6cecf Fix: change DRIVE URL from https to http
All checks were successful
BotServer CI/CD / build (push) Successful in 58s
2026-04-10 14:34:11 -03:00
0fad62aed9 Fix S3 endpoint: add http:// prefix if missing
All checks were successful
BotServer CI/CD / build (push) Successful in 4m28s
2026-04-10 13:39:07 -03:00
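
Adding a missing scheme, as 0fad62aed9 describes, can be done with a prefix check. This is a sketch with a hypothetical function name. It assumes an endpoint that already carries `http://` or `https://` is left untouched.

```rust
/// Returns the endpoint with an `http://` prefix if it has no scheme.
pub fn ensure_http_scheme(endpoint: &str) -> String {
    if endpoint.starts_with("http://") || endpoint.starts_with("https://") {
        endpoint.to_string()
    } else {
        format!("http://{endpoint}")
    }
}
```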
514427c7cc Fix PROMPT.md loading: use get_work_path instead of get_stack_path
All checks were successful
BotServer CI/CD / build (push) Successful in 5m46s
2026-04-10 13:11:45 -03:00
db2dc3fb34 Fix warnings: remove unused variables in drive_monitor
All checks were successful
BotServer CI/CD / build (push) Successful in 11m32s
2026-04-10 12:58:20 -03:00
152fbe3a38 Fix CI: initialize all workspace members
Some checks failed
BotServer CI/CD / build (push) Has been cancelled
2026-04-10 12:35:31 -03:00
28f811bb7f Update botserver workflow
Some checks failed
BotServer CI/CD / build (push) Failing after 9s
2026-04-10 12:27:24 -03:00
5e955d3196 Fix CI: Handle divergent submodule histories with fetch+reset
Some checks failed
BotServer CI/CD / build (push) Failing after 5s
- Changed from 'git pull --ff-only' to 'git fetch + git reset --hard'
- This handles cases where local submodule history has diverged from remote
- Ensures CI always uses exact remote state regardless of local history

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 12:12:48 -03:00
aaccd741e3 Fix CI: Use gbuser home directory and restore original Setup Workspace
Some checks failed
BotServer CI/CD / build (push) Failing after 10s
- Changed WORKSPACE from /opt/gbo/data/botserver to /home/gbuser/workspace
- Changed CARGO_TARGET_DIR from /opt/gbo/data/botserver/target to /home/gbuser/target
- Restored original Setup Workspace approach that clones gb-ws and uses its Cargo.toml
- Uses shallow clones (--depth 1) for efficiency
- Only initializes necessary submodules (botlib and botserver)
- Updated build and deploy paths to use gbuser home directory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 12:09:07 -03:00
2634521d9e Optimize CI Setup Workspace - avoid full codebase download
- Remove gb-ws clone (unnecessary intermediate step)
- Use --depth 1 for shallow clones (only latest commit)
- Create minimal Cargo.toml directly (only botlib + botserver members)
- Use git pull --ff-only for updates (no full history)
- Significantly reduces CI time and disk usage
- Maintains single-pull strategy
2026-04-10 11:38:13 -03:00
26b009d4e6 Fix: Remove duplicate method definitions in DriveMonitor
All checks were successful
BotServer CI/CD / build (push) Successful in 4m52s
- Removed duplicate file_state_path() and load_file_states() methods
- Kept only new save_file_states_static() helper
- Original methods still exist at lines 79-84 and 87-128
- Fixes compilation errors from previous commit
2026-04-10 11:31:17 -03:00
816d416eee Fix DriveMonitor dispatch failure in main repo
Some checks failed
BotServer CI/CD / build (push) Failing after 1m31s
- Added static save_file_states_static() helper method
- Changed tokio::spawn calls to use Arc::clone instead of Arc::new(self.clone())
- This prevents double Arc wrapping which causes 'dispatch failure' errors
- Fixes config.csv not syncing from bucket to database for salesianos/default bots
2026-04-10 11:24:56 -03:00
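
The difference between `Arc::clone` and `Arc::new(self.clone())` mentioned in 816d416eee is shown below with the standard library only. `std::thread` stands in for `tokio::spawn`, and `Monitor` is a hypothetical stand-in for DriveMonitor. The sketch shows only that the two forms yield different allocations. It does not reproduce the 'dispatch failure' itself.

```rust
use std::sync::Arc;
use std::thread;

#[derive(Clone)]
pub struct Monitor {
    pub bucket: String,
}

/// Arc::clone hands the task another handle to the same allocation.
pub fn spawn_shared(monitor: &Arc<Monitor>) -> thread::JoinHandle<bool> {
    let original = Arc::clone(monitor);
    let for_task = Arc::clone(monitor);
    // The task sees the very same Monitor as the caller.
    thread::spawn(move || Arc::ptr_eq(&original, &for_task))
}

/// Arc::new(value.clone()) copies the Monitor into a new allocation,
/// so the result no longer points at the caller's instance.
pub fn rewrapped(monitor: &Arc<Monitor>) -> Arc<Monitor> {
    Arc::new((**monitor).clone())
}
```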
918cb623a1 ci: trigger build
All checks were successful
BotServer CI/CD / build (push) Successful in 1m0s
2026-04-10 09:05:32 -03:00
dc933c22e4 ci: kill stuck cargo before build
All checks were successful
BotServer CI/CD / build (push) Successful in 1m3s
2026-04-10 08:50:07 -03:00