- Fix PAT extraction timing with retry loop (waits up to 60s for PAT in logs)
- Add sync command to flush filesystem buffers before extraction
- Improve logging with progress messages and PAT verification
- Refactor setup code into consolidated setup.rs module
- Fix YAML indentation for PatPath and MachineKeyPath
- Change Zitadel init parameter from --config to --steps
The timing issue occurred because:
1. Zitadel writes PAT to logs at startup (~18:08:59)
2. Post-install extraction ran too early (~18:09:35)
3. PAT file wasn't created until ~18:10:38 (63s after installation)
4. OAuth client creation failed because PAT file didn't exist yet
With the retry loop:
- Waits for PAT to appear in logs with sync+grep check
- Extracts PAT immediately when found
- OAuth client creation succeeds
- directory_config.json saved with valid credentials
- Login flow works end-to-end
Tested: Full reset.sh and login verification successful
Fixed compilation errors by adding .await to all ensure_admin_token() calls:
- create_organization()
- create_user()
- save_config()
The method was made async but the calls weren't updated.
- Add vector_db_health_check() function to verify Qdrant availability
- Add wait loop for vector_db startup in bootstrap (15 seconds)
- Fallback to local LLM when external URL configured but no API key provided
- Prevent external LLM (api.z.ai) usage without authentication key
This fixes the production issues:
- Qdrant vector database not available at https://localhost:6333
- External LLM being used instead of local when no key is configured
- Ensures vector_db is properly started and ready before use
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous /dev/tcp test was giving false positives, reporting that
Valkey was running when it was actually down. This caused bootstrap to
skip starting Valkey, leading to botserver hanging on cache connection.
Changes:
- Use nc (netcat) with -z flag for reliable port checking
- Final fallback: /dev/tcp with actual PING/PONG verification
- Only returns true if port is open AND responds correctly
This ensures cache_health_check() accurately reports Valkey status.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Try valkey-cli first (preferred for Valkey installations)
- Fall back to redis-cli (for Redis installations)
- Fall back to TCP connection test (works for both)
This fixes environments that only have Valkey installed without
Redis symlinks or redis-cli.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Fix component name mismatch: "redis" -> "cache" in bootstrap_manager
- Add cache_health_check() function to verify Valkey is responding
- Add health check loop after starting cache (12s wait with PING test)
- Ensures cache is ready before proceeding with bootstrap
This fixes the issue where botserver would hang waiting for cache
connection because the cache component was never started.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>