- Add detailed logging for all 5 pipeline stages (PLAN, BUILD, REVIEW, DEPLOY, MONITOR) - Log stage start/complete events with agent IDs and progress details - Add resource creation/deletion logging in drive_handlers - Improve pipeline summary logging with task ID, nodes, resources, and URL This addresses the requirement for textual progress in console logs.
18 KiB
BotServer Cinema Viewer Logging Plan
🎬 The Cinema Viewer Philosophy
Think of your logs as a movie with different viewing modes:
- INFO = The Movie - Watch the main story unfold (production-ready)
- DEBUG = Director's Commentary - Behind-the-scenes details (troubleshooting)
- TRACE = Raw Footage - Every single take and detail (deep debugging)
- WARN = Plot Holes - Things that shouldn't happen but are recoverable
- ERROR = Scene Failures - Critical issues that break the narrative
📊 Log Level Definitions
ERROR - Critical Failures
When to use: System cannot proceed, requires immediate attention
// ✅ GOOD - Actionable, clear context
error!("Database connection failed - retrying in 5s: {}", e);
error!("Authentication failed for user {}: invalid credentials", user_id);
error!("Stage 2 BUILD failed: {}", e);
// ❌ BAD - Vague, no context
error!("Error!");
error!("Failed");
error!("Something went wrong: {}", e);
WARN - Recoverable Issues
When to use: Unexpected state but system can continue
// ✅ GOOD - Explains the issue and impact
warn!("Failed to create data directory: {}. Using fallback.", e);
warn!("LLM server not ready - deferring embedding generation");
warn!("Rate limit approaching for API key {}", key_id);
// ❌ BAD - Not actionable
warn!("Warning");
warn!("Something happened");
INFO - The Main Story
When to use: Key events, state changes, business milestones
// ✅ GOOD - Tells a story, shows progress
info!("Pipeline starting - task: {}, intent: {}", task_id, intent);
info!("Stage 1 PLAN complete - {} nodes planned", node_count);
info!("User {} logged in from {}", user_id, ip_address);
info!("Server started on port {}", port);
// ❌ BAD - Too verbose, implementation details
info!("Entering function process_request");
info!("Variable x = {}", x);
info!("Loop iteration {}", i);
DEBUG - Behind the Scenes
When to use: Troubleshooting information, decision points, state inspections
// ✅ GOOD - Helps diagnose issues
debug!("Request payload: {:?}", payload);
debug!("Using cache key: {}", cache_key);
debug!("Retry attempt {} of {}", attempt, max_retries);
debug!("Selected LLM model: {} for task type: {}", model, task_type);
// ❌ BAD - Too trivial
debug!("Variable assigned");
debug!("Function called");
TRACE - Raw Footage
When to use: Step-by-step execution, loop iterations, detailed flow
// ✅ GOOD - Detailed execution path
trace!("Starting monitoring loop");
trace!("Processing file: {:?}", path);
trace!("Checking bot directory: {}", dir);
trace!("WebSocket message received: {} bytes", len);
// ❌ BAD - Noise without value
trace!("Line 100");
trace!("Got here");
trace!("...");
🎭 Logging Patterns by Module Type
1. Orchestration & Pipeline Modules
Examples: auto_task/orchestrator.rs, auto_task/agent_executor.rs
INFO Level: Show the story arc
info!("Pipeline starting - task: {}, intent: {}", task_id, intent);
info!("Stage 1 PLAN starting - Agent #1 analyzing request");
info!("Stage 1 PLAN complete - {} nodes planned", node_count);
info!("Stage 2 BUILD complete - {} resources, url: {}", count, url);
info!("Pipeline complete - task: {}, nodes: {}, resources: {}", task_id, nodes, resources);
DEBUG Level: Show decision points
debug!("Classified intent as: {:?}", classification);
debug!("Selected app template: {}", template_name);
debug!("Skipping stage 3 - no resources to review");
TRACE Level: Show execution flow
trace!("Broadcasting thought to UI: {}", thought);
trace!("Updating agent {} status: {} -> {}", id, old, new);
trace!("Sub-task generated: {}", task_name);
2. File Monitoring & Compilation
Examples: drive/local_file_monitor.rs, drive/drive_monitor/mod.rs
INFO Level: Key file operations
info!("Local file monitor started - watching /opt/gbo/data/*.gbai");
info!("Compiling bot: {} ({} files)", bot_name, file_count);
info!("Bot {} compiled successfully - {} tools generated", bot_name, tool_count);
DEBUG Level: File processing details
debug!("Detected change in: {:?}", path);
debug!("Recompiling {} - modification detected", bot_name);
debug!("Skipping {} - no changes detected", bot_name);
TRACE Level: File system operations
trace!("Scanning directory: {:?}", dir);
trace!("File modified: {:?} at {:?}", path, time);
trace!("Watching directory: {:?}", path);
3. Security & Authentication
Examples: security/jwt.rs, security/api_keys.rs, security/sql_guard.rs
INFO Level: Security events (always log these!)
info!("User {} logged in from {}", user_id, ip);
info!("API key {} created for user {}", key_id, user_id);
info!("Failed login attempt for user {} from {}", username, ip);
info!("Rate limit exceeded for IP: {}", ip);
DEBUG Level: Security checks
debug!("Validating JWT for user {}", user_id);
debug!("Checking API key permissions: {:?}", permissions);
debug!("SQL query sanitized: {}", safe_query);
TRACE Level: Security internals
trace!("Token expiry check: {} seconds remaining", remaining);
trace!("Permission check: {} -> {}", resource, allowed);
trace!("Hashing password with cost factor: {}", cost);
4. API Handlers
Examples: HTTP endpoint handlers in core/, drive/, etc.
INFO Level: Request lifecycle
info!("Request started: {} {} from {}", method, path, ip);
info!("Request completed: {} {} -> {} ({}ms)", method, path, status, duration);
info!("User {} created resource: {}", user_id, resource_id);
DEBUG Level: Request details
debug!("Request headers: {:?}", headers);
debug!("Request body: {:?}", body);
debug!("Response payload: {} bytes", size);
TRACE Level: Request processing
trace!("Parsing JSON body");
trace!("Validating request parameters");
trace!("Serializing response");
5. Database Operations
Examples: core/shared/models/, Diesel queries
INFO Level: Database lifecycle
info!("Database connection pool initialized ({} connections)", pool_size);
info!("Migration completed - {} tables updated", count);
info!("Database backup created: {}", backup_path);
DEBUG Level: Query information
debug!("Executing query: {}", query);
debug!("Query returned {} rows in {}ms", count, duration);
debug!("Cache miss for key: {}", key);
TRACE Level: Query details
trace!("Preparing statement: {}", sql);
trace!("Binding parameter {}: {:?}", index, value);
trace!("Fetching next row");
6. LLM & AI Operations
Examples: llm/, core/kb/
INFO Level: LLM operations
info!("LLM request started - model: {}, tokens: {}", model, estimated_tokens);
info!("LLM response received - {} tokens, {}ms", tokens, duration);
info!("Embedding generated - {} dimensions", dimensions);
info!("Knowledge base indexed - {} documents", doc_count);
DEBUG Level: LLM details
debug!("LLM prompt: {}", prompt_preview);
debug!("Using temperature: {}, max_tokens: {}", temp, max);
debug!("Selected model variant: {}", variant);
TRACE Level: LLM internals
trace!("Sending request to LLM API: {}", url);
trace!("Streaming token: {}", token);
trace!("Parsing LLM response chunk");
7. Startup & Initialization
Examples: main.rs, main_module/bootstrap.rs
INFO Level: Startup milestones
info!("Server starting on port {}", port);
info!("Database initialized - PostgreSQL connected");
info!("Cache initialized - Valkey connected");
info!("Secrets loaded from Vault");
info!("BotServer ready - {} bots loaded", bot_count);
DEBUG Level: Configuration details
debug!("Using config: {:?}", config);
debug!("Environment: {}", env);
debug!("Feature flags: {:?}", features);
TRACE Level: Initialization steps
trace!("Loading .env file");
trace!("Setting up signal handlers");
trace!("Initializing thread registry");
🎯 The Cinema Viewer Experience
Level 1: Watching the Movie (INFO)
RUST_LOG=botserver=info
What you see:
INFO botserver: Server starting on port 8080
INFO botserver: Database initialized - PostgreSQL connected
INFO botserver: User alice@example.com logged in from 192.168.1.100
INFO botserver::auto_task::orchestrator: Pipeline starting - task: abc123, intent: Create CRM
INFO botserver::auto_task::orchestrator: Stage 1 PLAN complete - 5 nodes planned
INFO botserver::auto_task::orchestrator: Stage 2 BUILD complete - 12 resources, url: /apps/crm
INFO botserver::auto_task::orchestrator: Pipeline complete - task: abc123, nodes: 5, resources: 12
INFO botserver: User alice@example.com logged out
Perfect for: Production monitoring, understanding system flow
Level 2: Director's Commentary (DEBUG)
RUST_LOG=botserver=debug
What you see: Everything from INFO plus:
DEBUG botserver::auto_task::orchestrator: Classified intent as: AppGeneration
DEBUG botserver::auto_task::orchestrator: Selected app template: crm
DEBUG botserver::security::jwt: Validating JWT for user alice@example.com
DEBUG botserver::drive::local_file_monitor: Detected change in: /opt/gbo/data/crm.gbai
DEBUG botserver::llm: Using temperature: 0.7, max_tokens: 2000
Perfect for: Troubleshooting issues, understanding decisions
Level 3: Raw Footage (TRACE)
RUST_LOG=botserver=trace
What you see: Everything from DEBUG plus:
TRACE botserver::drive::local_file_monitor: Scanning directory: /opt/gbo/data
TRACE botserver::auto_task::orchestrator: Broadcasting thought to UI: Analyzing...
TRACE botserver::llm: Streaming token: Create
TRACE botserver::llm: Streaming token: a
TRACE botserver::llm: Streaming token: CRM
TRACE botserver::core::db: Preparing statement: SELECT * FROM bots
Perfect for: Deep debugging, performance analysis, finding bugs
✨ Best Practices
1. Tell a Story
// ✅ GOOD - Shows the narrative
info!("Pipeline starting - task: {}", task_id);
info!("Stage 1 PLAN complete - {} nodes planned", nodes);
info!("Stage 2 BUILD complete - {} resources", resources);
info!("Pipeline complete - app deployed at {}", url);
// ❌ BAD - Just data points
info!("Task started");
info!("Nodes: {}", nodes);
info!("Resources: {}", resources);
info!("Done");
2. Use Structured Data
// ✅ GOOD - Easy to parse and filter
info!("User {} logged in from {}", user_id, ip);
info!("Request completed: {} {} -> {} ({}ms)", method, path, status, duration);
// ❌ BAD - Hard to parse
info!("User login happened");
info!("Request finished successfully");
3. Include Context
// ✅ GOOD - Provides context
error!("Database connection failed for bot {}: {}", bot_id, e);
warn!("Rate limit approaching for user {}: {}/{} requests", user_id, count, limit);
// ❌ BAD - No context
error!("Connection failed: {}", e);
warn!("Rate limit warning");
4. Use Appropriate Levels
// ✅ GOOD - Right level for right information
info!("Server started on port {}", port); // Key event
debug!("Using config: {:?}", config); // Troubleshooting
trace!("Listening on socket {:?}", socket); // Deep detail
// ❌ BAD - Wrong levels
trace!("Server started"); // Too important for trace
info!("Loop iteration {}", i); // Too verbose for info
error!("Variable is null"); // Not an error
5. Avoid Noise
// ✅ GOOD - Meaningful information
debug!("Retry attempt {} of {} for API call", attempt, max);
// ❌ BAD - Just noise
debug!("Entering function");
debug!("Exiting function");
debug!("Variable assigned");
6. Log State Changes
// ✅ GOOD - Shows what changed
info!("User {} role changed: {} -> {}", user_id, old_role, new_role);
info!("Bot {} status: {} -> {}", bot_id, old_status, new_status);
// ❌ BAD - No before/after
info!("User role updated");
info!("Bot status changed");
7. Include Timings for Operations
// ✅ GOOD - Performance visibility
info!("Database migration completed in {}ms", duration);
info!("LLM response received - {} tokens, {}ms", tokens, duration);
debug!("Query executed in {}ms", duration);
// ❌ BAD - No performance data
info!("Migration completed");
info!("LLM response received");
🔧 Implementation Guide
Step 1: Audit Current Logging
# Find all logging statements
find botserver/src -name "*.rs" -exec grep -n "info!\|debug!\|trace!\|warn!\|error!" {} +
# Count by level
grep -r "info!" botserver/src | wc -l
grep -r "debug!" botserver/src | wc -l
grep -r "trace!" botserver/src | wc -l
Step 2: Categorize by Module
Create a spreadsheet or document listing:
- Module name
- Current log levels used
- Purpose of the module
- What story should it tell
Step 3: Refactor Module by Module
Start with critical path modules:
- auto_task/orchestrator.rs - Already done! ✅
- drive/local_file_monitor.rs - File operations
- security/jwt.rs - Authentication events
- main.rs - Startup sequence
- core/bot/ - Bot lifecycle
Step 4: Test Different Verbosity Levels
# Test INFO level (production)
RUST_LOG=botserver=info cargo run
# Test DEBUG level (troubleshooting)
RUST_LOG=botserver=debug cargo run
# Test TRACE level (development)
RUST_LOG=botserver=trace cargo run
Step 5: Document Module-Specific Patterns
For each module, document:
- What story does it tell at INFO level?
- What troubleshooting info at DEBUG level?
- What raw details at TRACE level?
📋 Quick Reference Card
Log Level Decision Tree
Is this a failure that stops execution?
└─ YES → ERROR
└─ NO → Is this unexpected but recoverable?
└─ YES → WARN
└─ NO → Is this a key business event?
└─ YES → INFO
└─ NO → Is this useful for troubleshooting?
└─ YES → DEBUG
└─ NO → Is this step-by-step execution detail?
└─ YES → TRACE
└─ NO → Don't log it!
Module-Specific Cheat Sheet
| Module Type | INFO | DEBUG | TRACE |
|---|---|---|---|
| Orchestration | Stage start/complete, pipeline milestones | Decision points, classifications | UI broadcasts, state changes |
| File Monitoring | Monitor start, bot compiled | Changes detected, recompiles | File scans, timestamps |
| Security | Logins, key events, failures | Validations, permission checks | Token details, hash operations |
| API Handlers | Request start/end, resource changes | Headers, payloads | JSON parsing, serialization |
| Database | Connections, migrations | Queries, row counts | Statement prep, row fetching |
| LLM | Requests, responses, indexing | Prompts, parameters | Token streaming, chunking |
| Startup | Service ready, milestones | Config, environment | Init steps, signal handlers |
🎬 Example: Complete Pipeline Logging
Here's how a complete auto-task pipeline looks at different levels:
INFO Level (The Movie)
INFO Pipeline starting - task: task-123, intent: Create a CRM system
INFO Stage 1 PLAN starting - Agent #1 analyzing request
INFO Stage 1 PLAN complete - 5 nodes planned
INFO Stage 2 BUILD starting - Agent #2 generating code
INFO Stage 2 BUILD complete - 12 resources, url: /apps/crm-system
INFO Stage 3 REVIEW starting - Agent #3 checking code quality
INFO Stage 3 REVIEW complete - all checks passed
INFO Stage 4 DEPLOY starting - Agent #4 deploying to /apps/crm-system
INFO Stage 4 DEPLOY complete - app live at /apps/crm-system
INFO Stage 5 MONITOR starting - Agent #1 setting up monitoring
INFO Stage 5 MONITOR complete - monitoring active
INFO Pipeline complete - task: task-123, nodes: 5, resources: 12, url: /apps/crm-system
DEBUG Level (Director's Commentary)
INFO Pipeline starting - task: task-123, intent: Create a CRM system
DEBUG Classified intent as: AppGeneration
DEBUG Selected app template: crm_standard
INFO Stage 1 PLAN starting - Agent #1 analyzing request
DEBUG Generated 5 sub-tasks from intent
INFO Stage 1 PLAN complete - 5 nodes planned
INFO Stage 2 BUILD starting - Agent #2 generating code
DEBUG Using database schema: contacts, deals, activities
DEBUG Generated 3 tables, 8 pages, 1 tool
INFO Stage 2 BUILD complete - 12 resources, url: /apps/crm-system
...
TRACE Level (Raw Footage)
INFO Pipeline starting - task: task-123, intent: Create a CRM system
DEBUG Classified intent as: AppGeneration
TRACE Extracting entities from: "Create a CRM system"
TRACE Found entity: CRM
TRACE Found entity: system
DEBUG Selected app template: crm_standard
INFO Stage 1 PLAN starting - Agent #1 analyzing request
TRACE Broadcasting thought to UI: Analyzing request...
TRACE Deriving plan sub-tasks
TRACE Sub-task 1: Create database schema
TRACE Sub-task 2: Generate list page
TRACE Sub-task 3: Generate form pages
TRACE Sub-task 4: Create BASIC tools
TRACE Sub-task 5: Setup navigation
DEBUG Generated 5 sub-tasks from intent
...
🎯 Goals & Metrics
Success Criteria
- INFO logs tell a complete story - Can understand system flow without DEBUG/TRACE
- DEBUG logs enable troubleshooting - Can diagnose issues with context
- TRACE logs show execution details - Can see step-by-step for deep debugging
- No log spam - Production logs are concise and meaningful
- Consistent patterns - Similar modules log similarly
Metrics to Track
- Lines of logs per request at INFO level: < 20
- Lines of logs per request at DEBUG level: < 100
- Lines of logs per request at TRACE level: unlimited
- Error logs include context: 100%
- WARN logs explain impact: 100%
🚀 Next Steps
- Audit current logging in all 341 files
- Prioritize modules by criticality
- Refactor module by module following this plan
- Test at each log level
- Document module-specific patterns
- Train team on logging standards
- Monitor log volume and usefulness
- Iterate based on feedback
📚 References
- Rust log crate documentation
- env_logger documentation
- Structured logging best practices
- The Log: What every software engineer should know
Remember: Good logging is like good cinematography - it should be invisible when done right, but tell a compelling story when you pay attention to it. 🎬