botserver/LOGGING_PLAN.md
Rodrigo Rodriguez (Pragmatismo) 1e71c9be09
Some checks failed
BotServer CI / build (push) Failing after 2m39s
feat: Add comprehensive stage progress logging
- Add detailed logging for all 5 pipeline stages (PLAN, BUILD, REVIEW, DEPLOY, MONITOR)
- Log stage start/complete events with agent IDs and progress details
- Add resource creation/deletion logging in drive_handlers
- Improve pipeline summary logging with task ID, nodes, resources, and URL

This addresses the requirement for textual progress in console logs.
2026-03-01 22:36:14 -03:00

18 KiB

BotServer Cinema Viewer Logging Plan

🎬 The Cinema Viewer Philosophy

Think of your logs as a movie with different viewing modes:

  • INFO = The Movie - Watch the main story unfold (production-ready)
  • DEBUG = Director's Commentary - Behind-the-scenes details (troubleshooting)
  • TRACE = Raw Footage - Every single take and detail (deep debugging)
  • WARN = Plot Holes - Things that shouldn't happen but are recoverable
  • ERROR = Scene Failures - Critical issues that break the narrative

📊 Log Level Definitions

ERROR - Critical Failures

When to use: System cannot proceed, requires immediate attention

// ✅ GOOD - Actionable, clear context
error!("Database connection failed - retrying in 5s: {}", e);
error!("Authentication failed for user {}: invalid credentials", user_id);
error!("Stage 2 BUILD failed: {}", e);

// ❌ BAD - Vague, no context
error!("Error!");
error!("Failed");
error!("Something went wrong: {}", e);

WARN - Recoverable Issues

When to use: Unexpected state but system can continue

// ✅ GOOD - Explains the issue and impact
warn!("Failed to create data directory: {}. Using fallback.", e);
warn!("LLM server not ready - deferring embedding generation");
warn!("Rate limit approaching for API key {}", key_id);

// ❌ BAD - Not actionable
warn!("Warning");
warn!("Something happened");

INFO - The Main Story

When to use: Key events, state changes, business milestones

// ✅ GOOD - Tells a story, shows progress
info!("Pipeline starting - task: {}, intent: {}", task_id, intent);
info!("Stage 1 PLAN complete - {} nodes planned", node_count);
info!("User {} logged in from {}", user_id, ip_address);
info!("Server started on port {}", port);

// ❌ BAD - Too verbose, implementation details
info!("Entering function process_request");
info!("Variable x = {}", x);
info!("Loop iteration {}", i);

DEBUG - Behind the Scenes

When to use: Troubleshooting information, decision points, state inspections

// ✅ GOOD - Helps diagnose issues
debug!("Request payload: {:?}", payload);
debug!("Using cache key: {}", cache_key);
debug!("Retry attempt {} of {}", attempt, max_retries);
debug!("Selected LLM model: {} for task type: {}", model, task_type);

// ❌ BAD - Too trivial
debug!("Variable assigned");
debug!("Function called");

TRACE - Raw Footage

When to use: Step-by-step execution, loop iterations, detailed flow

// ✅ GOOD - Detailed execution path
trace!("Starting monitoring loop");
trace!("Processing file: {:?}", path);
trace!("Checking bot directory: {}", dir);
trace!("WebSocket message received: {} bytes", len);

// ❌ BAD - Noise without value
trace!("Line 100");
trace!("Got here");
trace!("...");

🎭 Logging Patterns by Module Type

1. Orchestration & Pipeline Modules

Examples: auto_task/orchestrator.rs, auto_task/agent_executor.rs

INFO Level: Show the story arc

info!("Pipeline starting - task: {}, intent: {}", task_id, intent);
info!("Stage 1 PLAN starting - Agent #1 analyzing request");
info!("Stage 1 PLAN complete - {} nodes planned", node_count);
info!("Stage 2 BUILD complete - {} resources, url: {}", count, url);
info!("Pipeline complete - task: {}, nodes: {}, resources: {}", task_id, nodes, resources);

DEBUG Level: Show decision points

debug!("Classified intent as: {:?}", classification);
debug!("Selected app template: {}", template_name);
debug!("Skipping stage 3 - no resources to review");

TRACE Level: Show execution flow

trace!("Broadcasting thought to UI: {}", thought);
trace!("Updating agent {} status: {} -> {}", id, old, new);
trace!("Sub-task generated: {}", task_name);

2. File Monitoring & Compilation

Examples: drive/local_file_monitor.rs, drive/drive_monitor/mod.rs

INFO Level: Key file operations

info!("Local file monitor started - watching /opt/gbo/data/*.gbai");
info!("Compiling bot: {} ({} files)", bot_name, file_count);
info!("Bot {} compiled successfully - {} tools generated", bot_name, tool_count);

DEBUG Level: File processing details

debug!("Detected change in: {:?}", path);
debug!("Recompiling {} - modification detected", bot_name);
debug!("Skipping {} - no changes detected", bot_name);

TRACE Level: File system operations

trace!("Scanning directory: {:?}", dir);
trace!("File modified: {:?} at {:?}", path, time);
trace!("Watching directory: {:?}", path);

3. Security & Authentication

Examples: security/jwt.rs, security/api_keys.rs, security/sql_guard.rs

INFO Level: Security events (always log these!)

info!("User {} logged in from {}", user_id, ip);
info!("API key {} created for user {}", key_id, user_id);
info!("Failed login attempt for user {} from {}", username, ip);
info!("Rate limit exceeded for IP: {}", ip);

DEBUG Level: Security checks

debug!("Validating JWT for user {}", user_id);
debug!("Checking API key permissions: {:?}", permissions);
debug!("SQL query sanitized: {}", safe_query);

TRACE Level: Security internals

trace!("Token expiry check: {} seconds remaining", remaining);
trace!("Permission check: {} -> {}", resource, allowed);
trace!("Hashing password with cost factor: {}", cost);

4. API Handlers

Examples: HTTP endpoint handlers in core/, drive/, etc.

INFO Level: Request lifecycle

info!("Request started: {} {} from {}", method, path, ip);
info!("Request completed: {} {} -> {} ({}ms)", method, path, status, duration);
info!("User {} created resource: {}", user_id, resource_id);

DEBUG Level: Request details

debug!("Request headers: {:?}", headers);
debug!("Request body: {:?}", body);
debug!("Response payload: {} bytes", size);

TRACE Level: Request processing

trace!("Parsing JSON body");
trace!("Validating request parameters");
trace!("Serializing response");

5. Database Operations

Examples: core/shared/models/, Diesel queries

INFO Level: Database lifecycle

info!("Database connection pool initialized ({} connections)", pool_size);
info!("Migration completed - {} tables updated", count);
info!("Database backup created: {}", backup_path);

DEBUG Level: Query information

debug!("Executing query: {}", query);
debug!("Query returned {} rows in {}ms", count, duration);
debug!("Cache miss for key: {}", key);

TRACE Level: Query details

trace!("Preparing statement: {}", sql);
trace!("Binding parameter {}: {:?}", index, value);
trace!("Fetching next row");

6. LLM & AI Operations

Examples: llm/, core/kb/

INFO Level: LLM operations

info!("LLM request started - model: {}, tokens: {}", model, estimated_tokens);
info!("LLM response received - {} tokens, {}ms", tokens, duration);
info!("Embedding generated - {} dimensions", dimensions);
info!("Knowledge base indexed - {} documents", doc_count);

DEBUG Level: LLM details

debug!("LLM prompt: {}", prompt_preview);
debug!("Using temperature: {}, max_tokens: {}", temp, max);
debug!("Selected model variant: {}", variant);

TRACE Level: LLM internals

trace!("Sending request to LLM API: {}", url);
trace!("Streaming token: {}", token);
trace!("Parsing LLM response chunk");

7. Startup & Initialization

Examples: main.rs, main_module/bootstrap.rs

INFO Level: Startup milestones

info!("Server starting on port {}", port);
info!("Database initialized - PostgreSQL connected");
info!("Cache initialized - Valkey connected");
info!("Secrets loaded from Vault");
info!("BotServer ready - {} bots loaded", bot_count);

DEBUG Level: Configuration details

debug!("Using config: {:?}", config);
debug!("Environment: {}", env);
debug!("Feature flags: {:?}", features);

TRACE Level: Initialization steps

trace!("Loading .env file");
trace!("Setting up signal handlers");
trace!("Initializing thread registry");

🎯 The Cinema Viewer Experience

Level 1: Watching the Movie (INFO)

RUST_LOG=botserver=info

What you see:

INFO botserver: Server starting on port 8080
INFO botserver: Database initialized - PostgreSQL connected
INFO botserver: User alice@example.com logged in from 192.168.1.100
INFO botserver::auto_task::orchestrator: Pipeline starting - task: abc123, intent: Create CRM
INFO botserver::auto_task::orchestrator: Stage 1 PLAN complete - 5 nodes planned
INFO botserver::auto_task::orchestrator: Stage 2 BUILD complete - 12 resources, url: /apps/crm
INFO botserver::auto_task::orchestrator: Pipeline complete - task: abc123, nodes: 5, resources: 12
INFO botserver: User alice@example.com logged out

Perfect for: Production monitoring, understanding system flow

Level 2: Director's Commentary (DEBUG)

RUST_LOG=botserver=debug

What you see: Everything from INFO plus:

DEBUG botserver::auto_task::orchestrator: Classified intent as: AppGeneration
DEBUG botserver::auto_task::orchestrator: Selected app template: crm
DEBUG botserver::security::jwt: Validating JWT for user alice@example.com
DEBUG botserver::drive::local_file_monitor: Detected change in: /opt/gbo/data/crm.gbai
DEBUG botserver::llm: Using temperature: 0.7, max_tokens: 2000

Perfect for: Troubleshooting issues, understanding decisions

Level 3: Raw Footage (TRACE)

RUST_LOG=botserver=trace

What you see: Everything from DEBUG plus:

TRACE botserver::drive::local_file_monitor: Scanning directory: /opt/gbo/data
TRACE botserver::auto_task::orchestrator: Broadcasting thought to UI: Analyzing...
TRACE botserver::llm: Streaming token: Create
TRACE botserver::llm: Streaming token: a
TRACE botserver::llm: Streaming token: CRM
TRACE botserver::core::db: Preparing statement: SELECT * FROM bots

Perfect for: Deep debugging, performance analysis, finding bugs


Best Practices

1. Tell a Story

// ✅ GOOD - Shows the narrative
info!("Pipeline starting - task: {}", task_id);
info!("Stage 1 PLAN complete - {} nodes planned", nodes);
info!("Stage 2 BUILD complete - {} resources", resources);
info!("Pipeline complete - app deployed at {}", url);

// ❌ BAD - Just data points
info!("Task started");
info!("Nodes: {}", nodes);
info!("Resources: {}", resources);
info!("Done");

2. Use Structured Data

// ✅ GOOD - Easy to parse and filter
info!("User {} logged in from {}", user_id, ip);
info!("Request completed: {} {} -> {} ({}ms)", method, path, status, duration);

// ❌ BAD - Hard to parse
info!("User login happened");
info!("Request finished successfully");

3. Include Context

// ✅ GOOD - Provides context
error!("Database connection failed for bot {}: {}", bot_id, e);
warn!("Rate limit approaching for user {}: {}/{} requests", user_id, count, limit);

// ❌ BAD - No context
error!("Connection failed: {}", e);
warn!("Rate limit warning");

4. Use Appropriate Levels

// ✅ GOOD - Right level for right information
info!("Server started on port {}", port);          // Key event
debug!("Using config: {:?}", config);              // Troubleshooting
trace!("Listening on socket {:?}", socket);        // Deep detail

// ❌ BAD - Wrong levels
trace!("Server started");                           // Too important for trace
info!("Loop iteration {}", i);                      // Too verbose for info
error!("Variable is null");                         // Not an error

5. Avoid Noise

// ✅ GOOD - Meaningful information
debug!("Retry attempt {} of {} for API call", attempt, max);

// ❌ BAD - Just noise
debug!("Entering function");
debug!("Exiting function");
debug!("Variable assigned");

6. Log State Changes

// ✅ GOOD - Shows what changed
info!("User {} role changed: {} -> {}", user_id, old_role, new_role);
info!("Bot {} status: {} -> {}", bot_id, old_status, new_status);

// ❌ BAD - No before/after
info!("User role updated");
info!("Bot status changed");

7. Include Timings for Operations

// ✅ GOOD - Performance visibility
info!("Database migration completed in {}ms", duration);
info!("LLM response received - {} tokens, {}ms", tokens, duration);
debug!("Query executed in {}ms", duration);

// ❌ BAD - No performance data
info!("Migration completed");
info!("LLM response received");

🔧 Implementation Guide

Step 1: Audit Current Logging

# Find all logging statements
find botserver/src -name "*.rs" -exec grep -n "info!\|debug!\|trace!\|warn!\|error!" {} +

# Count by level
grep -r "info!" botserver/src | wc -l
grep -r "debug!" botserver/src | wc -l
grep -r "trace!" botserver/src | wc -l

Step 2: Categorize by Module

Create a spreadsheet or document listing:

  • Module name
  • Current log levels used
  • Purpose of the module
  • What story should it tell

Step 3: Refactor Module by Module

Start with critical path modules:

  1. auto_task/orchestrator.rs - Already done!
  2. drive/local_file_monitor.rs - File operations
  3. security/jwt.rs - Authentication events
  4. main.rs - Startup sequence
  5. core/bot/ - Bot lifecycle

Step 4: Test Different Verbosity Levels

# Test INFO level (production)
RUST_LOG=botserver=info cargo run

# Test DEBUG level (troubleshooting)
RUST_LOG=botserver=debug cargo run

# Test TRACE level (development)
RUST_LOG=botserver=trace cargo run

Step 5: Document Module-Specific Patterns

For each module, document:

  • What story does it tell at INFO level?
  • What troubleshooting info at DEBUG level?
  • What raw details at TRACE level?

📋 Quick Reference Card

Log Level Decision Tree

Is this a failure that stops execution?
  └─ YES → ERROR
  └─ NO → Is this unexpected but recoverable?
           └─ YES → WARN
           └─ NO → Is this a key business event?
                    └─ YES → INFO
                    └─ NO → Is this useful for troubleshooting?
                             └─ YES → DEBUG
                             └─ NO → Is this step-by-step execution detail?
                                      └─ YES → TRACE
                                      └─ NO → Don't log it!

Module-Specific Cheat Sheet

Module Type INFO DEBUG TRACE
Orchestration Stage start/complete, pipeline milestones Decision points, classifications UI broadcasts, state changes
File Monitoring Monitor start, bot compiled Changes detected, recompiles File scans, timestamps
Security Logins, key events, failures Validations, permission checks Token details, hash operations
API Handlers Request start/end, resource changes Headers, payloads JSON parsing, serialization
Database Connections, migrations Queries, row counts Statement prep, row fetching
LLM Requests, responses, indexing Prompts, parameters Token streaming, chunking
Startup Service ready, milestones Config, environment Init steps, signal handlers

🎬 Example: Complete Pipeline Logging

Here's how a complete auto-task pipeline looks at different levels:

INFO Level (The Movie)

INFO Pipeline starting - task: task-123, intent: Create a CRM system
INFO Stage 1 PLAN starting - Agent #1 analyzing request
INFO Stage 1 PLAN complete - 5 nodes planned
INFO Stage 2 BUILD starting - Agent #2 generating code
INFO Stage 2 BUILD complete - 12 resources, url: /apps/crm-system
INFO Stage 3 REVIEW starting - Agent #3 checking code quality
INFO Stage 3 REVIEW complete - all checks passed
INFO Stage 4 DEPLOY starting - Agent #4 deploying to /apps/crm-system
INFO Stage 4 DEPLOY complete - app live at /apps/crm-system
INFO Stage 5 MONITOR starting - Agent #1 setting up monitoring
INFO Stage 5 MONITOR complete - monitoring active
INFO Pipeline complete - task: task-123, nodes: 5, resources: 12, url: /apps/crm-system

DEBUG Level (Director's Commentary)

INFO Pipeline starting - task: task-123, intent: Create a CRM system
DEBUG Classified intent as: AppGeneration
DEBUG Selected app template: crm_standard
INFO Stage 1 PLAN starting - Agent #1 analyzing request
DEBUG Generated 5 sub-tasks from intent
INFO Stage 1 PLAN complete - 5 nodes planned
INFO Stage 2 BUILD starting - Agent #2 generating code
DEBUG Using database schema: contacts, deals, activities
DEBUG Generated 3 tables, 8 pages, 1 tool
INFO Stage 2 BUILD complete - 12 resources, url: /apps/crm-system
...

TRACE Level (Raw Footage)

INFO Pipeline starting - task: task-123, intent: Create a CRM system
DEBUG Classified intent as: AppGeneration
TRACE Extracting entities from: "Create a CRM system"
TRACE Found entity: CRM
TRACE Found entity: system
DEBUG Selected app template: crm_standard
INFO Stage 1 PLAN starting - Agent #1 analyzing request
TRACE Broadcasting thought to UI: Analyzing request...
TRACE Deriving plan sub-tasks
TRACE Sub-task 1: Create database schema
TRACE Sub-task 2: Generate list page
TRACE Sub-task 3: Generate form pages
TRACE Sub-task 4: Create BASIC tools
TRACE Sub-task 5: Setup navigation
DEBUG Generated 5 sub-tasks from intent
...

🎯 Goals & Metrics

Success Criteria

  1. INFO logs tell a complete story - Can understand system flow without DEBUG/TRACE
  2. DEBUG logs enable troubleshooting - Can diagnose issues with context
  3. TRACE logs show execution details - Can see step-by-step for deep debugging
  4. No log spam - Production logs are concise and meaningful
  5. Consistent patterns - Similar modules log similarly

Metrics to Track

  • Lines of logs per request at INFO level: < 20
  • Lines of logs per request at DEBUG level: < 100
  • Lines of logs per request at TRACE level: unlimited
  • Error logs include context: 100%
  • WARN logs explain impact: 100%

🚀 Next Steps

  1. Audit current logging in all 341 files
  2. Prioritize modules by criticality
  3. Refactor module by module following this plan
  4. Test at each log level
  5. Document module-specific patterns
  6. Train team on logging standards
  7. Monitor log volume and usefulness
  8. Iterate based on feedback

📚 References


Remember: Good logging is like good cinematography - it should be invisible when done right, but tell a compelling story when you pay attention to it. 🎬