Observability
General Bots uses a comprehensive observability stack for monitoring, logging, and metrics collection. This chapter explains how logging works and how Vector integrates without requiring code changes.
Architecture Overview
Vector Agent collects logs from BotServer without requiring any code changes.
No Code Changes Required
You do not need to replace `log::trace!()`, `log::info!()`, or `log::error!()` calls.
Vector works by:
- Tailing log files - reads from `./botserver-stack/logs/`
- Parsing log lines - extracts level, timestamp, and message
- Routing by level - sends errors to alerts, metrics to InfluxDB
- Enriching data - adds hostname, service name, etc.
Log directory structure:
- `logs/system/` - BotServer application logs
- `logs/drive/` - MinIO logs
- `logs/tables/` - PostgreSQL logs
- `logs/cache/` - Redis logs
- `logs/llm/` - LLM server logs
- `logs/email/` - Stalwart logs
- `logs/directory/` - Zitadel logs
- `logs/vectordb/` - Qdrant logs
- `logs/meet/` - LiveKit logs
- `logs/alm/` - Forgejo logs
This approach:
- Requires zero code changes
- Works with existing logging
- Can be added/removed without recompilation
- Scales independently from the application
Vector Configuration
Installation
Vector is installed as the observability component:
```bash
./botserver install observability
```
Configuration File
Configuration is at `./botserver-stack/conf/monitoring/vector.toml`:

```toml
# Vector Configuration for General Bots
# Collects logs without requiring code changes
# Component: observability (Vector)
# Config: ./botserver-stack/conf/monitoring/vector.toml

#
# SOURCES - Where logs come from
#

[sources.botserver_logs]
type = "file"
include = ["./botserver-stack/logs/system/*.log"]
read_from = "beginning"

[sources.drive_logs]
type = "file"
include = ["./botserver-stack/logs/drive/*.log"]
read_from = "beginning"

[sources.tables_logs]
type = "file"
include = ["./botserver-stack/logs/tables/*.log"]
read_from = "beginning"

[sources.cache_logs]
type = "file"
include = ["./botserver-stack/logs/cache/*.log"]
read_from = "beginning"

[sources.llm_logs]
type = "file"
include = ["./botserver-stack/logs/llm/*.log"]
read_from = "beginning"

[sources.service_logs]
type = "file"
include = [
  "./botserver-stack/logs/email/*.log",
  "./botserver-stack/logs/directory/*.log",
  "./botserver-stack/logs/vectordb/*.log",
  "./botserver-stack/logs/meet/*.log",
  "./botserver-stack/logs/alm/*.log"
]
read_from = "beginning"

#
# TRANSFORMS - Parse and enrich logs
#

[transforms.parse_botserver]
type = "remap"
inputs = ["botserver_logs"]
source = '''
# Parse the standard log format: TIMESTAMP LEVEL target message
. = parse_regex!(.message, r'^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z?)\s+(?P<level>\w+)\s+(?P<target>\S+)\s+(?P<message>.*)$')

# Convert timestamp
.timestamp = parse_timestamp!(.timestamp, "%Y-%m-%dT%H:%M:%S%.fZ")

# Normalize level
.level = downcase!(.level)

# Add service name
.service = "botserver"

# Extract session_id if present
if contains(string!(.message), "session") {
  session_match = parse_regex(.message, r'session[:\s]+(?P<session_id>[a-f0-9-]+)') ?? {}
  if exists(session_match.session_id) {
    .session_id = session_match.session_id
  }
}

# Extract user_id if present
if contains(string!(.message), "user") {
  user_match = parse_regex(.message, r'user[:\s]+(?P<user_id>[a-f0-9-]+)') ?? {}
  if exists(user_match.user_id) {
    .user_id = user_match.user_id
  }
}
'''

[transforms.parse_service_logs]
type = "remap"
inputs = ["service_logs"]
source = '''
# Basic parsing for service logs
.timestamp = now()
.level = "info"

# Detect errors and warnings
if contains(string!(.message), "ERROR") || contains(string!(.message), "error") {
  .level = "error"
}
if contains(string!(.message), "WARN") || contains(string!(.message), "warn") {
  .level = "warn"
}

# Extract the service name from the file path
.service = replace(string!(.file), r'.*/(\w+)\.log$', "$1")
'''

#
# FILTERS - Route by log level
#

[transforms.filter_errors]
type = "filter"
inputs = ["parse_botserver", "parse_service_logs"]
condition = '.level == "error"'

[transforms.filter_warnings]
type = "filter"
inputs = ["parse_botserver", "parse_service_logs"]
condition = '.level == "warn"'

[transforms.filter_info]
type = "filter"
inputs = ["parse_botserver"]
condition = '.level == "info" || .level == "debug"'

#
# METRICS - Convert logs to metrics
#

[transforms.log_to_metrics]
type = "log_to_metric"
inputs = ["parse_botserver"]

[[transforms.log_to_metrics.metrics]]
type = "counter"
field = "level"
name = "log_events_total"
tags.level = "{{level}}"
tags.service = "{{service}}"

[[transforms.log_to_metrics.metrics]]
type = "counter"
field = "message"
name = "errors_total"
tags.service = "{{service}}"
increment_by_value = false

#
# SINKS - Where logs go
#

# All logs to file (backup)
[sinks.file_backup]
type = "file"
inputs = ["parse_botserver", "parse_service_logs"]
path = "./botserver-stack/logs/vector/all-%Y-%m-%d.log"
encoding.codec = "json"

# Metrics to InfluxDB
[sinks.influxdb]
type = "influxdb_metrics"
inputs = ["log_to_metrics"]
endpoint = "http://localhost:8086"
org = "pragmatismo"
bucket = "metrics"
token = "${INFLUXDB_TOKEN}"

# Errors to alerting (webhook)
[sinks.alert_webhook]
type = "http"
inputs = ["filter_errors"]
uri = "http://localhost:8080/api/admin/alerts"
method = "post"
encoding.codec = "json"

# Console output (for debugging)
[sinks.console]
type = "console"
inputs = ["filter_errors"]
encoding.codec = "text"
```
Log Format
BotServer uses the standard Rust `log` crate output format:

```text
2024-01-15T10:30:45.123Z INFO botserver::core::bot Processing message for session: abc-123
2024-01-15T10:30:45.456Z DEBUG botserver::llm::cache Cache hit for prompt hash: xyz789
2024-01-15T10:30:45.789Z ERROR botserver::drive::upload Failed to upload file: permission denied
```
Vector parses this automatically without code changes.
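For reference, the field extraction Vector performs on these lines can be sketched in plain Rust. This is an illustrative helper, not BotServer code, and it assumes single spaces between the fixed fields:

```rust
// Split a standard Rust log line into its four fields.
// Format: "TIMESTAMP LEVEL target message..."
#[derive(Debug)]
struct LogLine {
    timestamp: String,
    level: String,
    target: String,
    message: String,
}

fn parse_log_line(line: &str) -> Option<LogLine> {
    // splitn(4, ...) keeps the entire remainder as the message,
    // so spaces inside the message are preserved.
    let mut parts = line.splitn(4, ' ');
    Some(LogLine {
        timestamp: parts.next()?.to_string(),
        level: parts.next()?.to_lowercase(), // normalized, like the VRL transform
        target: parts.next()?.to_string(),
        message: parts.next()?.to_string(),
    })
}

fn main() {
    let line = "2024-01-15T10:30:45.123Z INFO botserver::core::bot Processing message for session: abc-123";
    let parsed = parse_log_line(line).expect("well-formed log line");
    println!("{} [{}] {}", parsed.level, parsed.target, parsed.message);
}
```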
Metrics Collection
Automatic Metrics
Vector converts log events to metrics:
| Metric | Description |
|---|---|
| `log_events_total` | Total log events by level |
| `errors_total` | Error count by service |
| `warnings_total` | Warning count by service |
Application Metrics
BotServer also exposes metrics at `/api/metrics` in Prometheus format:

```text
# HELP botserver_sessions_active Current active sessions
# TYPE botserver_sessions_active gauge
botserver_sessions_active 42

# HELP botserver_messages_total Total messages processed
# TYPE botserver_messages_total counter
botserver_messages_total{channel="web"} 1234
botserver_messages_total{channel="whatsapp"} 567

# HELP botserver_llm_latency_seconds LLM response latency
# TYPE botserver_llm_latency_seconds histogram
botserver_llm_latency_seconds_bucket{le="0.5"} 100
botserver_llm_latency_seconds_bucket{le="1.0"} 150
botserver_llm_latency_seconds_bucket{le="2.0"} 180
```
Vector can scrape these directly:

```toml
[sources.prometheus_metrics]
type = "prometheus_scrape"
endpoints = ["http://localhost:8080/api/metrics"]
scrape_interval_secs = 15
```
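The exposition format above is plain text, so an endpoint can produce it with simple string formatting. A minimal sketch in Rust; the function and its hard-coded values are illustrative, not BotServer's actual handler:

```rust
// Render a gauge and a labelled counter in Prometheus exposition format.
fn render_metrics(active_sessions: u64, web_msgs: u64, wa_msgs: u64) -> String {
    let mut out = String::new();
    out.push_str("# HELP botserver_sessions_active Current active sessions\n");
    out.push_str("# TYPE botserver_sessions_active gauge\n");
    out.push_str(&format!("botserver_sessions_active {}\n", active_sessions));
    out.push_str("# HELP botserver_messages_total Total messages processed\n");
    out.push_str("# TYPE botserver_messages_total counter\n");
    // {{ and }} escape literal braces in format strings.
    out.push_str(&format!("botserver_messages_total{{channel=\"web\"}} {}\n", web_msgs));
    out.push_str(&format!("botserver_messages_total{{channel=\"whatsapp\"}} {}\n", wa_msgs));
    out
}

fn main() {
    print!("{}", render_metrics(42, 1234, 567));
}
```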
Alerting
Error Alerts
Vector sends errors to a webhook for alerting:
```toml
[sinks.alert_webhook]
type = "http"
inputs = ["filter_errors"]
uri = "http://localhost:8080/api/admin/alerts"
method = "post"
encoding.codec = "json"
```
Slack Integration
```toml
[sinks.slack_alerts]
type = "http"
inputs = ["filter_errors"]
uri = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
method = "post"
encoding.codec = "json"

[sinks.slack_alerts.request]
headers.content-type = "application/json"
```
Email Alerts
Use with an SMTP relay or webhook-to-email service:
```toml
[sinks.email_alerts]
type = "http"
inputs = ["filter_errors"]
uri = "http://localhost:8025/api/send"
method = "post"
encoding.codec = "json"
```
Grafana Dashboards
Pre-built Dashboard
Import the General Bots dashboard from `templates/grafana-dashboard.json`:
- Open Grafana at `http://localhost:3000`
- Go to Dashboards → Import
- Upload `grafana-dashboard.json`
- Select the InfluxDB data source
Key Panels
| Panel | Query |
|---|---|
| Active Sessions | `from(bucket:"metrics") \|> filter(fn: (r) => r._measurement == "sessions_active")` |
| Messages/Minute | `from(bucket:"metrics") \|> filter(fn: (r) => r._measurement == "messages_total") \|> derivative()` |
| Error Rate | `from(bucket:"metrics") \|> filter(fn: (r) => r.level == "error") \|> count()` |
| LLM Latency P95 | `from(bucket:"metrics") \|> filter(fn: (r) => r._measurement == "llm_latency") \|> quantile(q: 0.95)` |
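The P95 panel reads from cumulative histogram buckets like the ones exposed at `/api/metrics`. The interpolation such quantile queries perform can be sketched as follows; this is a simplified illustration, assuming cumulative `(le, count)` buckets, not the exact Flux implementation:

```rust
// Estimate a quantile from cumulative histogram buckets (le, count),
// interpolating linearly inside the bucket that crosses the target rank.
fn quantile_from_buckets(buckets: &[(f64, u64)], q: f64) -> f64 {
    let total = buckets.last().map(|b| b.1).unwrap_or(0) as f64;
    let rank = q * total;
    let mut prev_le = 0.0;
    let mut prev_count = 0u64;
    for &(le, count) in buckets {
        if count as f64 >= rank {
            let in_bucket = (count - prev_count) as f64;
            if in_bucket == 0.0 {
                return le;
            }
            // Linear interpolation between the bucket's bounds.
            return prev_le + (le - prev_le) * (rank - prev_count as f64) / in_bucket;
        }
        prev_le = le;
        prev_count = count;
    }
    prev_le
}

fn main() {
    // Buckets from the /api/metrics example: le=0.5 → 100, le=1.0 → 150, le=2.0 → 180.
    let buckets = [(0.5, 100), (1.0, 150), (2.0, 180)];
    println!("p95 ≈ {:.2}s", quantile_from_buckets(&buckets, 0.95));
}
```

With the example buckets, the 95th-percentile rank (171 of 180 observations) falls in the 1.0–2.0s bucket, giving roughly 1.70s.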
Configuration Options
config.csv Settings
```csv
# Observability settings
observability-enabled,true
observability-log-level,info
observability-metrics-endpoint,/api/metrics
observability-vector-enabled,true
```
Log Levels
| Level | When to Use |
|---|---|
| `error` | Something failed, requires attention |
| `warn` | Unexpected but handled, worth noting |
| `info` | Normal operations, key events |
| `debug` | Detailed flow, development |
| `trace` | Very detailed, performance impact |
Set in `config.csv`:

```csv
log-level,info
```

Or via the `RUST_LOG` environment variable:

```bash
RUST_LOG=info ./botserver
```
Troubleshooting
Vector Not Collecting Logs
```bash
# Check Vector status
systemctl status gbo-observability

# View Vector logs
journalctl -u gbo-observability -f

# Test configuration
vector validate ./botserver-stack/conf/monitoring/vector.toml
```
Missing Metrics in InfluxDB
```bash
# Check InfluxDB connection
curl http://localhost:8086/health

# Verify bucket exists
influx bucket list

# Check Vector sink status
vector top
```
High Log Volume
If logs are too verbose:
- Raise the log level in `config.csv` (for example, from `debug` to `info`)
- Add filters in Vector to drop debug logs
- Set retention policies in InfluxDB
```toml
# Drop debug logs before sending to InfluxDB
[transforms.drop_debug]
type = "filter"
inputs = ["parse_botserver"]
condition = '.level != "debug" && .level != "trace"'
```
Best Practices
1. Don't Log Sensitive Data
```rust
// Bad
log::info!("User password: {}", password);

// Good
log::info!("User {} authenticated successfully", user_id);
```
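When a message might carry secrets despite this rule, a defensive redaction helper can be applied before logging. A minimal sketch; the helper and its list of sensitive keys are illustrative, not part of BotServer:

```rust
// Mask the value of any key=value pair whose key looks sensitive.
fn redact(msg: &str) -> String {
    const SENSITIVE: [&str; 3] = ["password", "token", "secret"];
    msg.split_whitespace()
        .map(|part| match part.split_once('=') {
            Some((key, _)) if SENSITIVE.iter().any(|s| key.to_lowercase().contains(*s)) => {
                format!("{}=***", key)
            }
            _ => part.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    println!("{}", redact("user=42 password=hunter2 action=login"));
    // → user=42 password=*** action=login
}
```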
2. Use Structured Context
```rust
// Better for parsing
log::info!("session={} user={} action=message_sent", session_id, user_id);
```
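This key=value convention is what makes extraction in Vector's transforms reliable; recovering the fields downstream can be sketched with an illustrative helper (not BotServer code):

```rust
use std::collections::HashMap;

// Extract key=value pairs from a structured log message.
fn parse_kv(msg: &str) -> HashMap<String, String> {
    msg.split_whitespace()
        .filter_map(|part| part.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let fields = parse_kv("session=abc-123 user=u-42 action=message_sent");
    println!("session = {:?}", fields.get("session"));
}
```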
3. Set Appropriate Levels
```rust
// Errors: things that failed
log::error!("Database connection failed: {}", err);

// Warnings: unusual but handled
log::warn!("Retrying LLM request after timeout");

// Info: normal operations
log::info!("Session {} started", session_id);

// Debug: development details
log::debug!("Cache lookup for key: {}", key);

// Trace: very detailed
log::trace!("Entering function process_message");
```
4. Keep Vector Config Simple
Start with basic collection, add transforms as needed.
Summary
- No code changes needed - Vector collects from log files
- Keep using log macros - `log::info!()`, `log::error!()`, etc.
- Vector handles routing - errors go to alerts, everything to storage
- InfluxDB for metrics - time-series storage and queries
- Grafana for dashboards - visualize everything
Next Steps
- Scaling and Load Balancing - Scale observability with your cluster
- Infrastructure Design - Full architecture overview
- Monitoring Dashboard - Built-in monitoring UI