AI and LLM
BotServer integrates with Large Language Models (LLMs) to provide intelligent conversational capabilities and natural language understanding.
Overview
The LLM integration in BotServer enables:
- Natural language conversations
- Context-aware responses
- Tool discovery and invocation
- Document understanding
- Text generation and summarization
LLM Providers
OpenAI
Primary LLM provider with support for:
- GPT-3.5 Turbo
- GPT-4
- GPT-4 Turbo
- Custom fine-tuned models
Configuration:
OPENAI_API_KEY=your-api-key
LLM_MODEL=gpt-4
Local Models
Support for self-hosted models:
- Llama.cpp compatible servers
- Custom inference endpoints
- Privacy-preserving deployments
Configuration:
LLM_PROVIDER=local
LLM_ENDPOINT=http://localhost:8081
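The settings above could be resolved into a provider choice along these lines. This is a minimal sketch, not BotServer's actual loader: `ProviderConfig` and `resolve_provider` are hypothetical names, and treating an unset `LLM_PROVIDER` as OpenAI is an assumption based on OpenAI being described as the primary provider.

```rust
use std::collections::HashMap;

/// Hypothetical representation of the configured provider.
#[derive(Debug, PartialEq)]
enum ProviderConfig {
    OpenAi { api_key: String, model: String },
    Local { endpoint: String },
}

/// Resolve provider settings from a key/value lookup such as the process environment.
fn resolve_provider(vars: &HashMap<String, String>) -> Result<ProviderConfig, String> {
    // Fetch a required setting or report which one is missing.
    let require = |key: &str| {
        vars.get(key)
            .cloned()
            .ok_or_else(|| format!("missing required setting {key}"))
    };
    match vars.get("LLM_PROVIDER").map(String::as_str) {
        Some("local") => Ok(ProviderConfig::Local {
            endpoint: require("LLM_ENDPOINT")?,
        }),
        // Assumption: no LLM_PROVIDER means the primary (OpenAI) provider.
        None => Ok(ProviderConfig::OpenAi {
            api_key: require("OPENAI_API_KEY")?,
            model: require("LLM_MODEL")?,
        }),
        Some(other) => Err(format!("unsupported LLM_PROVIDER: {other}")),
    }
}
```

To read the real environment, the map can be built with `std::env::vars().collect::<HashMap<String, String>>()`.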
The LLM Keyword
Basic Usage
Basic Generation (Background Processing)
# For background/scheduled tasks only - not for interactive conversations
let summary = LLM "Explain quantum computing in simple terms"
SET_BOT_MEMORY "quantum_explanation", summary # Store for all users
With Context
Document Summarization (Background Processing)
# Scheduled task to generate summaries for all users
let document = GET "knowledge/policy.pdf"
let summary = LLM "Summarize this document: " + document
SET_BOT_MEMORY "policy_summary", summary # Available to all users
Question Answering
Context-Aware Conversations (Interactive)
# For interactive conversations - use SET CONTEXT, not LLM
TALK "What's your question?"
let question = HEAR
let context = GET_BOT_MEMORY("knowledge")
SET CONTEXT "background", context
TALK "Based on our knowledge base, here's what I can tell you..."
# System AI automatically uses the context when responding
LLM Provider Implementation
Located in src/llm/:
- mod.rs - Provider trait and factory
- openai.rs - OpenAI implementation
- local.rs - Local model support
Provider Trait
All LLM providers implement:
- generate() - Text generation
- generate_stream() - Streaming responses
- get_embedding() - Vector embeddings
- count_tokens() - Token counting
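The shape of such a trait might look like the sketch below. The method names come from the list above; the signatures, the `LlmError` type, and the `EchoProvider` test double are assumptions made for illustration. The sketch is synchronous to stay dependency-free, whereas the Rust snippets elsewhere on this page suggest that at least `get_embedding()` is async in the real code.

```rust
/// Simplified error type for this sketch (hypothetical).
#[derive(Debug, PartialEq)]
enum LlmError {
    Unavailable(String),
}

/// Assumed shape of the provider abstraction; real signatures may differ.
trait LlmProvider {
    fn generate(&self, prompt: &str) -> Result<String, LlmError>;
    /// Calls `on_token` once per generated token.
    fn generate_stream(
        &self,
        prompt: &str,
        on_token: &mut dyn FnMut(&str),
    ) -> Result<(), LlmError>;
    fn get_embedding(&self, text: &str) -> Result<Vec<f32>, LlmError>;
    fn count_tokens(&self, text: &str) -> Result<usize, LlmError>;
}

/// Trivial in-memory provider, useful as a test double.
struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn generate(&self, prompt: &str) -> Result<String, LlmError> {
        if prompt.is_empty() {
            return Err(LlmError::Unavailable("empty prompt".to_string()));
        }
        Ok(format!("echo: {prompt}"))
    }

    fn generate_stream(
        &self,
        prompt: &str,
        on_token: &mut dyn FnMut(&str),
    ) -> Result<(), LlmError> {
        // Treat each whitespace-separated word as one "token".
        for word in prompt.split_whitespace() {
            on_token(word);
        }
        Ok(())
    }

    fn get_embedding(&self, text: &str) -> Result<Vec<f32>, LlmError> {
        // Placeholder one-dimensional vector; a real provider returns a model embedding.
        Ok(vec![text.len() as f32])
    }

    fn count_tokens(&self, text: &str) -> Result<usize, LlmError> {
        Ok(text.split_whitespace().count())
    }
}
```

Coding against the trait rather than a concrete provider is what lets a bot switch between OpenAI and a local model without changing calling code.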
Context Management
Context Window
Techniques for keeping prompts within the model's limited context window:
- Automatic truncation
- Context compaction
- Relevance filtering
- History summarization
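Automatic truncation, the simplest of these techniques, can be sketched as keeping the newest messages that fit a token budget. The function names are hypothetical, and the word-count estimator is a stand-in for a real tokenizer.

```rust
/// Rough token estimate: whitespace-separated words (an assumption, not a real tokenizer).
fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Keep the most recent messages whose combined estimate fits within `budget`.
/// Messages are ordered oldest first; the result preserves that order.
fn truncate_history(messages: &[String], budget: usize) -> Vec<String> {
    let mut kept = Vec::new();
    let mut used = 0;
    // Walk backwards so the newest messages are considered first.
    for msg in messages.iter().rev() {
        let cost = estimate_tokens(msg);
        if used + cost > budget {
            break;
        }
        used += cost;
        kept.push(msg.clone());
    }
    kept.reverse();
    kept
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, which tends to matter more for coherence than squeezing in an older, shorter message.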
Context Sources
- Conversation History - Recent messages
- Knowledge Base - Relevant documents
- Bot Memory - Persistent context
- Tool Definitions - Available functions
- User Profile - Personalization
Prompt Engineering
System Prompts
System prompts are stored in bot memory and loaded into the conversation context:
let system_prompt = GET_BOT_MEMORY("system_prompt")
SET_CONTEXT "system" AS system_prompt
Dynamic Prompts
Building prompts programmatically:
# For interactive conversations - use SET CONTEXT
SET CONTEXT "user_name", user_name
SET CONTEXT "current_date", NOW()
# System AI automatically incorporates this context
Streaming Responses
WebSocket Streaming
Real-time token streaming:
- LLM generates tokens
- Tokens sent via WebSocket
- UI updates progressively
- Complete response assembled
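The four steps above can be sketched with a standard-library channel standing in for the WebSocket. Both function names are hypothetical, and the producer thread simulates the LLM rather than calling one.

```rust
use std::sync::mpsc;
use std::thread;

/// Simulate a provider that emits tokens one at a time over a channel.
fn stream_tokens(tokens: Vec<String>) -> mpsc::Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for token in tokens {
            // Stop early if the consumer has gone away (e.g. generation was cancelled).
            if tx.send(token).is_err() {
                break;
            }
        }
        // Dropping `tx` here closes the channel and ends the stream.
    });
    rx
}

/// Consume the stream, reporting the partial text after each token,
/// and return the fully assembled response.
fn assemble_response(rx: mpsc::Receiver<String>, mut on_partial: impl FnMut(&str)) -> String {
    let mut response = String::new();
    for token in rx {
        response.push_str(&token);
        on_partial(&response); // progressive UI update
    }
    response
}
```

Dropping the receiver is enough to cancel generation in this sketch, because the producer stops as soon as a send fails.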
Stream Control
- Start/stop generation
- Cancel long responses
- Timeout protection
- Error recovery
Embeddings
Vector Generation
Creating embeddings for semantic search:
let embedding = llm_provider.get_embedding(text).await?;
Embedding Models
- OpenAI: text-embedding-ada-002
- Local: Sentence transformers
- Custom: Configurable models
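Once embeddings exist, semantic search usually reduces to comparing vectors. The sketch below uses cosine similarity, a common choice that this page does not itself specify; the function names are hypothetical.

```rust
/// Cosine similarity of two vectors; `None` if lengths differ or either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Index of the document embedding most similar to the query, if any is comparable.
fn most_similar(query: &[f32], docs: &[Vec<f32>]) -> Option<usize> {
    docs.iter()
        .enumerate()
        .filter_map(|(i, d)| cosine_similarity(query, d).map(|score| (i, score)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Embeddings from different models are generally not comparable, so the query and the documents should be embedded with the same model.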
Token Management
Token Counting
Estimating usage before calls:
let token_count = llm_provider.count_tokens(prompt)?;
Token Limits
- Model-specific limits
- Context window constraints
- Rate limiting
- Cost management
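A pre-call check against the context window might look like this. It is a sketch only: the four-characters-per-token ratio is a rough rule of thumb for English text, not the behaviour of `count_tokens()`, and both function names are hypothetical.

```rust
/// Approximate token count, assuming roughly four characters per token (rounded up).
fn approx_tokens_from_chars(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Check that the prompt plus the space reserved for the reply fits the model's window.
fn fits_context(prompt: &str, reserved_for_reply: usize, context_limit: usize) -> bool {
    approx_tokens_from_chars(prompt) + reserved_for_reply <= context_limit
}
```

Reserving room for the reply matters because the context window generally covers prompt and completion together.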
Error Handling
Common Errors
- API key invalid
- Rate limit exceeded
- Context too long
- Model unavailable
- Network timeout
Fallback Strategies
- Retry with backoff
- Switch to backup model
- Reduce context size
- Cache responses
- Return graceful error
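Retry with backoff, the first strategy listed, can be sketched generically. `retry_with_backoff` is a hypothetical helper; the doubling schedule and the lack of jitter are simplifications.

```rust
use std::thread;
use std::time::Duration;

/// Call `op` until it succeeds or `max_attempts` is reached, doubling the delay
/// between attempts. `op` receives the zero-based attempt number.
/// At least one attempt is always made.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                // Delay schedule: base, 2*base, 4*base, ...
                let factor = 2u32.saturating_pow(attempt);
                thread::sleep(base_delay.saturating_mul(factor));
                attempt += 1;
            }
        }
    }
}
```

The caller can chain the other strategies onto the final `Err`, for example by switching to a backup model or returning a graceful error message.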
Performance Optimization
Caching
- Response caching
- Embedding caching
- Token count caching
- Semantic cache
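Response caching can be illustrated with an exact-match cache keyed on a normalized prompt. This is a sketch under assumptions: `ResponseCache` is hypothetical, the normalization (lowercasing and collapsing whitespace) is one possible choice, and a semantic cache would compare embeddings instead of strings.

```rust
use std::collections::HashMap;

/// Exact-match response cache with hit/miss counters.
struct ResponseCache {
    entries: HashMap<String, String>,
    hits: u64,
    misses: u64,
}

impl ResponseCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Collapse whitespace and lowercase so trivially different prompts share a key.
    fn normalize(prompt: &str) -> String {
        prompt.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
    }

    /// Return the cached response, or call `generate` and cache its result.
    fn get_or_generate(
        &mut self,
        prompt: &str,
        generate: impl FnOnce(&str) -> String,
    ) -> String {
        let key = Self::normalize(prompt);
        if let Some(cached) = self.entries.get(&key) {
            self.hits += 1;
            return cached.clone();
        }
        self.misses += 1;
        let response = generate(prompt);
        self.entries.insert(key, response.clone());
        response
    }
}
```

The `hits` and `misses` counters are enough to derive a cache hit rate for monitoring.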
Batching
- Group similar requests
- Batch embeddings
- Parallel processing
- Queue management
Cost Management
Usage Tracking
- Tokens per request
- Cost per conversation
- Daily/monthly limits
- Per-user quotas
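A per-user quota check could be sketched as follows. `UsageTracker` is a hypothetical type, the limit is counted in tokens, and resetting counters at the end of each day is left out.

```rust
use std::collections::HashMap;

/// Tracks tokens consumed per user against a shared daily limit.
struct UsageTracker {
    daily_limit: u64,
    used: HashMap<String, u64>,
}

impl UsageTracker {
    fn new(daily_limit: u64) -> Self {
        Self { daily_limit, used: HashMap::new() }
    }

    /// Record `tokens` for `user` if the quota allows it.
    /// Returns the remaining allowance, or `Err` with the amount by which
    /// the request would exceed the limit (in which case nothing is recorded).
    fn record(&mut self, user: &str, tokens: u64) -> Result<u64, u64> {
        let used = self.used.entry(user.to_string()).or_insert(0);
        let total = used.saturating_add(tokens);
        if total > self.daily_limit {
            return Err(total - self.daily_limit);
        }
        *used = total;
        Ok(self.daily_limit - total)
    }
}
```

Checking the quota before the LLM call, using an estimated token count, avoids paying for a request that would then be rejected.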
Optimization Strategies
- Use smaller models when possible
- Cache frequent queries
- Compress context
- Limit conversation length
- Implement quotas
Security Considerations
API Key Management
- Never hardcode keys
- Use environment variables
- Rotate keys regularly
- Monitor usage
Content Filtering
- Input validation
- Output sanitization
- PII detection
- Inappropriate content blocking
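As a small illustration of PII detection, the sketch below redacts words that look like email addresses before text is sent to a provider. It is a deliberately naive heuristic with hypothetical function names; production filtering would need broader patterns (phone numbers, IDs) and a proper parser.

```rust
/// Very rough email check: non-empty local part, and a domain containing a dot.
fn looks_like_email(word: &str) -> bool {
    match word.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty() && domain.contains('.') && !domain.starts_with('.')
        }
        None => false,
    }
}

/// Replace email-like words with a placeholder, keeping surrounding punctuation.
fn redact_emails(input: &str) -> String {
    input
        .split(' ')
        .map(|word| {
            // Ignore leading/trailing punctuation such as a sentence-final period.
            let core = word.trim_matches(|c: char| !c.is_alphanumeric());
            if looks_like_email(core) {
                word.replace(core, "[redacted-email]")
            } else {
                word.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```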
Monitoring
Metrics
- Response time
- Token usage
- Error rate
- Cache hit rate
- Model performance
Logging
- Request/response pairs
- Error details
- Performance metrics
- Usage statistics
Best Practices
- Choose Appropriate Models: Balance cost and capability
- Optimize Prompts: Clear, concise instructions
- Manage Context: Keep relevant information only
- Handle Errors: Graceful degradation
- Monitor Usage: Track costs and performance
- Cache Wisely: Reduce redundant calls
- Stream When Possible: Better user experience
Summary
The LLM integration is central to BotServer's intelligence, providing natural language understanding and generation capabilities. Through careful prompt engineering, context management, and provider abstraction, bots can deliver sophisticated conversational experiences while managing costs and performance.