LLM Providers

General Bots supports multiple Large Language Model (LLM) providers, both cloud-based services and local deployments. This guide helps you choose the right provider for your use case.

Overview

LLMs are the intelligence behind General Bots' conversational capabilities. You can configure:

  • Cloud Providers - External APIs (OpenAI, Anthropic, Groq, etc.)
  • Local Models - Self-hosted models via llama.cpp
  • Hybrid - Use local for simple tasks, cloud for complex reasoning

Cloud Providers

OpenAI (GPT Series)

The most widely known LLM provider, offering GPT-4 and GPT-4o models.

| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| GPT-4o | 128K | General purpose, vision | Fast |
| GPT-4o-mini | 128K | Cost-effective tasks | Very Fast |
| GPT-4 Turbo | 128K | Complex reasoning | Medium |
| o1-preview | 128K | Advanced reasoning, math | Slow |
| o1-mini | 128K | Code, logic tasks | Medium |

Configuration:

llm-provider,openai
llm-api-key,sk-xxxxxxxxxxxxxxxxxxxxxxxx
llm-model,gpt-4o

Strengths:

  • Excellent general knowledge
  • Strong code generation
  • Good instruction following
  • Vision capabilities (GPT-4o)

Considerations:

  • API costs can add up
  • Data sent to external servers
  • Rate limits apply

Anthropic (Claude Series)

Known for safety, helpfulness, and large context windows.

| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Claude 3.5 Sonnet | 200K | Best balance of capability/speed | Fast |
| Claude 3.5 Haiku | 200K | Quick, everyday tasks | Very Fast |
| Claude 3 Opus | 200K | Most capable, complex tasks | Slow |

Configuration:

llm-provider,anthropic
llm-api-key,sk-ant-xxxxxxxxxxxxxxxx
llm-model,claude-3-5-sonnet-20241022

Strengths:

  • Very large context window (200K tokens)
  • Excellent at following complex instructions
  • Strong coding abilities
  • Better at refusing harmful requests

Considerations:

  • Premium pricing
  • Vision support not available in all models
  • Newer provider, smaller ecosystem

Groq

Ultra-fast inference using custom LPU hardware. Offers open-source models at high speed.

| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Llama 3.3 70B | 128K | Complex reasoning | Very Fast |
| Llama 3.1 8B | 128K | Quick responses | Extremely Fast |
| Mixtral 8x7B | 32K | Balanced performance | Very Fast |
| Gemma 2 9B | 8K | Lightweight tasks | Extremely Fast |

Configuration:

llm-provider,groq
llm-api-key,gsk_xxxxxxxxxxxxxxxx
llm-model,llama-3.3-70b-versatile

Strengths:

  • Fastest inference speeds (500+ tokens/sec)
  • Competitive pricing
  • Open-source models
  • Great for real-time applications

Considerations:

  • Limited model selection
  • Rate limits on free tier
  • Models may be less capable than GPT-4/Claude

Google (Gemini Series)

Google's multimodal AI models with strong reasoning capabilities.

| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Gemini 1.5 Pro | 2M | Extremely long documents | Medium |
| Gemini 1.5 Flash | 1M | Fast multimodal | Fast |
| Gemini 2.0 Flash | 1M | Latest capabilities | Fast |

Configuration:

llm-provider,google
llm-api-key,AIzaxxxxxxxxxxxxxxxx
llm-model,gemini-1.5-pro

Strengths:

  • Largest context window (2M tokens)
  • Native multimodal (text, image, video, audio)
  • Strong at structured data
  • Good coding abilities

Considerations:

  • Newer ecosystem
  • Some features region-limited
  • API changes more frequently

Mistral AI

European AI company offering efficient, open-weight models.

| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Mistral Large | 128K | Complex tasks | Medium |
| Mistral Medium | 32K | Balanced performance | Fast |
| Mistral Small | 32K | Cost-effective | Very Fast |
| Codestral | 32K | Code generation | Fast |

Configuration:

llm-provider,mistral
llm-api-key,xxxxxxxxxxxxxxxx
llm-model,mistral-large-latest

Strengths:

  • European data sovereignty (GDPR)
  • Excellent code generation (Codestral)
  • Open-weight models available
  • Competitive pricing

Considerations:

  • Smaller context than competitors
  • Less brand recognition
  • Fewer fine-tuning options

DeepSeek

Chinese AI company known for efficient, capable models.

| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| DeepSeek-V3 | 128K | General purpose | Fast |
| DeepSeek-R1 | 128K | Reasoning, math | Medium |
| DeepSeek-Coder | 128K | Programming | Fast |

Configuration:

llm-provider,deepseek
llm-api-key,sk-xxxxxxxxxxxxxxxx
llm-model,deepseek-chat
llm-server-url,https://api.deepseek.com

Strengths:

  • Extremely cost-effective
  • Strong reasoning (R1 model)
  • Excellent code generation
  • Open-weight versions available

Considerations:

  • Data processed in China
  • Newer provider
  • May have content restrictions

Local Models

Run models on your own hardware for privacy, cost control, and offline operation.

Setting Up Local LLM

General Bots uses llama.cpp server for local inference:

llm-provider,local
llm-server-url,https://localhost:8081
llm-model,DeepSeek-R1-Distill-Qwen-1.5B
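
As a quick connectivity check, the sketch below posts a single chat request to the local server. It assumes the llama.cpp server's OpenAI-compatible /v1/chat/completions endpoint is reachable over plain HTTP on port 8081 and that the reqwest (blocking, json features) and serde_json crates are available; the scheme, path, and model name are assumptions to adjust to your deployment.

```rust
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Request body in the OpenAI chat format accepted by llama.cpp's server.
    let body = json!({
        "model": "DeepSeek-R1-Distill-Qwen-1.5B",
        "messages": [{ "role": "user", "content": "Say hello in one short sentence." }],
        "max_tokens": 64
    });

    // Adjust scheme/host/port to match llm-server-url in your configuration.
    let response = reqwest::blocking::Client::new()
        .post("http://localhost:8081/v1/chat/completions")
        .json(&body)
        .send()?
        .error_for_status()?;

    println!("{}", response.text()?);
    Ok(())
}
```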

For High-End GPU (24GB+ VRAM)

| Model | Size | VRAM | Quality |
|-------|------|------|---------|
| GPT-OSS 120B Q4 | 70GB | 48GB+ | Excellent |
| Llama 3.1 70B Q4 | 40GB | 48GB+ | Excellent |
| DeepSeek-R1 32B Q4 | 20GB | 24GB | Very Good |
| Qwen 2.5 72B Q4 | 42GB | 48GB+ | Excellent |

For Mid-Range GPU (12-16GB VRAM)

| Model | Size | VRAM | Quality |
|-------|------|------|---------|
| GPT-OSS 20B F16 | 40GB | 16GB | Very Good |
| Llama 3.1 8B Q8 | 9GB | 12GB | Good |
| DeepSeek-R1-Distill 14B Q4 | 8GB | 12GB | Good |
| Mistral Nemo 12B Q4 | 7GB | 10GB | Good |

For Small GPU or CPU (8GB VRAM or less)

| Model | Size | VRAM | Quality |
|-------|------|------|---------|
| DeepSeek-R1-Distill 1.5B Q4 | 1GB | 4GB | Basic |
| Phi-3 Mini 3.8B Q4 | 2.5GB | 6GB | Acceptable |
| Gemma 2 2B Q8 | 3GB | 6GB | Acceptable |
| Qwen 2.5 3B Q4 | 2GB | 4GB | Basic |

Model Download URLs

Add models to installer.rs data_download_list:

// GPT-OSS 20B - Recommended for small GPU
"https://huggingface.co/unsloth/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b-F16.gguf",

// DeepSeek R1 Distill - For CPU or minimal GPU
"https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/resolve/main/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf",

// Llama 3.1 8B - Good balance
"https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",

Embedding Models

For vector search, you need an embedding model:

embedding-provider,local
embedding-server-url,https://localhost:8082
embedding-model,bge-small-en-v1.5

Recommended embedding models:

| Model | Dimensions | Size | Quality |
|-------|------------|------|---------|
| bge-small-en-v1.5 | 384 | 130MB | Good |
| bge-base-en-v1.5 | 768 | 440MB | Better |
| bge-large-en-v1.5 | 1024 | 1.3GB | Best |
| nomic-embed-text | 768 | 550MB | Good |
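
To make the role of these models concrete, the sketch below ranks stored text chunks against a query by cosine similarity of their embedding vectors. It is a minimal illustration of the vector-search step, not the actual General Bots retrieval code; the toy four-dimensional vectors stand in for real embedding output (for example, the 384-dimensional vectors produced by bge-small-en-v1.5).

```rust
/// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

fn main() {
    // Toy 4-dimensional vectors stand in for real embedding output.
    let query: Vec<f32> = vec![0.1, 0.9, 0.0, 0.3];
    let documents: Vec<(&str, Vec<f32>)> = vec![
        ("refund policy", vec![0.2, 0.8, 0.1, 0.2]),
        ("shipping times", vec![0.9, 0.1, 0.4, 0.0]),
    ];

    // Rank stored chunks by similarity to the query embedding.
    let mut ranked: Vec<(&str, f32)> = documents
        .iter()
        .map(|(name, emb)| (*name, cosine_similarity(&query, emb)))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));

    for (name, score) in ranked {
        println!("{name}: {score:.3}");
    }
}
```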

Hybrid Configuration

Use different models for different tasks:

# Primary model for complex conversations
llm-provider,anthropic
llm-model,claude-3-5-sonnet-20241022

# Fast model for simple tasks
llm-fast-provider,groq
llm-fast-model,llama-3.1-8b-instant

# Local fallback for offline operation
llm-fallback-provider,local
llm-fallback-model,DeepSeek-R1-Distill-Qwen-1.5B

# Embeddings always local
embedding-provider,local
embedding-model,bge-small-en-v1.5
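
How requests get routed between these tiers is up to the engine; the sketch below is only a hedged illustration of the idea, with made-up heuristics (prompt length and a connectivity flag) standing in for whatever rules your deployment actually applies. The tier names mirror the llm-, llm-fast-, and llm-fallback- settings above.

```rust
// Hypothetical tiers mirroring the primary, fast, and fallback settings above.
enum ModelTier {
    Primary,  // e.g. claude-3-5-sonnet-20241022 via Anthropic
    Fast,     // e.g. llama-3.1-8b-instant via Groq
    Fallback, // e.g. local DeepSeek-R1-Distill-Qwen-1.5B
}

/// Pick a tier from simple request features. The heuristics are illustrative,
/// not the engine's actual routing rules.
fn choose_tier(prompt: &str, online: bool) -> ModelTier {
    if !online {
        ModelTier::Fallback
    } else if prompt.len() < 200 {
        ModelTier::Fast
    } else {
        ModelTier::Primary
    }
}

fn main() {
    for (prompt, online) in [
        ("What are your opening hours?", true),
        ("Summarize this 30-page contract and list every termination clause with its section number.", true),
        ("Hello?", false),
    ] {
        match choose_tier(prompt, online) {
            ModelTier::Fast => println!("fast tier: {prompt}"),
            ModelTier::Primary => println!("primary tier: {prompt}"),
            ModelTier::Fallback => println!("local fallback: {prompt}"),
        }
    }
}
```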

Model Selection Guide

By Use Case

| Use Case | Recommended | Why |
|----------|-------------|-----|
| Customer support | Claude 3.5 Sonnet | Best at following guidelines |
| Code generation | DeepSeek-Coder, GPT-4o | Specialized for code |
| Document analysis | Gemini 1.5 Pro | 2M context window |
| Real-time chat | Groq Llama 3.1 8B | Fastest responses |
| Privacy-sensitive | Local DeepSeek-R1 | No external data transfer |
| Cost-sensitive | DeepSeek-V3, Local | Lowest cost per token |
| Complex reasoning | Claude 3 Opus, o1 | Best reasoning ability |

By Budget

| Budget | Recommended Setup |
|--------|-------------------|
| Free | Local models only |
| Low ($10-50/mo) | Groq + Local fallback |
| Medium ($50-200/mo) | GPT-4o-mini + Claude Haiku |
| High ($200+/mo) | GPT-4o + Claude Sonnet |
| Enterprise | Private deployment + premium APIs |

Configuration Reference

Environment Variables

# Primary LLM
LLM_PROVIDER=openai
LLM_API_KEY=sk-xxx
LLM_MODEL=gpt-4o
LLM_SERVER_URL=https://api.openai.com

# Local LLM Server
LLM_LOCAL_URL=https://localhost:8081
LLM_LOCAL_MODEL=DeepSeek-R1-Distill-Qwen-1.5B

# Embedding
EMBEDDING_PROVIDER=local
EMBEDDING_URL=https://localhost:8082
EMBEDDING_MODEL=bge-small-en-v1.5

config.csv Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| llm-provider | Provider name | openai, anthropic, local |
| llm-api-key | API key for cloud providers | sk-xxx |
| llm-model | Model identifier | gpt-4o |
| llm-server-url | API endpoint | https://api.openai.com |
| llm-server-ctx-size | Context window size | 128000 |
| llm-temperature | Response randomness (0-2) | 0.7 |
| llm-max-tokens | Maximum response length | 4096 |
| llm-cache-enabled | Enable semantic caching | true |
| llm-cache-ttl | Cache time-to-live (seconds) | 3600 |

Security Considerations

Cloud Providers

  • Store API keys in environment variables or a secrets manager (see the sketch after this list)
  • Consider data residency requirements (EU: Mistral, US: OpenAI)
  • Review provider data retention policies
  • Use separate keys for production/development
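
A minimal sketch of the first point, assuming the LLM_API_KEY variable listed in the Configuration Reference section: the process reads the key from its environment at startup instead of relying on a value committed to config files or source control.

```rust
use std::env;

fn main() {
    // LLM_API_KEY matches the environment variable in the Configuration Reference.
    match env::var("LLM_API_KEY") {
        Ok(key) if !key.is_empty() => println!("API key loaded ({} characters)", key.len()),
        _ => eprintln!("LLM_API_KEY is missing; refusing to start with a hardcoded key"),
    }
}
```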

Local Models

  • All data stays on your infrastructure
  • No internet required after model download
  • Full control over model versions
  • Consider GPU security for sensitive deployments

Performance Optimization

Caching

Enable semantic caching to reduce API calls:

llm-cache-enabled,true
llm-cache-ttl,3600
llm-cache-similarity-threshold,0.92
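
Conceptually, a semantic cache compares the embedding of an incoming prompt against the embeddings of previously answered prompts and reuses a stored response when the similarity clears the threshold. The sketch below illustrates that lookup; it is an assumption-level illustration rather than the engine's cache implementation, and the 0.92 threshold mirrors llm-cache-similarity-threshold above.

```rust
/// A cached entry: the embedding of a previous prompt plus its stored response.
struct CacheEntry {
    embedding: Vec<f32>,
    response: String,
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return a cached response if any stored prompt is similar enough to the new one.
fn lookup<'a>(cache: &'a [CacheEntry], query_embedding: &[f32], threshold: f32) -> Option<&'a str> {
    cache
        .iter()
        .map(|e| (e, cosine_similarity(&e.embedding, query_embedding)))
        .filter(|(_, score)| *score >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(e, _)| e.response.as_str())
}

fn main() {
    let cache = vec![CacheEntry {
        embedding: vec![0.1, 0.9, 0.3],
        response: "Our store opens at 9am.".into(),
    }];
    // A near-duplicate question produces a nearby embedding, so the cache answers it
    // without a new API call.
    println!("{:?}", lookup(&cache, &[0.12, 0.88, 0.31], 0.92));
}
```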

Batching

For bulk operations, use batch APIs when available:

llm-batch-enabled,true
llm-batch-size,10
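
The sketch below shows only the grouping step: prompts are split into batches of ten, matching llm-batch-size above, before submission. The send call is left as a comment because batch endpoints differ per provider; treat this as an illustration rather than a working client.

```rust
fn main() {
    // Fifty hypothetical prompts to process in bulk.
    let prompts: Vec<String> = (0..50).map(|i| format!("Classify support ticket #{i}")).collect();

    // Group the work into batches of ten and submit each batch with a single
    // call where the provider offers a batch API.
    for (n, batch) in prompts.chunks(10).enumerate() {
        println!("batch {n}: {} prompts", batch.len());
        // send_batch(batch) would go here; the endpoint differs per provider.
    }
}
```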

Context Management

Optimize context window usage with episodic memory:

episodic-memory-enabled,true
episodic-memory-threshold,4
episodic-memory-history,2
episodic-memory-auto-summarize,true

See Episodic Memory for details.

Troubleshooting

Common Issues

API Key Invalid

  • Verify key is correct and not expired
  • Check if key has required permissions
  • Ensure billing is active

Model Not Found

  • Check model name spelling
  • Verify model is available in your region
  • Some models require waitlist access

Rate Limits

  • Implement exponential backoff (see the sketch after this list)
  • Use caching to reduce calls
  • Consider upgrading API tier
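
A minimal backoff sketch in Rust, under the assumption that the failing call is synchronous and safe to retry; with_backoff and the simulated 429 error are illustrative names, not part of General Bots.

```rust
use std::{thread, time::Duration};

/// Retry a fallible call with exponential backoff: wait 1s, 2s, 4s, 8s, ...
fn with_backoff<T, E>(mut call: impl FnMut() -> Result<T, E>, max_retries: u32) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                // Double the delay on every failed attempt before retrying.
                thread::sleep(Duration::from_secs(1u64 << attempt));
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulate a request that is rate limited twice and then succeeds.
    let mut failures = 2;
    let result = with_backoff(
        || {
            if failures > 0 {
                failures -= 1;
                Err("429 Too Many Requests")
            } else {
                Ok("response text")
            }
        },
        5,
    );
    println!("{result:?}");
}
```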

Local Model Slow

  • Check GPU memory usage
  • Reduce context size
  • Use quantized models (Q4 instead of F16)

Logging

Enable LLM logging for debugging:

llm-log-requests,true
llm-log-responses,false
llm-log-timing,true

Next Steps