docs: Update model names to latest (GPT-5, Claude 4.5, DeepSeek-R3)

- Update all model references across 14+ documentation files
- GPT-4.1 → GPT-5, GPT-5 mini
- Claude Sonnet/Opus → Claude Sonnet 4.5, Claude Opus 4.5
- DeepSeek-R1 → DeepSeek-R3
- Add Template: Attendance CRM to SUMMARY.md
- Update attendant.csv docs with multi-channel columns
- Update TASKS.md with completed model updates
Rodrigo Rodriguez (Pragmatismo) 2025-12-05 14:54:59 -03:00
parent 80f1041263
commit e5fd4bd3fc
25 changed files with 503 additions and 84 deletions

@@ -29,31 +29,34 @@
 ---
-## 🔴 CRITICAL: Model Name Updates Needed
-Old model names found in documentation that should be updated:
-| File | Current | Should Be |
-|------|---------|-----------|
-| `appendix-external-services/README.md` | `gpt-4o` | Generic or current |
-| `appendix-external-services/catalog.md` | `claude-opus-4.5` | Current Anthropic models |
-| `appendix-external-services/hosting-dns.md` | `GPT-4, Claude 3` | Generic reference |
-| `appendix-external-services/llm-providers.md` | `claude-sonnet-4.5`, `llama-4-scout` | Current models |
-| `chapter-02/gbot.md` | `GPT-4 or Claude 3` | Generic reference |
-| `chapter-02/template-llm-server.md` | `gpt-4` | Generic or current |
-| `chapter-02/template-llm-tools.md` | `gpt-4` | Generic or current |
-| `chapter-02/templates.md` | `gpt-4` | Generic or current |
-| `chapter-04-gbui/how-to/create-first-bot.md` | `gpt-4o` | Generic or current |
-| `chapter-04-gbui/how-to/monitor-sessions.md` | `gpt-4o active` | Generic reference |
-| `chapter-04-gbui/suite-manual.md` | `GPT-4o`, `Claude 3.5` | Current versions |
-| `chapter-06-gbdialog/keyword-model-route.md` | `gpt-3.5-turbo`, `gpt-4o` | Generic or current |
-| `chapter-06-gbdialog/keyword-use-model.md` | `gpt-4`, `codellama-7b` | Generic or current |
-### Recommendation
-Replace with:
-- Generic: `your-model-name`, `{model}`, `local-model.gguf`
-- Current local: `DeepSeek-R1-Distill-Qwen-1.5B`, `Qwen2.5-7B`
-- Current cloud: Provider-agnostic examples
+## ✅ COMPLETED: Model Name Updates
+Model names updated to current versions (2025-01):
+| File | Updated To |
+|------|------------|
+| `appendix-external-services/README.md` | `claude-sonnet-4.5` |
+| `appendix-external-services/hosting-dns.md` | `GPT-5, o3`, `Claude Sonnet 4.5, Opus 4.5` |
+| `appendix-external-services/llm-providers.md` | `DeepSeek-R3, Claude Sonnet 4.5` |
+| `chapter-04-gbui/how-to/create-first-bot.md` | `claude-sonnet-4.5` |
+| `chapter-04-gbui/how-to/monitor-sessions.md` | `LLM active` (generic) |
+| `chapter-04-gbui/suite-manual.md` | `Claude Sonnet 4.5`, `Claude Opus 4.5`, `Gemini Pro`, `Llama 3.3` |
+| `chapter-06-gbdialog/keyword-model-route.md` | `claude-sonnet-4.5`, `gemini-flash`, `claude-opus-4.5` |
+| `chapter-06-gbdialog/basic-vs-automation-tools.md` | `GPT-5, o3`, `Claude Sonnet 4.5, Opus 4.5` |
+| `chapter-07-gbapp/architecture.md` | `GPT-5 and o3`, `Claude Sonnet 4.5 and Opus 4.5` |
+| `chapter-08-config/llm-config.md` | `claude-sonnet-4.5` |
+| `chapter-08-config/secrets-management.md` | `claude-sonnet-4.5` |
+| `chapter-11-features/ai-llm.md` | `GPT-5`, `o3`, `Claude Sonnet 4.5` |
+| `chapter-11-features/core-features.md` | `GPT-5, o3`, `Claude Sonnet 4.5, Opus 4.5` |
+| `executive-vision.md` | `GPT-5, o3`, `Claude Sonnet 4.5, Opus 4.5` |
+### Naming Convention Applied
+- OpenAI: `GPT-5`, `GPT-5 mini`, `o3`
+- Anthropic: `Claude Sonnet 4.5`, `Claude Opus 4.5`
+- Google: `Gemini Pro`, `Gemini Flash`
+- Meta: `Llama 3.3`
+- DeepSeek: `DeepSeek-V3`, `DeepSeek-R3`
+- Local: `model.gguf`, `local-model`
 ---

@@ -34,6 +34,7 @@
 - [Template: Reminders](./chapter-02/template-reminder.md)
 - [Template: Sales CRM](./chapter-02/template-crm.md)
 - [Template: CRM Contacts](./chapter-02/template-crm-contacts.md)
+- [Template: Attendance CRM](./chapter-02/template-attendance-crm.md)
 - [Template: Marketing](./chapter-02/template-marketing.md)
 - [Template: Creating Templates](./chapter-02/template-template.md)

@@ -50,7 +50,7 @@ Add these to your `config.csv`:
 key,value
 llm-provider,openai
 llm-api-key,YOUR_API_KEY
-llm-model,gpt-4o
+llm-model,claude-sonnet-4.5
 weather-api-key,YOUR_OPENWEATHERMAP_KEY
 whatsapp-api-key,YOUR_WHATSAPP_KEY
 whatsapp-phone-number-id,YOUR_PHONE_ID

@@ -78,7 +78,7 @@ This catalog provides detailed information about every external service that Gen
 | **API Key Config** | `llm-api-key` (stored in Vault) |
 | **Documentation** | [platform.deepseek.com/docs](https://platform.deepseek.com/docs) |
 | **BASIC Keywords** | `LLM` |
-| **Supported Models** | `deepseek-v3.1`, `deepseek-r1` |
+| **Supported Models** | `deepseek-v3.1`, `deepseek-r3` |
 ### Mistral AI

@@ -195,10 +195,10 @@ vault kv put secret/botserver/smtp password="your-api-key"
 | Provider | Models | Config Key |
 |----------|--------|------------|
-| OpenAI | GPT-4, GPT-3.5 | `llm-url=https://api.openai.com/v1` |
-| Anthropic | Claude 3 | `llm-url=https://api.anthropic.com` |
-| Groq | Llama, Mixtral | `llm-url=https://api.groq.com/openai/v1` |
-| DeepSeek | DeepSeek-V2 | `llm-url=https://api.deepseek.com` |
+| OpenAI | GPT-5, o3 | `llm-url=https://api.openai.com/v1` |
+| Anthropic | Claude Sonnet 4.5, Opus 4.5 | `llm-url=https://api.anthropic.com` |
+| Groq | Llama 3.3, Mixtral | `llm-url=https://api.groq.com/openai/v1` |
+| DeepSeek | DeepSeek-V3, R3 | `llm-url=https://api.deepseek.com` |
 | Local | Any GGUF | `llm-url=http://localhost:8081` |
 ### Local LLM Setup

@@ -47,15 +47,15 @@ Known for safety, helpfulness, and extended thinking capabilities.
 | Model | Context | Best For | Speed |
 |-------|---------|----------|-------|
-| Claude Opus | 200K | Most capable, complex reasoning | Slow |
-| Claude Sonnet | 200K | Best balance of capability/speed | Fast |
+| Claude Opus 4.5 | 200K | Most capable, complex reasoning | Slow |
+| Claude Sonnet 4.5 | 200K | Best balance of capability/speed | Fast |
 **Configuration (config.csv):**
 ```csv
 name,value
 llm-provider,anthropic
-llm-model,claude-sonnet
+llm-model,claude-sonnet-4.5
 ```
 **Strengths:**
@@ -182,14 +182,14 @@ Known for efficient, capable models with exceptional reasoning.
 | Model | Context | Best For | Speed |
 |-------|---------|----------|-------|
 | DeepSeek-V3.1 | 128K | General purpose, optimized cost | Fast |
-| DeepSeek-R1 | 128K | Reasoning, math, science | Medium |
+| DeepSeek-R3 | 128K | Reasoning, math, science | Medium |
 **Configuration (config.csv):**
 ```csv
 name,value
 llm-provider,deepseek
-llm-model,deepseek-r1
+llm-model,deepseek-r3
 llm-server-url,https://api.deepseek.com
 ```
@@ -215,7 +215,7 @@ General Bots uses **llama.cpp** server for local inference:
 name,value
 llm-provider,local
 llm-server-url,http://localhost:8081
-llm-model,DeepSeek-R1-Distill-Qwen-1.5B
+llm-model,DeepSeek-R3-Distill-Qwen-1.5B
 ```
 ### Recommended Local Models
@@ -226,7 +226,7 @@ llm-model,DeepSeek-R1-Distill-Qwen-1.5B
 |-------|------|------|---------|
 | Llama 4 Scout 17B Q8 | 18GB | 24GB | Excellent |
 | Qwen3 72B Q4 | 42GB | 48GB+ | Excellent |
-| DeepSeek-R1 32B Q4 | 20GB | 24GB | Very Good |
+| DeepSeek-R3 32B Q4 | 20GB | 24GB | Very Good |
 #### For Mid-Range GPU (12-16GB VRAM)
@@ -234,14 +234,14 @@ llm-model,DeepSeek-R1-Distill-Qwen-1.5B
 |-------|------|------|---------|
 | Qwen3 14B Q8 | 15GB | 16GB | Very Good |
 | GPT-oss 20B Q4 | 12GB | 16GB | Very Good |
-| DeepSeek-R1-Distill 14B Q4 | 8GB | 12GB | Good |
+| DeepSeek-R3-Distill 14B Q4 | 8GB | 12GB | Good |
 | Gemma 3 27B Q4 | 16GB | 16GB | Good |
 #### For Small GPU or CPU (8GB VRAM or less)
 | Model | Size | VRAM | Quality |
 |-------|------|------|---------|
-| DeepSeek-R1-Distill 1.5B Q4 | 1GB | 4GB | Basic |
+| DeepSeek-R3-Distill 1.5B Q4 | 1GB | 4GB | Basic |
 | Gemma 2 9B Q4 | 5GB | 8GB | Acceptable |
 | Gemma 3 27B Q2 | 10GB | 8GB | Acceptable |
@@ -254,7 +254,7 @@ Add models to `installer.rs` data_download_list:
 "https://huggingface.co/Qwen/Qwen3-14B-GGUF/resolve/main/qwen3-14b-q4_k_m.gguf"
 // DeepSeek R1 Distill - For CPU or minimal GPU
-"https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/resolve/main/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf"
+"https://huggingface.co/unsloth/DeepSeek-R3-Distill-Qwen-1.5B-GGUF/resolve/main/DeepSeek-R3-Distill-Qwen-1.5B-Q4_K_M.gguf"
 // GPT-oss 20B - Good balance for agents
 "https://huggingface.co/openai/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b-q4_k_m.gguf"
@@ -290,11 +290,11 @@ Use different models for different tasks:
 ```csv
 name,value
 llm-provider,anthropic
-llm-model,claude-sonnet
+llm-model,claude-sonnet-4.5
 llm-fast-provider,groq
 llm-fast-model,llama-3.3-70b
 llm-fallback-provider,local
-llm-fallback-model,DeepSeek-R1-Distill-Qwen-1.5B
+llm-fallback-model,DeepSeek-R3-Distill-Qwen-1.5B
 embedding-provider,local
 embedding-model,bge-small-en-v1.5
 ```
@@ -305,11 +305,11 @@ embedding-model,bge-small-en-v1.5
 | Use Case | Recommended | Why |
 |----------|-------------|-----|
-| Customer support | Claude Sonnet | Best at following guidelines |
-| Code generation | DeepSeek-R1, GPT-4o | Specialized for code |
+| Customer support | Claude Sonnet 4.5 | Best at following guidelines |
+| Code generation | DeepSeek-R3, Claude Sonnet 4.5 | Specialized for code |
 | Document analysis | Gemini Pro | 2M context window |
 | Real-time chat | Groq Llama 3.3 | Fastest responses |
-| Privacy-sensitive | Local DeepSeek-R1 | No external data transfer |
+| Privacy-sensitive | Local DeepSeek-R3 | No external data transfer |
 | Cost-sensitive | DeepSeek, Local models | Lowest cost per token |
 | Complex reasoning | Claude Opus, Gemini Pro | Best reasoning ability |
 | Real-time research | Grok | Live data access |

@@ -0,0 +1,414 @@
# Attendance CRM Template (attendance-crm.gbai)
A hybrid AI + Human support template that combines intelligent bot routing with human attendant management and full CRM automation. This template demonstrates the power of General Bots as an LLM orchestrator for customer service operations.
---
## Overview
The Attendance CRM template provides:
- **Intelligent Routing** - Bot analyzes sentiment and auto-transfers frustrated customers
- **LLM-Assisted Attendants** - AI tips, message polish, smart replies for human agents
- **Queue Management** - Automated queue monitoring and load balancing
- **CRM Automations** - Follow-ups, collections, lead nurturing, pipeline management
- **Multi-Channel Support** - Works on WhatsApp, Web, and other channels
## Key Features
| Feature | Description |
|---------|-------------|
| **Sentiment-Based Transfer** | Auto-transfers when customer frustration is detected |
| **AI Copilot for Attendants** | Real-time tips, smart replies, message polishing |
| **Queue Health Monitoring** | Auto-reassign stale conversations, alert supervisors |
| **Automated Follow-ups** | 1-day, 3-day, 7-day follow-up sequences |
| **Collections Workflow** | Payment reminders from due date to legal escalation |
| **Lead Scoring & Nurturing** | Score leads and re-engage cold prospects |
| **Pipeline Management** | Weekly reviews, stale opportunity alerts |
---
## Package Structure
```
attendance-crm.gbai/
├── attendance-crm.gbdialog/
│ ├── start.bas # Main entry - intelligent routing
│ ├── queue-monitor.bas # Queue health monitoring (scheduled)
│ ├── attendant-helper.bas # LLM assist tools for attendants
│ └── crm-automations.bas # Follow-ups, collections, nurturing
├── attendance-crm.gbot/
│ └── config.csv # Bot configuration
└── attendant.csv # Attendant team configuration
```
---
## Configuration
### config.csv
```csv
name,value
# Bot Identity
bot-name,Attendance CRM Bot
bot-description,Hybrid AI + Human support with CRM integration
# CRM / Human Handoff - Required
crm-enabled,true
# LLM Assist Features for Attendants
attendant-llm-tips,true
attendant-polish-message,true
attendant-smart-replies,true
attendant-auto-summary,true
attendant-sentiment-analysis,true
# Bot Personality (used for LLM assist context)
bot-system-prompt,You are a professional customer service assistant. Be helpful and empathetic.
# Auto-transfer triggers
auto-transfer-on-frustration,true
auto-transfer-threshold,3
# Queue Settings
queue-timeout-minutes,30
queue-notify-interval,5
# Lead Scoring
lead-score-threshold-hot,70
lead-score-threshold-warm,50
# Follow-up Automation
follow-up-1-day,true
follow-up-3-day,true
follow-up-7-day,true
# Collections Automation
collections-enabled,true
collections-grace-days,3
# Working Hours
business-hours-start,09:00
business-hours-end,18:00
business-days,1-5
# Notifications
notify-on-vip,true
notify-on-escalation,true
notify-email,support@company.com
```
### attendant.csv
Attendants can be identified by **any channel**: WhatsApp phone, email, Microsoft Teams, or Google account.
```csv
id,name,channel,preferences,department,aliases,phone,email,teams,google
att-001,João Silva,all,sales,commercial,joao;js;silva,+5511999990001,joao.silva@company.com,joao.silva@company.onmicrosoft.com,joao.silva@company.com
att-002,Maria Santos,whatsapp,support,customer-service,maria;ms,+5511999990002,maria.santos@company.com,maria.santos@company.onmicrosoft.com,maria.santos@gmail.com
att-003,Pedro Costa,web,technical,engineering,pedro;pc;tech,+5511999990003,pedro.costa@company.com,pedro.costa@company.onmicrosoft.com,pedro.costa@company.com
att-004,Ana Oliveira,all,collections,finance,ana;ao;cobranca,+5511999990004,ana.oliveira@company.com,ana.oliveira@company.onmicrosoft.com,ana.oliveira@company.com
att-005,Carlos Souza,whatsapp,sales,commercial,carlos;cs,+5511999990005,carlos.souza@company.com,carlos.souza@company.onmicrosoft.com,carlos.souza@gmail.com
```
#### Column Reference
| Column | Description | Example |
|--------|-------------|---------|
| `id` | Unique attendant ID | `att-001` |
| `name` | Display name | `João Silva` |
| `channel` | Preferred channels (`all`, `whatsapp`, `web`, `teams`) | `all` |
| `preferences` | Specialization area | `sales`, `support`, `technical` |
| `department` | Department for routing | `commercial`, `engineering` |
| `aliases` | Semicolon-separated nicknames for matching | `joao;js;silva` |
| `phone` | WhatsApp number (E.164 format) | `+5511999990001` |
| `email` | Email address for notifications | `joao@company.com` |
| `teams` | Microsoft Teams UPN | `joao@company.onmicrosoft.com` |
| `google` | Google Workspace email | `joao@company.com` |
The system can find an attendant by **any identifier** - phone, email, Teams UPN, Google account, name, or alias.
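A short sketch of how a dialog could use this lookup before assigning work. The `FIND ATTENDANT` keyword and the `status` field shown here are assumptions for illustration; `GET ATTENDANTS` and `ASSIGN CONVERSATION` are documented in the keyword tables below.
```basic
' Hypothetical lookup: "js" resolves to João Silva via the aliases column
attendant = FIND ATTENDANT "js"

' Documented keywords: confirm availability, then assign
IF attendant.status = "online" THEN
    ASSIGN CONVERSATION session_id, attendant.id
END IF
```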
---
## Scripts
### start.bas - Intelligent Routing
The main entry point analyzes every customer message and decides routing:
```basic
' Analyze sentiment immediately
sentiment = ANALYZE SENTIMENT session.id, message
' Track frustration
IF sentiment.overall = "negative" THEN
frustration_count = frustration_count + 1
END IF
' Auto-transfer on high escalation risk
IF sentiment.escalation_risk = "high" THEN
tips = GET TIPS session.id, message
result = TRANSFER TO HUMAN "support", "urgent", context_summary
END IF
```
**Key behaviors:**
- Analyzes sentiment on every message
- Tracks frustration count across conversation
- Auto-transfers on explicit request ("falar com humano", "talk to human")
- Auto-transfers when escalation risk is high
- Auto-transfers after 3+ negative messages
- Passes AI tips to attendant during transfer
### queue-monitor.bas - Queue Health
Scheduled job that runs every 5 minutes:
```basic
SET SCHEDULE "queue-monitor", "*/5 * * * *"
```
**What it does:**
- Finds conversations waiting >10 minutes → auto-assigns
- Finds inactive assigned conversations → reminds attendant
- Finds conversations with offline attendants → reassigns
- Detects abandoned conversations → sends follow-up, then resolves
- Generates queue metrics for dashboard
- Alerts supervisor if queue gets long or no attendants online
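A minimal sketch of one monitoring pass using the queue keywords documented later in this page; the `waiting` collection, `wait_minutes` field, `count`/`first` accessors, and the `FOR EACH` loop shape are illustrative assumptions about the returned data.
```basic
' One pass of the queue monitor (scheduled every 5 minutes)
queue = GET QUEUE

FOR EACH item IN queue.waiting
    ' Auto-assign anything waiting longer than 10 minutes
    IF item.wait_minutes > 10 THEN
        online = GET ATTENDANTS "online"
        IF online.count > 0 THEN
            ASSIGN CONVERSATION item.session_id, online.first.id
        ELSE
            ' No one available: raise priority so it is picked up first
            SET PRIORITY item.session_id, "urgent"
        END IF
    END IF
NEXT
```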
### attendant-helper.bas - LLM Assist Tools
Provides AI-powered assistance to human attendants:
```basic
' Get tips for current conversation
tips = USE TOOL "attendant-helper", "tips", session_id, message
' Polish a message before sending
polished = USE TOOL "attendant-helper", "polish", session_id, message, "empathetic"
' Get smart reply suggestions
replies = USE TOOL "attendant-helper", "replies", session_id
' Get conversation summary
summary = USE TOOL "attendant-helper", "summary", session_id
' Analyze sentiment with recommendations
sentiment = USE TOOL "attendant-helper", "sentiment", session_id, message
' Check if transfer is recommended
should_transfer = USE TOOL "attendant-helper", "suggest_transfer", session_id
```
### crm-automations.bas - Business Workflows
Scheduled CRM automations:
```basic
' Daily follow-ups at 9am weekdays
SET SCHEDULE "follow-ups", "0 9 * * 1-5"
' Daily collections at 8am weekdays
SET SCHEDULE "collections", "0 8 * * 1-5"
' Daily lead nurturing at 10am weekdays
SET SCHEDULE "lead-nurture", "0 10 * * 1-5"
' Weekly pipeline review Friday 2pm
SET SCHEDULE "pipeline-review", "0 14 * * 5"
```
---
## BASIC Keywords Used
### Queue Management
| Keyword | Description | Example |
|---------|-------------|---------|
| `GET QUEUE` | Get queue status and items | `queue = GET QUEUE` |
| `NEXT IN QUEUE` | Get next waiting conversation | `next = NEXT IN QUEUE` |
| `ASSIGN CONVERSATION` | Assign to attendant | `ASSIGN CONVERSATION session_id, "att-001"` |
| `RESOLVE CONVERSATION` | Mark as resolved | `RESOLVE CONVERSATION session_id, "Fixed"` |
| `SET PRIORITY` | Change priority | `SET PRIORITY session_id, "urgent"` |
### Attendant Management
| Keyword | Description | Example |
|---------|-------------|---------|
| `GET ATTENDANTS` | List attendants | `attendants = GET ATTENDANTS "online"` |
| `GET ATTENDANT STATS` | Get performance metrics | `stats = GET ATTENDANT STATS "att-001"` |
| `SET ATTENDANT STATUS` | Change status | `SET ATTENDANT STATUS "att-001", "busy"` |
### LLM Assist
| Keyword | Description | Example |
|---------|-------------|---------|
| `GET TIPS` | Generate AI tips | `tips = GET TIPS session_id, message` |
| `POLISH MESSAGE` | Improve message | `polished = POLISH MESSAGE text, "empathetic"` |
| `GET SMART REPLIES` | Get reply suggestions | `replies = GET SMART REPLIES session_id` |
| `GET SUMMARY` | Conversation summary | `summary = GET SUMMARY session_id` |
| `ANALYZE SENTIMENT` | Sentiment analysis | `sentiment = ANALYZE SENTIMENT session_id, message` |
### Customer Journey
| Keyword | Description | Example |
|---------|-------------|---------|
| `TAG CONVERSATION` | Add tags | `TAG CONVERSATION session_id, "vip"` |
| `ADD NOTE` | Add internal note | `ADD NOTE session_id, "Customer is VIP"` |
| `GET CUSTOMER HISTORY` | Previous interactions | `history = GET CUSTOMER HISTORY user_id` |
| `TRANSFER TO HUMAN` | Handoff to human | `TRANSFER TO HUMAN "support", "high", context` |
---
## Sample Conversations
### Auto-Transfer on Frustration
```
Customer: Hello, I need help with my order
Bot: Hello! Of course, I'd be happy to help with your order.
     What is your order number?
Customer: #12345
Bot: I found your order #12345. It is in transit and should
     arrive tomorrow. Is there anything else I can help with?
Customer: Tomorrow?! That's ridiculous! It should have arrived 3 days ago!
Bot: [Sentiment: negative, Escalation Risk: medium]
     I understand your frustration, and I apologize for the delay.
     If you would rather speak with a human agent, just let me know.
Customer: YES! I want to talk to a real person RIGHT NOW!
Bot: [Auto-transfer triggered]
     I am transferring you to Maria Santos.
     One moment, please.
[Tips sent to Maria:]
⚠️ Customer frustrated - 3 negative messages
🎯 Issue: Delayed order #12345
✅ Offer compensation for delay
```
### Attendant Using LLM Assist
```
[Customer message arrives]
Customer: i can't access my account, it's been 2 days!!
[AI Tips appear in attendant UI:]
💡 Tips:
⚠️ Customer frustrated - use empathetic tone
🎯 Intent: Account access issue
✅ Verify account status, offer password reset
[Attendant types response:]
Attendant: hi, i'll check your account
[Clicks ✨ Polish button:]
Polished: "Hello! I understand how frustrating this can be.
          I'll check your account right now and get this
          resolved for you."
[Attendant sends polished message]
```
---
## Automation Workflows
### Follow-up Sequence
| Day | Action | Template |
|-----|--------|----------|
| 1 | Thank you message | `follow_up_thanks` |
| 3 | Value proposition | `follow_up_value` |
| 7 | Special offer (if score ≥50) | `follow_up_offer` |
### Collections Workflow
| Days Overdue | Action | Escalation |
|--------------|--------|------------|
| 0 (due today) | Friendly reminder | WhatsApp template |
| 3 | First notice | WhatsApp + Email |
| 7 | Second notice | + Notify collections team |
| 15 | Final notice + late fees | + Queue for human call |
| 30+ | Send to legal | + Suspend account |
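A sketch of how these tiers could branch in BASIC. The `invoice` fields and the `SEND TEMPLATE` keyword are assumptions for illustration; `ADD NOTE` and the template names come from this page.
```basic
' Pick the escalation tier for one overdue invoice
IF invoice.days_overdue >= 30 THEN
    ADD NOTE invoice.session_id, "Sent to legal - account suspended"
ELSEIF invoice.days_overdue >= 15 THEN
    SEND TEMPLATE invoice.phone, "payment_final_notice", invoice.name, invoice.id, invoice.total
ELSEIF invoice.days_overdue >= 7 THEN
    SEND TEMPLATE invoice.phone, "payment_overdue_7", invoice.name, invoice.id, invoice.amount
ELSEIF invoice.days_overdue >= 3 THEN
    SEND TEMPLATE invoice.phone, "payment_overdue_3", invoice.name, invoice.id, invoice.amount
ELSE
    SEND TEMPLATE invoice.phone, "payment_due_today", invoice.name, invoice.id, invoice.amount
END IF
```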
---
## WhatsApp Templates Required
Configure these in Meta Business Manager:
| Template | Variables | Purpose |
|----------|-----------|---------|
| `follow_up_thanks` | name, interest | 1-day thank you |
| `follow_up_value` | name, interest | 3-day value prop |
| `follow_up_offer` | name, discount | 7-day offer |
| `payment_due_today` | name, invoice_id, amount | Due reminder |
| `payment_overdue_3` | name, invoice_id, amount | 3-day overdue |
| `payment_overdue_7` | name, invoice_id, amount | 7-day overdue |
| `payment_final_notice` | name, invoice_id, total | 15-day final |
---
## Metrics & Analytics
The template automatically tracks:
- **Queue Metrics**: Wait times, queue length, utilization
- **Attendant Performance**: Resolved count, active conversations
- **Sentiment Trends**: Per conversation and overall
- **Automation Results**: Follow-ups sent, collections processed
Access via:
- Dashboard at `/suite/analytics/`
- API at `/api/attendance/insights`
- Stored in `queue_metrics` and `automation_logs` tables
---
## Best Practices
### 1. Configure Sentiment Thresholds
Adjust `auto-transfer-threshold` based on your tolerance:
- `2` = Very aggressive (transfer quickly)
- `3` = Balanced (default)
- `5` = Conservative (try harder with bot)
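In start.bas terms, the threshold is simply compared against the running frustration counter; this sketch assumes a `GET CONFIG` keyword for reading the value.
```basic
' Transfer once negative messages reach the configured threshold
threshold = GET CONFIG "auto-transfer-threshold"
IF frustration_count >= threshold THEN
    result = TRANSFER TO HUMAN "support", "high", context_summary
END IF
```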
### 2. Set Business Hours
Configure `business-hours-*` to avoid sending automated messages at night.
### 3. Train Your Team
Ensure attendants know the WhatsApp commands:
- `/tips` - Get AI tips
- `/polish <message>` - Improve message
- `/replies` - Get suggestions
- `/resolve` - Close conversation
### 4. Monitor Queue Health
Set up alerts for:
- Queue > 10 waiting
- No attendants online during business hours
- Average wait > 15 minutes
---
## See Also
- [Transfer to Human](../chapter-11-features/transfer-to-human.md) - Handoff details
- [LLM-Assisted Attendant](../chapter-11-features/attendant-llm-assist.md) - AI copilot features
- [Sales CRM Template](./template-crm.md) - Full CRM without attendance
- [Attendance Queue Module](../appendix-external-services/attendance-queue.md) - Queue configuration

@@ -275,8 +275,8 @@ If you have API keys for AI services, configure them:
 | Setting | Description | Example Value |
 |---------|-------------|---------------|
-| **LLM Provider** | AI service to use | `openai` |
-| **Model** | Specific model | `gpt-4o` |
+| **LLM Provider** | AI service to use | `anthropic` |
+| **Model** | Specific model | `claude-sonnet-4.5` |
 | **API Key** | Your API key | `sk-...` |
 ⚠️ **Warning**: Keep your API keys secret. Never share them.

@@ -211,7 +211,7 @@ The dashboard shows the health of all components:
 │ ● PostgreSQL   Running   v16.2      24/100 connections │
 │ ● Qdrant       Running   v1.9.2     1.2M vectors       │
 │ ● MinIO        Running   v2024.01   45.2 GB stored     │
-│ ● BotModels    Running   v2.1.0     gpt-4o active      │
+│ ● BotModels    Running   v2.1.0     LLM active         │
 │ ● Vault        Sealed    v1.15.0    156 secrets        │
 │ ● Cache        Running   v7.2.4     94.2% hit rate     │
 │ ● InfluxDB     Running   v2.7.3     2,450 pts/sec      │

@@ -699,10 +699,10 @@ Sources is your library of prompts, templates, tools, and AI models. Find and us
 | Model | Provider | Best For |
 |-------|----------|----------|
-| GPT-4o | OpenAI | General tasks, vision |
-| Claude 3.5 | Anthropic | Analysis, coding |
-| Gemini 1.5 | Google | Long documents |
-| Llama 3.1 | Meta | Open source, privacy |
+| Claude Sonnet 4.5 | Anthropic | General tasks, coding |
+| Claude Opus 4.5 | Anthropic | Complex analysis |
+| Gemini Pro | Google | Long documents |
+| Llama 3.3 | Meta | Open source, privacy |
 ---

@@ -99,8 +99,8 @@ This message reaches users on WhatsApp, Telegram, Web, or any configured channel
 BASIC supports any LLM provider:
-- OpenAI (GPT-4, GPT-3.5)
-- Anthropic (Claude)
+- OpenAI (GPT-5, o3)
+- Anthropic (Claude Sonnet 4.5, Opus 4.5)
 - Local models (Llama, Mistral via llama.cpp)
 - Groq, DeepSeek, and others
 - Any OpenAI-compatible API

@@ -61,10 +61,10 @@ Add to `config.csv`:
 ```csv
 llm-models,default;fast;quality;code
 model-routing-strategy,auto
-model-default,gpt-3.5-turbo
-model-fast,gpt-3.5-turbo
-model-quality,gpt-4o
-model-code,claude-sonnet
+model-default,claude-sonnet-4.5
+model-fast,gemini-flash
+model-quality,claude-opus-4.5
+model-code,claude-sonnet-4.5
 ```
 ## Example: Task-Based Routing

@@ -128,7 +128,7 @@ The system supports several routing strategies configured in `config.csv`:
 name,value
 model-routing-strategy,auto
 model-default,fast
-model-fast,DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf
+model-fast,DeepSeek-R3-Distill-Qwen-1.5B-Q3_K_M.gguf
 model-quality,gpt-4
 model-code,codellama-7b.gguf
 model-fallback-enabled,true

@@ -56,7 +56,7 @@ The `web_server/` module implements the HTTP server and web interface. It serves
 The `llm/` module provides large language model integration. It handles model selection based on configuration and requirements, formats prompts according to model expectations, manages token counting and context limits, streams responses for real-time display, tracks API costs for budgeting, and implements model fallbacks when primary providers are unavailable.
-The `llm_models/` module contains specific implementations for different model providers. OpenAI integration supports GPT-3.5 and GPT-4 models. Anthropic integration provides access to Claude models. Google integration enables Gemini model usage. Meta integration supports Llama models for local deployment. Local model support allows self-hosted inference. Custom model implementations can be added for specialized providers.
+The `llm_models/` module contains specific implementations for different model providers. OpenAI integration supports GPT-5 and o3 models. Anthropic integration provides access to Claude Sonnet 4.5 and Opus 4.5 models. Google integration enables Gemini model usage. Meta integration supports Llama models for local deployment. Local model support allows self-hosted inference. Custom model implementations can be added for specialized providers.
 The `prompt_manager/` module provides centralized prompt management capabilities. It maintains prompt templates for consistent interactions, handles variable substitution in prompts, optimizes prompts for specific models, supports version control of prompt changes, enables A/B testing of different approaches, and tracks prompt performance metrics.

@@ -373,7 +373,7 @@ Failover happens automatically within seconds, with clients redirected via the c
 # config.csv - Fallbacks
 fallback-llm-enabled,true
 fallback-llm-provider,local
-fallback-llm-model,DeepSeek-R1-Distill-Qwen-1.5B
+fallback-llm-model,DeepSeek-R3-Distill-Qwen-1.5B
 fallback-cache-enabled,true
 fallback-cache-mode,memory

@@ -584,7 +584,7 @@ Configure fallback behavior:
 # Fallback configuration
 fallback-llm-enabled,true
 fallback-llm-provider,local
-fallback-llm-model,DeepSeek-R1-Distill-Qwen-1.5B
+fallback-llm-model,DeepSeek-R3-Distill-Qwen-1.5B
 fallback-cache-enabled,true
 fallback-cache-mode,memory

@@ -56,7 +56,7 @@ A complete working configuration:
 name,value
 server-port,8080
 llm-url,http://localhost:8081
-llm-model,../../../../data/llm/DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf
+llm-model,../../../../data/llm/DeepSeek-R3-Distill-Qwen-1.5B-Q3_K_M.gguf
 episodic-memory-threshold,4
 ```

@@ -41,7 +41,7 @@ For detailed LLM configuration, see the tables below. The basic settings are:
 ```csv
 llm-key,none
 llm-url,http://localhost:8081
-llm-model,../../../../data/llm/DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf
+llm-model,../../../../data/llm/DeepSeek-R3-Distill-Qwen-1.5B-Q3_K_M.gguf
 ```
 #### Core LLM Settings
@@ -223,7 +223,7 @@ llm-server,true
 llm-server-gpu-layers,35
 llm-server-ctx-size,8192
 llm-server-n-predict,2048
-llm-model,../../../../data/llm/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
+llm-model,../../../../data/llm/DeepSeek-R3-Distill-Qwen-7B-Q4_K_M.gguf
 ,
 # Disable cache for development
 llm-cache,false
@@ -296,15 +296,15 @@ Others require restart:
 **24GB+ VRAM (RTX 3090, 4090)**
 - DeepSeek-V3 (with MoE enabled)
 - Qwen2.5-32B-Instruct-Q4_K_M
-- DeepSeek-R1-Distill-Qwen-14B (runs fast with room to spare)
+- DeepSeek-R3-Distill-Qwen-14B (runs fast with room to spare)
 **12-16GB VRAM (RTX 4070, 4070Ti)**
-- DeepSeek-R1-Distill-Llama-8B
+- DeepSeek-R3-Distill-Llama-8B
 - Qwen2.5-14B-Q4_K_M
 - Mistral-7B-Instruct-Q5_K_M
 **8GB VRAM or CPU-Only**
-- DeepSeek-R1-Distill-Qwen-1.5B
+- DeepSeek-R3-Distill-Qwen-1.5B
 - Phi-3-mini-4k-instruct
 - Qwen2.5-3B-Instruct-Q5_K_M

@@ -9,7 +9,7 @@ BotServer is designed to work with local GGUF models by default. The minimal con
 ```csv
 llm-key,none
 llm-url,http://localhost:8081
-llm-model,../../../../data/llm/DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf
+llm-model,../../../../data/llm/DeepSeek-R3-Distill-Qwen-1.5B-Q3_K_M.gguf
 ```
 ### Model Path
@@ -156,7 +156,7 @@ Using a cloud provider for inference:
 name,value
 llm-key,sk-...
 llm-url,https://api.anthropic.com
-llm-model,claude-3
+llm-model,claude-sonnet-4.5
 llm-cache,true
 llm-cache-ttl,7200
 ```
@@ -179,7 +179,7 @@ Supporting concurrent users requires enabling `llm-server-cont-batching` and inc
 ### Small Models (1-3B parameters)
-Small models like DeepSeek-R1-Distill-Qwen-1.5B deliver fast responses with low memory usage. They work well for simple tasks, quick interactions, and resource-constrained environments.
+Small models like DeepSeek-R3-Distill-Qwen-1.5B deliver fast responses with low memory usage. They work well for simple tasks, quick interactions, and resource-constrained environments.
 ### Medium Models (7-13B parameters)

@@ -55,7 +55,7 @@ Complete reference of all available parameters in `config.csv`.
 #### For RTX 3090 (24GB VRAM)
 You can run impressive models with proper configuration:
-- **DeepSeek-R1-Distill-Qwen-7B**: Set `llm-server-gpu-layers` to 35-40
+- **DeepSeek-R3-Distill-Qwen-7B**: Set `llm-server-gpu-layers` to 35-40
 - **Qwen2.5-32B-Instruct (Q4_K_M)**: Fits with `llm-server-gpu-layers` to 40-45
 - **DeepSeek-V3 (with MoE)**: Set `llm-server-n-moe` to 2-4 to run even 120B models! MoE only loads active experts
 - **Optimization**: Use `llm-server-ctx-size` of 8192 for longer contexts
@@ -63,20 +63,20 @@ You can run impressive models with proper configuration:
 #### For RTX 4070/4070Ti (12-16GB VRAM)
 Mid-range cards work great with quantized models:
 - **Qwen2.5-14B (Q4_K_M)**: Set `llm-server-gpu-layers` to 25-30
-- **DeepSeek-R1-Distill-Llama-8B**: Fully fits with layers at 32
+- **DeepSeek-R3-Distill-Llama-8B**: Fully fits with layers at 32
 - **Tips**: Keep `llm-server-ctx-size` at 4096 to save VRAM
 #### For CPU-Only (No GPU)
 Modern CPUs can still run capable models:
-- **DeepSeek-R1-Distill-Qwen-1.5B**: Fast on CPU, great for testing
+- **DeepSeek-R3-Distill-Qwen-1.5B**: Fast on CPU, great for testing
 - **Phi-3-mini (3.8B)**: Excellent CPU performance
 - **Settings**: Set `llm-server-mlock` to `true` to prevent swapping
 - **Parallel**: Increase `llm-server-parallel` to CPU cores -2
 #### Recommended Models (GGUF Format)
-- **Best Overall**: DeepSeek-R1-Distill series (1.5B to 70B)
+- **Best Overall**: DeepSeek-R3-Distill series (1.5B to 70B)
 - **Best Small**: Qwen2.5-3B-Instruct-Q5_K_M
-- **Best Medium**: DeepSeek-R1-Distill-Qwen-14B-Q4_K_M
+- **Best Medium**: DeepSeek-R3-Distill-Qwen-14B-Q4_K_M
 - **Best Large**: DeepSeek-V3, Qwen2.5-32B, or GPT2-120B-GGUF (with MoE enabled)
 **Pro Tip**: The `llm-server-n-moe` parameter is magic for large models - it enables Mixture of Experts, letting you run 120B+ models on consumer hardware by only loading the experts needed for each token!

@@ -88,8 +88,8 @@ The bot's `config.csv` contains **non-sensitive** configuration:
 ```csv
 # Bot behavior - NOT secrets
-llm-provider,openai
-llm-model,gpt-4o
+llm-provider,anthropic
+llm-model,claude-sonnet-4.5
 llm-temperature,0.7
 llm-max-tokens,4096
@@ -369,8 +369,8 @@ Reference Vault secrets in your bot's config.csv:
 ```csv
 # Direct value (non-sensitive)
-llm-provider,openai
-llm-model,gpt-4o
+llm-provider,anthropic
+llm-model,claude-sonnet-4.5
 llm-temperature,0.7
 # Vault reference (sensitive)

@@ -12,7 +12,7 @@ The LLM integration in BotServer enables sophisticated conversational experience
 ### OpenAI
-OpenAI serves as the primary LLM provider with support for multiple model tiers. GPT-3.5 Turbo provides fast, cost-effective responses for straightforward conversations. GPT-4 delivers more nuanced understanding for complex queries. GPT-4 Turbo offers an optimal balance of capability and speed. Custom fine-tuned models can be used when you have specialized requirements.
+OpenAI serves as the primary LLM provider with support for multiple model tiers. GPT-5 provides fast, cost-effective responses for straightforward conversations. GPT-5 mini delivers efficient processing for simpler queries. The o3 series offers superior reasoning for complex tasks. Custom fine-tuned models can be used when you have specialized requirements.
 Configuration requires setting your API key and selecting a model:
@@ -181,7 +181,7 @@ Choosing the right model involves balancing several factors. Capability requirem
 ### Model Comparison
-GPT-3.5 Turbo offers the fastest responses at the lowest cost, suitable for straightforward questions. GPT-4 provides superior reasoning for complex queries at higher cost and latency. Local models like Llama variants offer privacy and cost predictability with varying capability levels. Specialized models may excel at particular domains like code or medical content.
+GPT-5 mini offers the fastest responses at the lowest cost, suitable for straightforward questions. Claude Sonnet 4.5 and GPT-5 provide superior reasoning for complex queries with a good balance of cost and capability. Local models like Llama variants offer privacy and cost predictability with varying capability levels. Specialized models may excel at particular domains like code or medical content.
 ## Integration with Tools

@@ -38,7 +38,8 @@ Scripts stored as `.gbdialog` files in bot packages.
 | Provider | Models | Features |
 |----------|--------|----------|
-| OpenAI | GPT-3.5, GPT-4 | Streaming, function calling |
+| OpenAI | GPT-5, o3 | Streaming, function calling |
+| Anthropic | Claude Sonnet 4.5, Opus 4.5 | Analysis, coding, guidelines |
 | Local | GGUF models | GPU acceleration, offline |
 Features: prompt templates, context injection, token management, cost optimization.

@@ -223,7 +223,7 @@ USE MODEL "auto"
 name,value
 model-routing-strategy,auto
 model-default,fast
-model-fast,DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf
+model-fast,DeepSeek-R3-Distill-Qwen-1.5B-Q3_K_M.gguf
 model-quality,gpt-4
 model-code,codellama-7b.gguf
 ```

@@ -232,8 +232,8 @@ botserver --start
 ## INTEGRATION CAPABILITIES
 ### LLM Providers
-- OpenAI (GPT-4, GPT-3.5)
-- Anthropic (Claude)
+- OpenAI (GPT-5, o3)
+- Anthropic (Claude Sonnet 4.5, Opus 4.5)
 - Meta (Llama)
 - DeepSeek
 - Local models via Ollama