16 KiB
Knowledge Base (KB) and Tools System
Overview
This document describes the comprehensive Knowledge Base (KB) and BASIC Tools compilation system integrated into the botserver. This system enables:
- Dynamic Knowledge Base Management: Monitor MinIO buckets for document changes and automatically index them in Qdrant vector database
- BASIC Tool Compilation: Compile BASIC scripts into AST and generate MCP/OpenAI tool definitions
- Intelligent Context Processing: Enhance prompts with relevant KB documents and available tools based on answer mode
- Temporary Website Indexing: Crawl and index web pages for session-specific knowledge
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Bot Server │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ KB Manager │ │ MinIO │ │ Qdrant │ │
│ │ │◄──►│ Handler │◄──►│ Client │ │
│ │ │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ ▲ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ BASIC │ │ Embeddings │ │
│ │ Compiler │ │ Generator │ │
│ │ │ │ │ │
│ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Prompt Processor │ │
│ │ (Integrates KB + Tools based on Answer Mode) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Components
1. KB Manager (src/kb/mod.rs)
The KB Manager coordinates MinIO monitoring and Qdrant indexing:
- Watches collections: Monitors
.gbkb/folders for document changes - Detects changes: Uses file hashing (SHA256) to detect modified files
- Indexes documents: Splits documents into chunks and generates embeddings
- Stores metadata: Maintains document information in PostgreSQL
Key Functions
// Add a KB collection to be monitored
kb_manager.add_collection(bot_id, "enrollpdfs").await?;
// Remove a collection
kb_manager.remove_collection("enrollpdfs").await?;
// Start the monitoring service
let kb_handle = kb_manager.spawn();
2. MinIO Handler (src/kb/minio_handler.rs)
Monitors MinIO buckets for file changes:
- Polling: Checks for changes every 15 seconds
- Event detection: Identifies created, modified, and deleted files
- State tracking: Maintains file ETags and sizes for change detection
File Change Events
pub enum FileChangeEvent {
Created { path: String, size: i64, etag: String },
Modified { path: String, size: i64, etag: String },
Deleted { path: String },
}
3. Qdrant Client (src/kb/qdrant_client.rs)
Manages vector database operations:
- Collection management: Create, delete, and check collections
- Point operations: Upsert and delete vector points
- Search: Semantic search using cosine similarity
Example Usage
let client = get_qdrant_client(&state)?;
// Create collection
client.create_collection("kb_bot123_enrollpdfs", 1536).await?;
// Search
let results = client.search("kb_bot123_enrollpdfs", query_vector, 5).await?;
4. Embeddings Generator (src/kb/embeddings.rs)
Handles text embedding and document indexing:
- Chunking: Splits documents into 512-character chunks with 50-char overlap
- Embedding: Generates vectors using local LLM server
- Indexing: Stores chunks with metadata in Qdrant
Document Processing
// Index a document
index_document(&state, "kb_bot_collection", "file.pdf", &content).await?;
// Search for similar documents
let results = search_similar(&state, "kb_bot_collection", "query", 5).await?;
5. BASIC Compiler (src/basic/compiler/mod.rs)
Compiles BASIC scripts and generates tool definitions:
Input: BASIC Script with Metadata
PARAM name AS string LIKE "Abreu Silva" DESCRIPTION "Required full name"
PARAM birthday AS date LIKE "23/09/2001" DESCRIPTION "Birth date in DD/MM/YYYY"
PARAM email AS string LIKE "user@example.com" DESCRIPTION "Email address"
DESCRIPTION "Enrollment process for new users"
// Script logic here
SAVE "enrollments.csv", id, name, birthday, email
TALK "Thanks, you are enrolled!"
SET_KB "enrollpdfs"
Output: Multiple Files
- enrollment.ast: Compiled Rhai AST
- enrollment.mcp.json: MCP tool definition
- enrollment.tool.json: OpenAI tool definition
MCP Tool Format
{
"name": "enrollment",
"description": "Enrollment process for new users",
"input_schema": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Required full name",
"example": "Abreu Silva"
},
"birthday": {
"type": "string",
"description": "Birth date in DD/MM/YYYY",
"example": "23/09/2001"
}
},
"required": ["name", "birthday", "email"]
}
}
6. Prompt Processor (src/context/prompt_processor.rs)
Enhances queries with context based on answer mode:
Answer Modes
| Mode | Value | Description |
|---|---|---|
| Direct | 0 | No additional context, direct LLM response |
| WithTools | 1 | Include available tools in prompt |
| DocumentsOnly | 2 | Search KB only, no LLM generation |
| WebSearch | 3 | Include web search results |
| Mixed | 4 | Combine KB documents + tools (context-aware) |
Mixed Mode Flow
User Query
│
▼
┌─────────────────────────┐
│ Prompt Processor │
│ (Answer Mode: Mixed) │
└─────────────────────────┘
│
├──► Search KB Documents (Qdrant)
│ └─► Returns relevant chunks
│
├──► Get Available Tools (Session Context)
│ └─► Returns tool definitions
│
▼
┌─────────────────────────┐
│ Enhanced Prompt │
│ • System Prompt │
│ • Document Context │
│ • Available Tools │
│ • User Query │
└─────────────────────────┘
BASIC Keywords
SET_KB
Activates a KB collection for the current session.
SET_KB "enrollpdfs"
- Creates/ensures Qdrant collection exists
- Updates session context with active collection
- Documents in
.gbkb/enrollpdfs/are indexed
ADD_KB
Adds an additional KB collection (can have multiple).
ADD_KB "productbrochurespdfsanddocs"
ADD_TOOL
Compiles and registers a BASIC tool.
ADD_TOOL "enrollment.bas"
Downloads from MinIO (.gbdialog/enrollment.bas), compiles to:
./work/{bot_id}.gbai/{bot_id}.gbdialog/enrollment.ast./work/{bot_id}.gbai/{bot_id}.gbdialog/enrollment.mcp.json./work/{bot_id}.gbai/{bot_id}.gbdialog/enrollment.tool.json
With MCP Endpoint
ADD_TOOL "enrollment.bas" as MCP
Creates an HTTP endpoint at /default/enrollment that:
- Accepts JSON matching the tool schema
- Executes the BASIC script
- Returns the result
ADD_WEBSITE
Crawls and indexes a website for the current session.
ADD_WEBSITE "https://example.com/docs"
- Fetches HTML content
- Extracts readable text (removes scripts, styles)
- Creates temporary Qdrant collection
- Indexes content with embeddings
- Available for remainder of session
Database Schema
kb_documents
Stores metadata about indexed documents:
CREATE TABLE kb_documents (
id UUID PRIMARY KEY,
bot_id UUID NOT NULL,
collection_name TEXT NOT NULL,
file_path TEXT NOT NULL,
file_size BIGINT NOT NULL,
file_hash TEXT NOT NULL,
first_published_at TIMESTAMPTZ NOT NULL,
last_modified_at TIMESTAMPTZ NOT NULL,
indexed_at TIMESTAMPTZ,
metadata JSONB DEFAULT '{}',
UNIQUE(bot_id, collection_name, file_path)
);
kb_collections
Stores KB collection information:
CREATE TABLE kb_collections (
id UUID PRIMARY KEY,
bot_id UUID NOT NULL,
name TEXT NOT NULL,
folder_path TEXT NOT NULL,
qdrant_collection TEXT NOT NULL,
document_count INTEGER NOT NULL DEFAULT 0,
UNIQUE(bot_id, name)
);
basic_tools
Stores compiled BASIC tools:
CREATE TABLE basic_tools (
id UUID PRIMARY KEY,
bot_id UUID NOT NULL,
tool_name TEXT NOT NULL,
file_path TEXT NOT NULL,
ast_path TEXT NOT NULL,
mcp_json JSONB,
tool_json JSONB,
compiled_at TIMESTAMPTZ NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT true,
UNIQUE(bot_id, tool_name)
);
Workflow Examples
Example 1: Enrollment with KB
File Structure:
bot.gbai/
├── .gbkb/
│ └── enrollpdfs/
│ ├── enrollment_guide.pdf
│ ├── requirements.pdf
│ └── faq.pdf
├── .gbdialog/
│ ├── start.bas
│ └── enrollment.bas
start.bas:
ADD_TOOL "enrollment.bas" as MCP
ADD_KB "enrollpdfs"
enrollment.bas:
PARAM name AS string LIKE "John Doe" DESCRIPTION "Full name"
PARAM email AS string LIKE "john@example.com" DESCRIPTION "Email"
DESCRIPTION "Enrollment process with KB support"
// Validate input
IF name = "" THEN
TALK "Please provide your name"
EXIT
END IF
// Save to database
SAVE "enrollments.csv", name, email
// Set KB for enrollment docs
SET_KB "enrollpdfs"
TALK "Thanks! You can now ask me about enrollment procedures."
User Interaction:
- User: "I want to enroll"
- Bot calls
enrollmenttool, collects parameters - After enrollment, SET_KB activates
enrollpdfscollection - User: "What documents do I need?"
- Bot searches KB (mode=2 or 4), finds relevant PDFs, responds with info
Example 2: Product Support with Web Content
support.bas:
PARAM product AS string LIKE "fax" DESCRIPTION "Product name"
DESCRIPTION "Get product information"
// Find in database
price = -1
productRecord = FIND "products.csv", "name = ${product}"
IF productRecord THEN
price = productRecord.price
END IF
// Add product documentation website
ADD_WEBSITE "https://example.com/products/${product}"
// Add product brochures KB
SET_KB "productbrochurespdfsanddocs"
RETURN price
User Flow:
- User: "What's the price of a fax machine?"
- Tool executes, finds price in CSV
- ADD_WEBSITE indexes product page
- SET_KB activates brochures collection
- User: "How do I set it up?"
- Prompt processor (Mixed mode) searches both:
- Temporary website collection
- Product brochures KB
- Returns setup instructions from indexed sources
Configuration
Environment Variables
# Qdrant Configuration
QDRANT_URL=http://localhost:6333
# LLM for Embeddings
LLM_URL=http://localhost:8081
# MinIO Configuration (from config)
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_ORG_PREFIX=org1_
# Database
DATABASE_URL=postgresql://user:pass@localhost/botserver
Answer Mode Selection
Set in session's answer_mode field:
// Example: Update session to Mixed mode
session.answer_mode = 4;
Or via API when creating session:
POST /sessions
{
"user_id": "...",
"bot_id": "...",
"answer_mode": 4
}
Security Considerations
- Path Traversal Protection: All file paths validated to prevent
..attacks - Safe Tool Paths: Tools must be in
.gbdialog/folder - URL Validation: ADD_WEBSITE only allows HTTP/HTTPS URLs
- Bucket Isolation: Each organization has separate MinIO bucket
- Hash Verification: File changes detected by SHA256 hash
Performance Tuning
KB Manager
- Poll Interval: 30 seconds (adjustable in
kb/mod.rs) - Chunk Size: 512 characters (in
kb/embeddings.rs) - Chunk Overlap: 50 characters
MinIO Handler
- Poll Interval: 15 seconds (adjustable in
kb/minio_handler.rs) - State Caching: File states cached in memory
Qdrant
- Vector Size: 1536 (OpenAI ada-002 compatible)
- Distance Metric: Cosine similarity
- Search Limit: Configurable per query
Troubleshooting
Documents Not Indexing
-
Check MinIO handler is watching correct prefix:
minio_handler.watch_prefix(".gbkb/").await; -
Verify Qdrant connection:
curl http://localhost:6333/collections -
Check logs for indexing errors:
grep "Indexing document" botserver.log
Tools Not Compiling
- Verify PARAM syntax is correct
- Check tool file is in
.gbdialog/folder - Ensure work directory exists and is writable
- Review compilation logs
KB Search Not Working
- Verify collection exists in session context
- Check Qdrant collection created:
curl http://localhost:6333/collections/{collection_name} - Ensure embeddings are being generated (check LLM server)
Future Enhancements
- Incremental Indexing: Only reindex changed chunks
- Document Deduplication: Detect and merge duplicate content
- Advanced Crawling: Follow links, handle JavaScript
- Tool Versioning: Track tool versions and changes
- KB Analytics: Track search queries and document usage
- Automatic Tool Discovery: Scan
.gbdialog/on startup - Distributed Indexing: Scale across multiple workers
- Real-time Notifications: WebSocket updates when KB changes
References
- Qdrant Documentation: https://qdrant.tech/documentation/
- Model Context Protocol: https://modelcontextprotocol.io/
- MinIO Documentation: https://min.io/docs/
- Rhai Scripting: https://rhai.rs/book/
Support
For issues or questions:
- GitHub Issues: https://github.com/GeneralBots/BotServer/issues
- Documentation: https://docs.generalbots.ai/