botserver/docs/KB_AND_TOOLS.md

16 KiB

Knowledge Base (KB) and Tools System

Overview

This document describes the comprehensive Knowledge Base (KB) and BASIC Tools compilation system integrated into the botserver. This system enables:

  1. Dynamic Knowledge Base Management: Monitor MinIO buckets for document changes and automatically index them in Qdrant vector database
  2. BASIC Tool Compilation: Compile BASIC scripts into AST and generate MCP/OpenAI tool definitions
  3. Intelligent Context Processing: Enhance prompts with relevant KB documents and available tools based on answer mode
  4. Temporary Website Indexing: Crawl and index web pages for session-specific knowledge

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Bot Server                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
│  │ KB Manager   │    │ MinIO        │    │ Qdrant       │      │
│  │              │◄──►│ Handler      │◄──►│ Client       │      │
│  │              │    │              │    │              │      │
│  └──────────────┘    └──────────────┘    └──────────────┘      │
│         │                                        ▲               │
│         │                                        │               │
│         ▼                                        │               │
│  ┌──────────────┐                        ┌──────────────┐      │
│  │ BASIC        │                        │ Embeddings   │      │
│  │ Compiler     │                        │ Generator    │      │
│  │              │                        │              │      │
│  └──────────────┘                        └──────────────┘      │
│         │                                                        │
│         ▼                                                        │
│  ┌──────────────────────────────────────────────────────┐      │
│  │         Prompt Processor                              │      │
│  │  (Integrates KB + Tools based on Answer Mode)        │      │
│  └──────────────────────────────────────────────────────┘      │
│                                                                   │
└─────────────────────────────────────────────────────────────────┘

Components

1. KB Manager (src/kb/mod.rs)

The KB Manager coordinates MinIO monitoring and Qdrant indexing:

  • Watches collections: Monitors .gbkb/ folders for document changes
  • Detects changes: Uses file hashing (SHA256) to detect modified files
  • Indexes documents: Splits documents into chunks and generates embeddings
  • Stores metadata: Maintains document information in PostgreSQL

Key Functions

// Add a KB collection to be monitored
kb_manager.add_collection(bot_id, "enrollpdfs").await?;

// Remove a collection
kb_manager.remove_collection("enrollpdfs").await?;

// Start the monitoring service
let kb_handle = kb_manager.spawn();

2. MinIO Handler (src/kb/minio_handler.rs)

Monitors MinIO buckets for file changes:

  • Polling: Checks for changes every 15 seconds
  • Event detection: Identifies created, modified, and deleted files
  • State tracking: Maintains file ETags and sizes for change detection

File Change Events

pub enum FileChangeEvent {
    Created { path: String, size: i64, etag: String },
    Modified { path: String, size: i64, etag: String },
    Deleted { path: String },
}

3. Qdrant Client (src/kb/qdrant_client.rs)

Manages vector database operations:

  • Collection management: Create, delete, and check collections
  • Point operations: Upsert and delete vector points
  • Search: Semantic search using cosine similarity

Example Usage

let client = get_qdrant_client(&state)?;

// Create collection
client.create_collection("kb_bot123_enrollpdfs", 1536).await?;

// Search
let results = client.search("kb_bot123_enrollpdfs", query_vector, 5).await?;

4. Embeddings Generator (src/kb/embeddings.rs)

Handles text embedding and document indexing:

  • Chunking: Splits documents into 512-character chunks with 50-char overlap
  • Embedding: Generates vectors using local LLM server
  • Indexing: Stores chunks with metadata in Qdrant

Document Processing

// Index a document
index_document(&state, "kb_bot_collection", "file.pdf", &content).await?;

// Search for similar documents
let results = search_similar(&state, "kb_bot_collection", "query", 5).await?;

5. BASIC Compiler (src/basic/compiler/mod.rs)

Compiles BASIC scripts and generates tool definitions:

Input: BASIC Script with Metadata

PARAM name AS string LIKE "Abreu Silva" DESCRIPTION "Required full name"
PARAM birthday AS date LIKE "23/09/2001" DESCRIPTION "Birth date in DD/MM/YYYY"
PARAM email AS string LIKE "user@example.com" DESCRIPTION "Email address"

DESCRIPTION "Enrollment process for new users"

// Script logic here
SAVE "enrollments.csv", id, name, birthday, email
TALK "Thanks, you are enrolled!"
SET_KB "enrollpdfs"

Output: Multiple Files

  1. enrollment.ast: Compiled Rhai AST
  2. enrollment.mcp.json: MCP tool definition
  3. enrollment.tool.json: OpenAI tool definition

MCP Tool Format

{
  "name": "enrollment",
  "description": "Enrollment process for new users",
  "input_schema": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Required full name",
        "example": "Abreu Silva"
      },
      "birthday": {
        "type": "string",
        "description": "Birth date in DD/MM/YYYY",
        "example": "23/09/2001"
      }
    },
    "required": ["name", "birthday", "email"]
  }
}

6. Prompt Processor (src/context/prompt_processor.rs)

Enhances queries with context based on answer mode:

Answer Modes

Mode Value Description
Direct 0 No additional context, direct LLM response
WithTools 1 Include available tools in prompt
DocumentsOnly 2 Search KB only, no LLM generation
WebSearch 3 Include web search results
Mixed 4 Combine KB documents + tools (context-aware)

Mixed Mode Flow

User Query
    │
    ▼
┌─────────────────────────┐
│  Prompt Processor       │
│  (Answer Mode: Mixed)   │
└─────────────────────────┘
    │
    ├──► Search KB Documents (Qdrant)
    │    └─► Returns relevant chunks
    │
    ├──► Get Available Tools (Session Context)
    │    └─► Returns tool definitions
    │
    ▼
┌─────────────────────────┐
│  Enhanced Prompt        │
│  • System Prompt        │
│  • Document Context     │
│  • Available Tools      │
│  • User Query           │
└─────────────────────────┘

BASIC Keywords

SET_KB

Activates a KB collection for the current session.

SET_KB "enrollpdfs"
  • Creates/ensures Qdrant collection exists
  • Updates session context with active collection
  • Documents in .gbkb/enrollpdfs/ are indexed

ADD_KB

Adds an additional KB collection (can have multiple).

ADD_KB "productbrochurespdfsanddocs"

ADD_TOOL

Compiles and registers a BASIC tool.

ADD_TOOL "enrollment.bas"

Downloads from MinIO (.gbdialog/enrollment.bas), compiles to:

  • ./work/{bot_id}.gbai/{bot_id}.gbdialog/enrollment.ast
  • ./work/{bot_id}.gbai/{bot_id}.gbdialog/enrollment.mcp.json
  • ./work/{bot_id}.gbai/{bot_id}.gbdialog/enrollment.tool.json

With MCP Endpoint

ADD_TOOL "enrollment.bas" as MCP

Creates an HTTP endpoint at /default/enrollment that:

  • Accepts JSON matching the tool schema
  • Executes the BASIC script
  • Returns the result

ADD_WEBSITE

Crawls and indexes a website for the current session.

ADD_WEBSITE "https://example.com/docs"
  • Fetches HTML content
  • Extracts readable text (removes scripts, styles)
  • Creates temporary Qdrant collection
  • Indexes content with embeddings
  • Available for remainder of session

Database Schema

kb_documents

Stores metadata about indexed documents:

CREATE TABLE kb_documents (
    id UUID PRIMARY KEY,
    bot_id UUID NOT NULL,
    collection_name TEXT NOT NULL,
    file_path TEXT NOT NULL,
    file_size BIGINT NOT NULL,
    file_hash TEXT NOT NULL,
    first_published_at TIMESTAMPTZ NOT NULL,
    last_modified_at TIMESTAMPTZ NOT NULL,
    indexed_at TIMESTAMPTZ,
    metadata JSONB DEFAULT '{}',
    UNIQUE(bot_id, collection_name, file_path)
);

kb_collections

Stores KB collection information:

CREATE TABLE kb_collections (
    id UUID PRIMARY KEY,
    bot_id UUID NOT NULL,
    name TEXT NOT NULL,
    folder_path TEXT NOT NULL,
    qdrant_collection TEXT NOT NULL,
    document_count INTEGER NOT NULL DEFAULT 0,
    UNIQUE(bot_id, name)
);

basic_tools

Stores compiled BASIC tools:

CREATE TABLE basic_tools (
    id UUID PRIMARY KEY,
    bot_id UUID NOT NULL,
    tool_name TEXT NOT NULL,
    file_path TEXT NOT NULL,
    ast_path TEXT NOT NULL,
    mcp_json JSONB,
    tool_json JSONB,
    compiled_at TIMESTAMPTZ NOT NULL,
    is_active BOOLEAN NOT NULL DEFAULT true,
    UNIQUE(bot_id, tool_name)
);

Workflow Examples

Example 1: Enrollment with KB

File Structure:

bot.gbai/
├── .gbkb/
│   └── enrollpdfs/
│       ├── enrollment_guide.pdf
│       ├── requirements.pdf
│       └── faq.pdf
├── .gbdialog/
│   ├── start.bas
│   └── enrollment.bas

start.bas:

ADD_TOOL "enrollment.bas" as MCP
ADD_KB "enrollpdfs"

enrollment.bas:

PARAM name AS string LIKE "John Doe" DESCRIPTION "Full name"
PARAM email AS string LIKE "john@example.com" DESCRIPTION "Email"

DESCRIPTION "Enrollment process with KB support"

// Validate input
IF name = "" THEN
    TALK "Please provide your name"
    EXIT
END IF

// Save to database
SAVE "enrollments.csv", name, email

// Set KB for enrollment docs
SET_KB "enrollpdfs"

TALK "Thanks! You can now ask me about enrollment procedures."

User Interaction:

  1. User: "I want to enroll"
  2. Bot calls enrollment tool, collects parameters
  3. After enrollment, SET_KB activates enrollpdfs collection
  4. User: "What documents do I need?"
  5. Bot searches KB (mode=2 or 4), finds relevant PDFs, responds with info

Example 2: Product Support with Web Content

support.bas:

PARAM product AS string LIKE "fax" DESCRIPTION "Product name"

DESCRIPTION "Get product information"

// Find in database
price = -1
productRecord = FIND "products.csv", "name = ${product}"
IF productRecord THEN
    price = productRecord.price
END IF

// Add product documentation website
ADD_WEBSITE "https://example.com/products/${product}"

// Add product brochures KB
SET_KB "productbrochurespdfsanddocs"

RETURN price

User Flow:

  1. User: "What's the price of a fax machine?"
  2. Tool executes, finds price in CSV
  3. ADD_WEBSITE indexes product page
  4. SET_KB activates brochures collection
  5. User: "How do I set it up?"
  6. Prompt processor (Mixed mode) searches both:
    • Temporary website collection
    • Product brochures KB
  7. Returns setup instructions from indexed sources

Configuration

Environment Variables

# Qdrant Configuration
QDRANT_URL=http://localhost:6333

# LLM for Embeddings
LLM_URL=http://localhost:8081

# MinIO Configuration (from config)
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_ORG_PREFIX=org1_

# Database
DATABASE_URL=postgresql://user:pass@localhost/botserver

Answer Mode Selection

Set in session's answer_mode field:

// Example: Update session to Mixed mode
session.answer_mode = 4;

Or via API when creating session:

POST /sessions
{
  "user_id": "...",
  "bot_id": "...",
  "answer_mode": 4
}

Security Considerations

  1. Path Traversal Protection: All file paths validated to prevent .. attacks
  2. Safe Tool Paths: Tools must be in .gbdialog/ folder
  3. URL Validation: ADD_WEBSITE only allows HTTP/HTTPS URLs
  4. Bucket Isolation: Each organization has separate MinIO bucket
  5. Hash Verification: File changes detected by SHA256 hash

Performance Tuning

KB Manager

  • Poll Interval: 30 seconds (adjustable in kb/mod.rs)
  • Chunk Size: 512 characters (in kb/embeddings.rs)
  • Chunk Overlap: 50 characters

MinIO Handler

  • Poll Interval: 15 seconds (adjustable in kb/minio_handler.rs)
  • State Caching: File states cached in memory

Qdrant

  • Vector Size: 1536 (OpenAI ada-002 compatible)
  • Distance Metric: Cosine similarity
  • Search Limit: Configurable per query

Troubleshooting

Documents Not Indexing

  1. Check MinIO handler is watching correct prefix:

    minio_handler.watch_prefix(".gbkb/").await;
    
  2. Verify Qdrant connection:

    curl http://localhost:6333/collections
    
  3. Check logs for indexing errors:

    grep "Indexing document" botserver.log
    

Tools Not Compiling

  1. Verify PARAM syntax is correct
  2. Check tool file is in .gbdialog/ folder
  3. Ensure work directory exists and is writable
  4. Review compilation logs

KB Search Not Working

  1. Verify collection exists in session context
  2. Check Qdrant collection created:
    curl http://localhost:6333/collections/{collection_name}
    
  3. Ensure embeddings are being generated (check LLM server)

Future Enhancements

  1. Incremental Indexing: Only reindex changed chunks
  2. Document Deduplication: Detect and merge duplicate content
  3. Advanced Crawling: Follow links, handle JavaScript
  4. Tool Versioning: Track tool versions and changes
  5. KB Analytics: Track search queries and document usage
  6. Automatic Tool Discovery: Scan .gbdialog/ on startup
  7. Distributed Indexing: Scale across multiple workers
  8. Real-time Notifications: WebSocket updates when KB changes

References

Support

For issues or questions: