.gbkb Knowledge Base

The .gbkb package manages knowledge base collections that provide contextual information to the bot during conversations.

What is .gbkb?

.gbkb (General Bot Knowledge Base) collections store:

  • Document collections for semantic search
  • Vector embeddings for similarity matching
  • Metadata and indexing information
  • Access control and organization

Knowledge Base Structure

Each .gbkb collection is organized as:

collection-name.gbkb/
├── documents/
│   ├── doc1.pdf
│   ├── doc2.txt
│   └── doc3.html
├── embeddings/          # Auto-generated
├── metadata.json        # Collection info
└── index.json           # Search indexes

Supported Formats

The knowledge base can process:

  • Text files: .txt, .md, .html
  • Documents: .pdf, .docx
  • Web content: URLs and web pages
  • Structured data: .csv, .json
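
As an illustration of how the format list above could be applied, here is a minimal Python sketch that maps a source to one of the four categories. The function name `classify_source` and the category labels are hypothetical; this is not the botserver implementation, only a sketch built from the extensions listed above.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions copied from the list above; the category names are illustrative.
FORMAT_CATEGORIES = {
    "text": {".txt", ".md", ".html"},
    "document": {".pdf", ".docx"},
    "structured": {".csv", ".json"},
}


def classify_source(source: str) -> Optional[str]:
    """Return the category for a file path or URL, or None if unsupported."""
    # Anything that looks like an HTTP(S) URL is treated as web content.
    if source.lower().startswith(("http://", "https://")):
        return "web"
    # Compare the file extension case-insensitively.
    suffix = PurePosixPath(source).suffix.lower()
    for category, extensions in FORMAT_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None
```

Under these assumptions, `classify_source("documents/doc1.pdf")` yields `"document"`, and a file with an unlisted extension yields `None`.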

Vector Embeddings

Each document is converted into vector embeddings. The pipeline uses:

  • BGE-small-en-v1.5 model (default)
  • Chunking for large documents
  • Metadata extraction and indexing
  • Semantic similarity scoring
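
To make chunking and similarity scoring concrete, the following Python sketch splits text into overlapping word-based chunks and compares two vectors with cosine similarity. The chunk size, the overlap, and the function names are assumptions chosen for illustration; the actual chunker and the way BGE-small-en-v1.5 is invoked are not described here, so the embedding step itself is left out.

```python
import math


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks that share `overlap` words.

    chunk_size and overlap are hypothetical defaults, not botserver settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Overlapping chunks are a common way to avoid cutting a sentence's context at a chunk boundary. Each chunk would then be embedded, and its vector compared against the query vector.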

Collection Management

Creating Collections

USE_KB "company-policies"
ADD_WEBSITE "https://company.com/docs"

Using Collections

USE_KB "company-policies"
LLM "What is the vacation policy?"

Multiple Collections

USE_KB "policies"
USE_KB "procedures"
USE_KB "faqs"
REM All active collections contribute to context

Once collections are active, the knowledge base provides these search features:

  • Similarity search: Find relevant documents
  • Hybrid search: Combine semantic and keyword
  • Context injection: Automatically add to LLM prompts
  • Relevance scoring: Filter by similarity threshold
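
The search features above can be sketched roughly as follows. This Python example blends a semantic score (cosine similarity) with a simple keyword-overlap score and drops results below a threshold. The weighting `alpha`, the `threshold` value, and all function names are hypothetical; the real scoring formula used by the knowledge base is not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_score(query, text):
    """Fraction of distinct query terms that appear in the text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def hybrid_search(query, query_vec, chunks, alpha=0.7, threshold=0.3):
    """Rank (text, vector) chunks by a weighted semantic + keyword score.

    alpha and threshold are illustrative values, not botserver defaults.
    """
    results = []
    for text, vec in chunks:
        score = (alpha * cosine_similarity(query_vec, vec)
                 + (1 - alpha) * keyword_score(query, text))
        # Relevance filtering: keep only chunks at or above the threshold.
        if score >= threshold:
            results.append((score, text))
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

The vectors passed in would come from the embedding model. Here they are plain lists, so the sketch runs without any model installed.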

Integration with Dialogs

Knowledge bases are automatically used when:

  • USE_KB is called
  • Answer mode is set to use documents
  • LLM queries benefit from contextual information
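
Context injection can be pictured as prepending retrieved passages to the user's question before it reaches the model. The Python sketch below shows one way to do this. The prompt wording, the `max_chunks` limit, and the function name `build_prompt` are assumptions for illustration, not the template the bot actually uses.

```python
def build_prompt(question, retrieved_chunks, max_chunks=3):
    """Prepend up to max_chunks retrieved passages to the question.

    The template text is hypothetical; it only illustrates the idea.
    """
    selected = retrieved_chunks[:max_chunks]
    if not selected:
        # No active collection or nothing relevant: send the question as-is.
        return question
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(selected, start=1)
    )
    return (
        "Use the following context to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the passages makes it easier to trace which retrieved chunk an answer drew on.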