.gbkb Knowledge Base
The .gbkb package manages knowledge base collections that provide contextual information to the bot during conversations.
What is .gbkb?
.gbkb (General Bot Knowledge Base) collections store:
- Document collections for semantic search
- Vector embeddings for similarity matching
- Metadata and indexing information
- Access control and organization
Knowledge Base Structure
Each .gbkb collection is organized as:
collection-name.gbkb/
├── documents/
│   ├── doc1.pdf
│   ├── doc2.txt
│   └── doc3.html
├── embeddings/      # Auto-generated
├── metadata.json    # Collection info
└── index.json       # Search indexes
Supported Formats
The knowledge base can process:
- Text files: .txt, .md, .html
- Documents: .pdf, .docx
- Web content: URLs and web pages
- Structured data: .csv, .json
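As an illustration of how the file formats above might be grouped before processing, here is a minimal Python sketch. The `FORMAT_GROUPS` table and the `classify` function are hypothetical helpers written for this example, not part of the .gbkb package; only the extensions themselves come from the list above. Web content (URLs) is not covered by this sketch.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions come from the list above; the group names are illustrative only.
FORMAT_GROUPS = {
    "text": {".txt", ".md", ".html"},
    "document": {".pdf", ".docx"},
    "structured": {".csv", ".json"},
}


def classify(filename: str) -> Optional[str]:
    """Return the format group for a file name, or None if it is not listed."""
    suffix = PurePosixPath(filename).suffix.lower()
    for group, extensions in FORMAT_GROUPS.items():
        if suffix in extensions:
            return group
    return None
```

A file such as `documents/doc1.pdf` would fall in the "document" group, while an unlisted extension such as `.png` returns `None`.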
Vector Embeddings
Each document is converted into vector embeddings. Processing involves:
- BGE-small-en-v1.5 model (default)
- Chunking for large documents
- Metadata extraction and indexing
- Semantic similarity scoring
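The chunking and similarity steps above can be sketched in plain Python. This is a conceptual example only: the word-based chunking, the `size` and `overlap` defaults, and both function names are assumptions made for illustration, since this page does not describe how .gbkb actually splits documents. Embedding vectors are represented as plain lists of floats, and no embedding model is called.

```python
import math


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current chunk has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Overlapping chunks reduce the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing some text twice.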
Collection Management
Creating Collections
USE_KB "company-policies"
ADD_WEBSITE "https://company.com/docs"
Using Collections
USE_KB "company-policies"
LLM "What is the vacation policy?"
Multiple Collections
USE_KB "policies"
USE_KB "procedures"
USE_KB "faqs"
REM All active collections contribute to context
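One way to picture how several active collections could contribute to a single context is to pool their search hits and keep the best-scoring ones. The Python sketch below assumes each collection returns `(passage, score)` pairs; the `merge_results` function, the `top_k` parameter, and the result shape are hypothetical and are not taken from the .gbkb implementation.

```python
def merge_results(
    results_by_collection: dict[str, list[tuple[str, float]]],
    top_k: int = 3,
) -> list[tuple[str, str, float]]:
    """Pool hits from all collections and return the top_k by score."""
    merged = [
        (collection, passage, score)
        for collection, hits in results_by_collection.items()
        for passage, score in hits
    ]
    # Highest score first, regardless of which collection the hit came from.
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged[:top_k]
```

Keeping the collection name alongside each passage makes it possible to show where an answer came from.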
Semantic Search
The knowledge base provides:
- Similarity search: Find relevant documents
- Hybrid search: Combine semantic and keyword matching
- Context injection: Automatically add to LLM prompts
- Relevance scoring: Filter by similarity threshold
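The features above can be illustrated with a small Python sketch that blends a semantic score with a keyword score, filters by a threshold, and places the surviving passages in a prompt. Everything here is an assumption made for illustration: the weighting parameter `alpha`, the default `threshold`, the prompt layout, and the function names are not documented .gbkb behavior.

```python
def keyword_score(query: str, passage: str) -> float:
    """Fraction of distinct query terms that also appear in the passage."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    passage_terms = set(passage.lower().split())
    return len(query_terms & passage_terms) / len(query_terms)


def hybrid_score(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    """Weighted blend: alpha weights the semantic score, 1 - alpha the keyword score."""
    return alpha * semantic + (1 - alpha) * keyword


def build_prompt(
    question: str,
    scored_passages: list[tuple[str, float]],
    threshold: float = 0.5,
) -> str:
    """Keep passages scoring at or above threshold, best first, and prepend them."""
    ranked = sorted(scored_passages, key=lambda item: item[1], reverse=True)
    kept = [passage for passage, score in ranked if score >= threshold]
    context = "\n".join(f"- {passage}" for passage in kept)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Raising `threshold` trades recall for precision: fewer passages reach the prompt, but those that do are more likely to be relevant.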
Integration with Dialogs
Knowledge bases are automatically used when:
- USE_KB is called
- Answer mode is set to use documents
- LLM queries benefit from contextual information