# .gbkb Knowledge Base

The `.gbkb` package manages knowledge base collections that provide contextual information to the bot during conversations.

## What is .gbkb?

`.gbkb` (General Bot Knowledge Base) collections store:

- Document collections for semantic search
- Vector embeddings for similarity matching
- Metadata and indexing information
- Access control and organization

## Knowledge Base Structure

Each `.gbkb` collection is organized as:

```
collection-name.gbkb/
├── documents/
│   ├── doc1.pdf
│   ├── doc2.txt
│   └── doc3.html
├── embeddings/          # Auto-generated
├── metadata.json        # Collection info
└── index.json           # Search indexes
```

## Supported Formats

The knowledge base can process:

- **Text files**: .txt, .md, .html
- **Documents**: .pdf, .docx
- **Web content**: URLs and web pages
- **Structured data**: .csv, .json

## Vector Embeddings

Each document is processed into vector embeddings using:

- BGE-small-en-v1.5 model (default)
- Chunking for large documents
- Metadata extraction and indexing
- Semantic similarity scoring

## Collection Management

### Creating Collections

```basic
ADD_KB "company-policies"
ADD_WEBSITE "https://company.com/docs"
```

### Using Collections

```basic
SET_KB "company-policies"
LLM "What is the vacation policy?"
```

### Multiple Collections

```basic
ADD_KB "policies"
ADD_KB "procedures"
ADD_KB "faqs"
REM All active collections contribute to context
```

## Semantic Search

The knowledge base provides:

- **Similarity search**: Find relevant documents
- **Hybrid search**: Combine semantic and keyword
- **Context injection**: Automatically add to LLM prompts
- **Relevance scoring**: Filter by similarity threshold

## Integration with Dialogs

Knowledge bases are automatically used when:

- `SET_KB` or `ADD_KB` is called
- Answer mode is set to use documents
- LLM queries benefit from contextual information
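
To make the chunking and relevance-scoring steps described under Vector Embeddings and Semantic Search more concrete, here is a minimal Python sketch. It is not the General Bots implementation: the function names (`chunk_words`, `toy_embed`, `cosine`, `search`), the chunk size, the overlap, and the threshold value are all assumptions chosen for illustration, and `toy_embed` is a bag-of-words stand-in for a real embedding model such as BGE-small-en-v1.5.

```python
import math
from collections import Counter


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping windows of at most `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase term counts."""
    tokens = (w.strip(".,?!:;").lower() for w in text.split())
    return Counter(t for t in tokens if t)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, chunks, threshold=0.2, top_k=3):
    """Score every chunk against the query, drop those below the
    threshold, and return the best `top_k` as (score, chunk) pairs."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    hits = [(s, c) for s, c in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top_k]


if __name__ == "__main__":
    docs = [
        "Employees receive twenty vacation days per year.",
        "The office is closed on public holidays.",
    ]
    for score, chunk in search("vacation days policy", docs):
        print(f"{score:.2f}  {chunk}")
```

In a real deployment the term counts would be replaced by dense vectors from the embedding model, and the chunks that pass the threshold would be the text injected into the LLM prompt as context.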