# .gbkb Knowledge Base
The `.gbkb` package manages knowledge base collections that provide contextual information to the bot during conversations.
## What is .gbkb?
`.gbkb` (General Bots Knowledge Base) collections store:
- Document collections for semantic search
- Vector embeddings for similarity matching
- Metadata and indexing information
- Access control and organization
## Knowledge Base Structure
Each `.gbkb` collection is organized as:
```
collection-name.gbkb/
  documents/
    doc1.pdf
    doc2.txt
    doc3.html
  embeddings/      # Auto-generated
  metadata.json    # Collection info
  index.json       # Search indexes
```
## Supported Formats
The knowledge base can process:
- **Text files**: .txt, .md, .html
- **Documents**: .pdf, .docx
- **Web content**: URLs and web pages
- **Structured data**: .csv, .json
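
As a rough illustration of how the categories above could be told apart, the following Python sketch maps a file name or URL to one of them. The function name `classify_source` and the category labels are hypothetical and are not part of the botserver API; only the extensions come from the list above.

```python
from pathlib import Path

# Extensions copied from the "Supported Formats" list; the category
# labels themselves are illustrative assumptions.
FORMAT_CATEGORIES = {
    "text": {".txt", ".md", ".html"},
    "document": {".pdf", ".docx"},
    "structured": {".csv", ".json"},
}


def classify_source(source):
    """Return a category for a file name or URL, or None if unsupported."""
    # URLs are treated as web content regardless of their path.
    if source.lower().startswith(("http://", "https://")):
        return "web"
    suffix = Path(source).suffix.lower()
    for category, extensions in FORMAT_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None
```

A caller could use the `None` result to skip files that the knowledge base does not list as supported.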
## Vector Embeddings
Each document is converted into vector embeddings. Processing involves:
- The BGE-small-en-v1.5 embedding model (default)
- Chunking of large documents
- Metadata extraction and indexing
- Semantic similarity scoring
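
The chunking step can be pictured with a minimal Python sketch. It assumes word-based chunks with a fixed overlap; the actual chunk size, overlap, and splitting strategy used by the knowledge base are not stated here, so `chunk_text` and its default values are hypothetical.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks of at most chunk_size words.

    chunk_size and overlap are illustrative defaults, not documented values.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be passed to the embedding model separately. The overlap keeps sentences that straddle a chunk boundary retrievable from either side.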
## Collection Management
### Creating Collections
```basic
USE KB "company-policies"
USE WEBSITE "https://company.com/docs"
```
### Using Collections
```basic
USE KB "company-policies"
TALK "What would you like to know about company policies?"
' The system AI will search the KB automatically when responding
```
### Multiple Collections
```basic
USE KB "policies"
USE KB "procedures"
USE KB "faqs"
REM All active collections contribute to context
```
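
One way to picture several active collections contributing to context is to merge their search hits into a single ranked list. The Python sketch below assumes each collection returns `(score, text)` pairs; the function `merge_hits` and this data shape are hypothetical and do not describe how botserver merges results internally.

```python
import heapq


def merge_hits(hits_by_collection, top_k=3):
    """Merge per-collection (score, text) hits into one ranked list.

    Returns up to top_k tuples of (score, collection_name, text),
    highest score first.
    """
    tagged = [
        (score, name, text)
        for name, hits in hits_by_collection.items()
        for score, text in hits
    ]
    return heapq.nlargest(top_k, tagged, key=lambda hit: hit[0])
```

For example, with hits from `policies` and `faqs`, the two best passages are kept no matter which collection they came from.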
## Semantic Search
The knowledge base provides:
- **Similarity search**: Find relevant documents
- **Hybrid search**: Combine semantic and keyword matching
- **Context injection**: Automatically add retrieved passages to LLM prompts
- **Relevance scoring**: Filter by similarity threshold
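
The search features above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the knowledge base's actual scoring: it blends cosine similarity with a simple word-overlap score, and the weight `alpha`, the `threshold` default, and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def hybrid_search(query, query_vec, chunks, alpha=0.7, threshold=0.5):
    """Rank (text, vector) chunks by a blended semantic and keyword score.

    Only chunks scoring at or above threshold are returned, best first.
    """
    results = []
    for text, vec in chunks:
        semantic = cosine_similarity(query_vec, vec)
        score = alpha * semantic + (1 - alpha) * keyword_score(query, text)
        if score >= threshold:
            results.append((score, text))
    return sorted(results, reverse=True)
```

Raising `threshold` trades recall for precision: fewer passages reach the prompt, but those that do are closer matches.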
## Integration with Dialogs
Knowledge bases are automatically used when:
- `USE KB` is called
- Answer mode is set to use documents
- LLM queries benefit from contextual information
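
To make the automatic use of a knowledge base more concrete, the following Python sketch shows one plausible way retrieved passages could be placed in front of a user question. The prompt layout, the character budget, and the name `build_prompt` are assumptions for illustration; the real prompt format is not described here.

```python
def build_prompt(question, passages, max_chars=1000):
    """Prepend retrieved passages to the question within a character budget.

    Passages are assumed to be ordered best first; once one no longer
    fits in max_chars, it and all later passages are dropped.
    """
    context_lines = []
    used = 0
    for passage in passages:
        if used + len(passage) > max_chars:
            break
        context_lines.append(f"- {passage}")
        used += len(passage)
    if not context_lines:
        # Nothing relevant was retrieved, so the question goes through as-is.
        return question
    return "Context:\n" + "\n".join(context_lines) + f"\n\nQuestion: {question}"
```

With no active collection, or no passage above the relevance threshold, the question reaches the LLM unchanged.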