botserver/docs/src/chapter-03/semantic-search.md

122 lines
3.7 KiB
Markdown
Raw Normal View History

# Semantic Search
2025-11-23 20:12:09 -03:00
Semantic search in BotServer happens automatically when you use `USE KB`. The system searches for relevant information based on meaning, not just keywords, and makes it available to the system AI during conversations.
2025-11-24 14:15:01 -03:00
## Search Pipeline
<img src="./assets/search-pipeline.svg" alt="Semantic Search Pipeline" style="max-height: 500px; width: 100%; object-fit: contain;">
2025-11-23 13:46:55 -03:00
## How It Works Automatically
2025-11-23 13:46:55 -03:00
1. **User asks a question** - Natural language input
2. **Query converted to vector** - Using the embedding model
3. **Search active collections** - Finds semantically similar content
2025-11-23 20:12:09 -03:00
4. **Inject into context** - Relevant chunks provided to system AI
5. **Generate response** - System AI answers using the knowledge
2025-11-23 13:46:55 -03:00
## Activating Semantic Search
Simply use `USE KB` to enable search for a collection:
```basic
2025-11-23 13:46:55 -03:00
USE KB "policies"
USE KB "procedures"
' Both collections are now searchable
' No explicit search commands needed
```
2025-11-23 20:12:09 -03:00
When users ask questions, the system automatically searches these collections and provides relevant context to the system AI.
2025-11-23 13:46:55 -03:00
## How Meaning-Based Search Works
2025-11-23 13:46:55 -03:00
Unlike keyword search, semantic search understands meaning:
2025-11-23 13:46:55 -03:00
- "How many days off do I get?" matches "vacation policy"
- "What's the return policy?" matches "refund procedures"
- "I'm feeling sick" matches "medical leave guidelines"
The system uses vector embeddings to find conceptually similar content, even when exact words don't match.
## Configuration
Search behavior is controlled by `config.csv`:
```csv
prompt-history,2 # How many previous messages to include
prompt-compact,4 # Compact context after N exchanges
```
2025-11-23 20:12:09 -03:00
These settings manage how much context the system AI receives, not the search itself.
2025-11-23 13:46:55 -03:00
## Multiple Collections
When multiple collections are active, the system searches all of them:
```basic
USE KB "products"
USE KB "support"
USE KB "warranty"
' User: "My laptop won't turn on"
' System searches all three collections for relevant info
```
## Search Quality
The quality of semantic search depends on:
- **Document organization** - Well-structured folders help
- **Embedding model** - BGE model works well, can be replaced
- **Content quality** - Clear, descriptive documents work best
2025-11-23 13:46:55 -03:00
## Real Example
```basic
' In start.bas
USE KB "company-handbook"
' User types: "What's the dress code?"
' System automatically:
' 1. Searches company-handbook for dress code info
' 2. Finds relevant sections about attire
' 3. Injects them into LLM context
' 4. LLM generates natural response with the information
```
## Performance
2025-11-23 13:46:55 -03:00
- Search happens in milliseconds
- No configuration needed
- Cached for repeated queries
- Only active collections are searched
## Best Practices
1. **Activate only needed collections** - Don't overload context
2. **Organize content well** - One topic per folder
3. **Use descriptive text** - Helps with matching
4. **Keep documents updated** - Fresh content = better answers
## Common Misconceptions
**Wrong**: You need to call a search function
**Right**: Search happens automatically with `USE KB`
**Wrong**: You need to configure search parameters
**Right**: It works out of the box
**Wrong**: You need special commands to query
**Right**: Users just ask questions naturally
## Troubleshooting
### Not finding relevant content?
- Check the collection is activated with `USE KB`
- Verify documents are in the right folder
- Ensure content is descriptive
### Too much irrelevant content?
- Use fewer collections simultaneously
- Organize documents into more specific folders
- Clear unused collections with `CLEAR KB`
Remember: The beauty of semantic search in BotServer is its simplicity - just `USE KB` and let the system handle the rest!