# Chapter 03: Knowledge Base System

Vector search and semantic retrieval for intelligent document querying.

## Overview

The Knowledge Base (gbkb) transforms documents into searchable semantic representations, enabling natural language queries against your organization's content.

## Architecture

Figure: KB Architecture Pipeline

The pipeline processes documents through extraction, chunking, embedding, and storage to enable semantic search.

## Supported Formats

| Format | Features |
|--------|----------|
| PDF | Text, OCR, tables |
| DOCX | Formatted text, styles |
| HTML | DOM parsing |
| Markdown | GFM, tables, code |
| CSV/JSON | Structured data |
| TXT | Plain text |

## Quick Start

```basic
' Activate knowledge base
USE KB "company-docs"

' Bot now answers from your documents
TALK "How can I help you?"
```

## Key Concepts

### Document Processing

1. **Extract** - Pull text from files
2. **Chunk** - Split into ~500 token segments
3. **Embed** - Generate vectors (BGE model)
4. **Store** - Save to Qdrant

### Semantic Search

- Query converted to vector embedding
- Cosine similarity finds relevant chunks
- Top results injected into LLM context
- No explicit search code needed

### Storage Requirements

Vector databases need ~3.5x original document size:

- Embeddings: ~2x
- Indexes: ~1x
- Metadata: ~0.5x

## Configuration

```csv
name,value
embedding-url,http://localhost:8082
embedding-model,bge-small-en-v1.5
rag-hybrid-enabled,true
rag-top-k,10
```

## Chapter Contents

- [KB and Tools System](./kb-and-tools.md) - Integration patterns
- [Vector Collections](./vector-collections.md) - Collection management
- [Document Indexing](./indexing.md) - Processing pipeline
- [Semantic Search](./semantic-search.md) - Search mechanics
- [Episodic Memory](./episodic-memory.md) - Conversation history and context management
- [Semantic Caching](./caching.md) - Performance optimization

## See Also

- [.gbkb Package](../chapter-02/gbkb.md) - Folder structure
- [USE KB Keyword](../chapter-06-gbdialog/keyword-use-kb.md) - Keyword reference
- [Hybrid Search](../chapter-11-features/hybrid-search.md) - RAG 2.0
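
For readers who want to see what the Semantic Search steps in Key Concepts amount to, here is a minimal Python sketch of cosine-similarity ranking over stored chunks. It is an illustration only, not the gbkb implementation: the function names `cosine_similarity` and `top_k_chunks` are hypothetical, the toy three-dimensional vectors stand in for real BGE embeddings, and in practice the vector store (Qdrant) performs this ranking. The default `k=10` mirrors the `rag-top-k` value shown in Configuration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=10):
    """Rank (text, vector) pairs by similarity to the query and keep the best k.

    Hypothetical helper: returns a list of (score, text), highest score first.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: made-up 3-dimensional "embeddings" for three chunks.
example_chunks = [
    ("Vacation policy: 20 days per year.", [0.9, 0.1, 0.0]),
    ("Expense reports are due monthly.", [0.1, 0.9, 0.0]),
    ("Remote work requires manager approval.", [0.6, 0.2, 0.5]),
]
example_query = [1.0, 0.0, 0.1]  # stands in for an embedded user question

for score, text in top_k_chunks(example_query, example_chunks, k=2):
    print(f"{score:.3f}  {text}")
```

The chunks returned by a ranking like this are what get injected into the LLM context, which is why no explicit search code is needed in the dialog itself.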