
Chapter 03: Knowledge Base System

Vector search and semantic retrieval for intelligent document querying.

Overview

The Knowledge Base (gbkb) transforms documents into searchable semantic representations, enabling natural language queries against your organization's content.

Architecture

[Figure: KB Architecture Pipeline]

The pipeline processes documents through extraction, chunking, embedding, and storage to enable semantic search.
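The chunking stage of this pipeline can be illustrated with a minimal Python sketch. It is not the gbkb implementation: the function name `chunk_tokens`, the 50-token overlap, and the use of whitespace-separated words as stand-ins for model tokens are all assumptions made for illustration. Only the ~500-token segment size comes from this chapter.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of at most `size` tokens.

    Consecutive windows share `overlap` tokens so that a sentence cut at a
    boundary still appears whole in one chunk. The overlap is an assumption;
    the chapter only states the ~500-token segment size.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks


# Hypothetical usage: whitespace words stand in for real tokenizer output.
words = ("lorem ipsum " * 600).split()   # 1200 pseudo-tokens
pieces = chunk_tokens(words)
print(len(pieces), [len(p) for p in pieces])
```

Each chunk would then be sent to the embedding service and the resulting vector stored alongside the chunk text.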

Supported Formats

Format     Features
PDF        Text, OCR, tables
DOCX       Formatted text, styles
HTML       DOM parsing
Markdown   GFM, tables, code
CSV/JSON   Structured data
TXT        Plain text

Quick Start

' Activate knowledge base
USE KB "company-docs"

' Bot now answers from your documents
TALK "How can I help you?"

Key Concepts

Document Processing

  1. Extract - Pull text from files
  2. Chunk - Split into ~500-token segments
  3. Embed - Generate vectors (BGE model)
  4. Store - Save to Qdrant

Semantic Search

  • The query is converted to a vector embedding
  • Cosine similarity finds the most relevant chunks
  • The top results are injected into the LLM context
  • No explicit search code is needed
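The query-side steps (embed the query, rank stored chunks by cosine similarity, keep the top results) can be sketched in plain Python. This is an illustration only: in the real system the ranking happens inside Qdrant, and the names `cosine_similarity` and `top_k`, as well as the toy two-dimensional vectors, are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=10):
    """Return the k highest-scoring (score, text) pairs.

    `chunks` is a list of (text, vector) pairs, standing in for what the
    vector store would hold.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: real embeddings have hundreds of dimensions.
stored = [
    ("vacation policy", [1.0, 0.0]),
    ("expense reports", [0.0, 1.0]),
    ("leave and travel", [1.0, 1.0]),
]
for score, text in top_k([1.0, 0.0], stored, k=2):
    print(f"{score:.3f}  {text}")
```

The texts returned by `top_k` correspond to the chunks that would be injected into the LLM context.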

Storage Requirements

Plan for the vector database to use roughly 3.5x the size of the original documents:

  • Embeddings: ~2x
  • Indexes: ~1x
  • Metadata: ~0.5x
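As a worked example of this arithmetic, the sketch below applies the three multipliers to a given corpus size. The multipliers are the approximate figures listed above; the function name `estimate_storage` is hypothetical, and real usage will vary with the embedding dimension and index settings.

```python
# Approximate overhead factors from this chapter, relative to raw document size.
OVERHEAD = {"embeddings": 2.0, "indexes": 1.0, "metadata": 0.5}


def estimate_storage(original_bytes):
    """Return a per-component estimate plus the total, in bytes."""
    breakdown = {name: original_bytes * factor for name, factor in OVERHEAD.items()}
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Example: a 1 GB document set.
estimate = estimate_storage(1_000_000_000)
for name, size in estimate.items():
    print(f"{name:<10} {size / 1e9:.1f} GB")
```

For 1 GB of source documents this yields about 2 GB of embeddings, 1 GB of indexes, and 0.5 GB of metadata, or 3.5 GB in total.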

Configuration

name,value
embedding-url,http://localhost:8082
embedding-model,bge-small-en-v1.5
rag-hybrid-enabled,true
rag-top-k,10
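The settings above are plain `name,value` rows, so they can be inspected with any CSV reader. The following sketch shows one way to load and type them in Python. It is not how the platform itself reads its configuration; the helper names `load_config` and `typed_settings` are hypothetical, and reading `rag-top-k` as the number of retrieved chunks is an assumption based on its name.

```python
import csv
import io

# The same rows as in the configuration above.
CONFIG_TEXT = """name,value
embedding-url,http://localhost:8082
embedding-model,bge-small-en-v1.5
rag-hybrid-enabled,true
rag-top-k,10
"""


def load_config(text):
    """Parse name,value rows into a dict of raw strings."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"]: row["value"] for row in reader}


def typed_settings(raw):
    """Convert the raw strings into Python types."""
    return {
        "embedding_url": raw["embedding-url"],
        "embedding_model": raw["embedding-model"],
        "rag_hybrid_enabled": raw["rag-hybrid-enabled"].strip().lower() == "true",
        # Assumed meaning: how many chunks to retrieve per query.
        "rag_top_k": int(raw["rag-top-k"]),
    }


settings = typed_settings(load_config(CONFIG_TEXT))
print(settings)
```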

Chapter Contents

See Also