Add Scale chapter with sharding and database optimization docs
New Chapter 18 - Scale:

- README with overview of scaling architecture
- Sharding Architecture: tenant routing, shard config, data model, operations
- Database Optimization: SMALLINT enums, indexing, partitioning, connection pooling
- Enum reference tables for all domain values (Channel, Message, Task, etc.)
- Best practices for billion-user scale deployments
parent c7bc1355c2
commit 6bfab51293
4 changed files with 830 additions and 0 deletions
79
src/21-scale/README.md
Normal file
@ -0,0 +1,79 @@
# Scale

This chapter covers horizontal scaling strategies for General Bots to support millions to billions of users across global deployments.

## Overview

General Bots is designed from the ground up to scale horizontally. The architecture supports:

- **Multi-tenancy**: Complete isolation between organizations
- **Regional sharding**: Data locality for compliance and performance
- **Database partitioning**: Efficient handling of high-volume tables
- **Stateless services**: Easy horizontal pod autoscaling

## Chapter Contents

- [Sharding Architecture](./sharding.md) - How data is distributed across shards
- [Database Optimization](./database-optimization.md) - Schema design for billion-scale
- [Regional Deployment](./regional-deployment.md) - Multi-region setup
- [Performance Tuning](./performance-tuning.md) - Optimization strategies

## Key Concepts

### Tenant Isolation

Every piece of data in General Bots is associated with a `tenant_id`. This enables:

1. Complete data isolation between organizations
2. Per-tenant resource limits and quotas
3. Tenant-specific configurations
4. Easy data export/deletion for compliance

### Shard Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                       Load Balancer                         │
└─────────────────────────┬───────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
        ▼                 ▼                 ▼
   ┌─────────┐       ┌─────────┐       ┌─────────┐
   │ Region  │       │ Region  │       │ Region  │
   │  USA    │       │  EUR    │       │  APAC   │
   └────┬────┘       └────┬────┘       └────┬────┘
        │                 │                 │
   ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
   │ Shard 1 │       │ Shard 2 │       │ Shard 3 │
   │ Shard 4 │       │ Shard 5 │       │ Shard 6 │
   └─────────┘       └─────────┘       └─────────┘
```

### Database Design Principles

1. **SMALLINT enums** instead of VARCHAR for domain values (2 bytes vs 20+ bytes)
2. **Partitioned tables** for high-volume data (messages, sessions, analytics)
3. **Composite primary keys** including `shard_id` for distributed queries
4. **Snowflake-like IDs** for globally unique, time-sortable identifiers

## When to Scale

| Users | Sessions/day | Messages/day | Recommended Setup |
|-------|--------------|--------------|-------------------|
| < 10K | < 100K | < 1M | Single node |
| 10K-100K | 100K-1M | 1M-10M | 2-3 nodes, single region |
| 100K-1M | 1M-10M | 10M-100M | Multi-node, consider sharding |
| 1M-10M | 10M-100M | 100M-1B | Regional shards |
| > 10M | > 100M | > 1B | Global shards with Citus/CockroachDB |

## Quick Start

To enable sharding in your deployment:

1. Configure shard mapping in the `shard_config` table
2. Set the `SHARD_ID` environment variable per instance
3. Deploy region-specific instances
4. Configure load balancer routing rules

See [Sharding Architecture](./sharding.md) for detailed setup instructions.
436
src/21-scale/database-optimization.md
Normal file
@ -0,0 +1,436 @@
# Database Optimization

This document covers database schema design and optimization strategies for billion-user scale deployments.

## Schema Design Principles

### Use SMALLINT Enums Instead of VARCHAR

One of the most impactful optimizations is using integer enums instead of string-based status fields.

**Before (inefficient):**

```sql
CREATE TABLE auto_tasks (
    id UUID PRIMARY KEY,
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    priority VARCHAR(20) NOT NULL DEFAULT 'normal',
    execution_mode VARCHAR(50) NOT NULL DEFAULT 'supervised',
    CONSTRAINT check_status CHECK (status IN ('pending', 'ready', 'running', 'paused', 'waiting_approval', 'completed', 'failed', 'cancelled'))
);
```

**After (optimized):**

```sql
CREATE TABLE auto_tasks (
    id UUID PRIMARY KEY,
    status SMALLINT NOT NULL DEFAULT 0,          -- 2 bytes
    priority SMALLINT NOT NULL DEFAULT 1,        -- 2 bytes
    execution_mode SMALLINT NOT NULL DEFAULT 1   -- 2 bytes
);
```

### Storage Comparison

| Field Type | Storage | Example Values |
|------------|---------|----------------|
| VARCHAR(50) | 1-51 bytes | 'waiting_approval' = 17 bytes |
| TEXT | 1+ bytes | 'completed' = 10 bytes |
| SMALLINT | 2 bytes | 4 = 2 bytes (always) |
| INTEGER | 4 bytes | 4 = 4 bytes (always) |

**Savings per row with 5 enum fields:**

- VARCHAR: ~50 bytes average
- SMALLINT: 10 bytes fixed
- **Savings: 40 bytes per row = 40 GB per billion rows**
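
The savings figure is easy to sanity-check; a quick sketch in Rust, using the averages assumed above (~50 bytes for five VARCHAR enum fields vs 10 bytes for five SMALLINTs):

```rust
// Rough check of the storage-savings claim. The 50- and 10-byte figures are
// the per-row averages assumed in the comparison table, not measured values.
fn enum_savings_gb(rows: u64, varchar_bytes: u64, smallint_bytes: u64) -> u64 {
    rows * (varchar_bytes - smallint_bytes) / 1_000_000_000
}

fn main() {
    // One billion rows, five enum fields per row
    let gb = enum_savings_gb(1_000_000_000, 50, 10);
    assert_eq!(gb, 40); // ~40 GB per billion rows, as stated above
    println!("saved: {gb} GB");
}
```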

## Enum Value Reference

All domain values in General Bots use SMALLINT. Reference table:

### Channel Types

| Value | Name | Description |
|-------|------|-------------|
| 0 | web | Web chat interface |
| 1 | whatsapp | WhatsApp Business |
| 2 | telegram | Telegram Bot |
| 3 | msteams | Microsoft Teams |
| 4 | slack | Slack |
| 5 | email | Email channel |
| 6 | sms | SMS/Text messages |
| 7 | voice | Voice/Phone |
| 8 | instagram | Instagram DM |
| 9 | api | Direct API |
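
As a sketch of how these SMALLINT values might map onto application code, here is a hypothetical Rust enum with `#[repr(i16)]` and a fallible conversion for values read back from the database. The type and its `TryFrom` impl are illustrative, not the actual General Bots types:

```rust
// Illustrative mapping of the Channel SMALLINT values from the table above.
// An unmapped value is returned as an error rather than panicking, since a
// newer schema version may contain values this build does not know about.
#[derive(Debug, PartialEq, Clone, Copy)]
#[repr(i16)]
enum Channel {
    Web = 0, Whatsapp = 1, Telegram = 2, Msteams = 3, Slack = 4,
    Email = 5, Sms = 6, Voice = 7, Instagram = 8, Api = 9,
}

impl TryFrom<i16> for Channel {
    type Error = i16;
    fn try_from(v: i16) -> Result<Self, i16> {
        Ok(match v {
            0 => Channel::Web, 1 => Channel::Whatsapp, 2 => Channel::Telegram,
            3 => Channel::Msteams, 4 => Channel::Slack, 5 => Channel::Email,
            6 => Channel::Sms, 7 => Channel::Voice, 8 => Channel::Instagram,
            9 => Channel::Api,
            other => return Err(other),
        })
    }
}

fn main() {
    assert_eq!(Channel::try_from(1), Ok(Channel::Whatsapp));
    assert_eq!(Channel::Slack as i16, 4); // write side: enum -> SMALLINT
    assert!(Channel::try_from(42).is_err());
}
```

The same pattern applies to every enum table that follows.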

### Message Role

| Value | Name | Description |
|-------|------|-------------|
| 1 | user | User message |
| 2 | assistant | Bot response |
| 3 | system | System prompt |
| 4 | tool | Tool call/result |
| 9 | episodic | Episodic memory summary |
| 10 | compact | Compacted conversation |

### Message Type

| Value | Name | Description |
|-------|------|-------------|
| 0 | text | Plain text |
| 1 | image | Image attachment |
| 2 | audio | Audio file |
| 3 | video | Video file |
| 4 | document | Document/PDF |
| 5 | location | GPS location |
| 6 | contact | Contact card |
| 7 | sticker | Sticker |
| 8 | reaction | Message reaction |

### LLM Provider

| Value | Name | Description |
|-------|------|-------------|
| 0 | openai | OpenAI API |
| 1 | anthropic | Anthropic Claude |
| 2 | azure_openai | Azure OpenAI |
| 3 | azure_claude | Azure Claude |
| 4 | google | Google AI |
| 5 | local | Local llama.cpp |
| 6 | ollama | Ollama |
| 7 | groq | Groq |
| 8 | mistral | Mistral AI |
| 9 | cohere | Cohere |

### Task Status

| Value | Name | Description |
|-------|------|-------------|
| 0 | pending | Waiting to start |
| 1 | ready | Ready to execute |
| 2 | running | Currently executing |
| 3 | paused | Paused by user |
| 4 | waiting_approval | Needs approval |
| 5 | completed | Successfully finished |
| 6 | failed | Failed with error |
| 7 | cancelled | Cancelled by user |

### Task Priority

| Value | Name | Description |
|-------|------|-------------|
| 0 | low | Low priority |
| 1 | normal | Normal priority |
| 2 | high | High priority |
| 3 | urgent | Urgent |
| 4 | critical | Critical |

### Execution Mode

| Value | Name | Description |
|-------|------|-------------|
| 0 | manual | Manual execution only |
| 1 | supervised | Requires approval |
| 2 | autonomous | Fully automatic |

### Risk Level

| Value | Name | Description |
|-------|------|-------------|
| 0 | none | No risk |
| 1 | low | Low risk |
| 2 | medium | Medium risk |
| 3 | high | High risk |
| 4 | critical | Critical risk |

### Approval Status

| Value | Name | Description |
|-------|------|-------------|
| 0 | pending | Awaiting decision |
| 1 | approved | Approved |
| 2 | rejected | Rejected |
| 3 | expired | Timed out |
| 4 | skipped | Skipped |

### Intent Type

| Value | Name | Description |
|-------|------|-------------|
| 0 | unknown | Unclassified |
| 1 | app_create | Create application |
| 2 | todo | Create task/reminder |
| 3 | monitor | Set up monitoring |
| 4 | action | Execute action |
| 5 | schedule | Create schedule |
| 6 | goal | Set goal |
| 7 | tool | Create tool |
| 8 | query | Query/search |

### Memory Type

| Value | Name | Description |
|-------|------|-------------|
| 0 | short | Short-term |
| 1 | long | Long-term |
| 2 | episodic | Episodic |
| 3 | semantic | Semantic |
| 4 | procedural | Procedural |

### Sync Status

| Value | Name | Description |
|-------|------|-------------|
| 0 | synced | Fully synced |
| 1 | pending | Sync pending |
| 2 | conflict | Conflict detected |
| 3 | error | Sync error |
| 4 | deleted | Marked for deletion |

## Indexing Strategies

### Composite Indexes for Common Queries

```sql
-- Session lookup by user
CREATE INDEX idx_sessions_user ON user_sessions(user_id, created_at DESC);

-- Messages by session (most common query)
CREATE INDEX idx_messages_session ON message_history(session_id, message_index);

-- Active tasks by status and priority
-- (status < 5 excludes completed/failed/cancelled)
CREATE INDEX idx_tasks_status ON auto_tasks(status, priority) WHERE status < 5;

-- Tenant-scoped queries
CREATE INDEX idx_sessions_tenant ON user_sessions(tenant_id, created_at DESC);
```

### Partial Indexes for Active Records

```sql
-- Only index active bots (saves space)
CREATE INDEX idx_bots_active ON bots(tenant_id, is_active) WHERE is_active = true;

-- Only index pending approvals
CREATE INDEX idx_approvals_pending ON task_approvals(task_id, expires_at) WHERE status = 0;

-- Only index unread messages
CREATE INDEX idx_messages_unread ON message_history(user_id, created_at) WHERE is_read = false;
```

### BRIN Indexes for Time-Series Data

```sql
-- BRIN indexes for time-ordered data (much smaller than B-tree)
CREATE INDEX idx_messages_created_brin ON message_history USING BRIN (created_at);
CREATE INDEX idx_analytics_date_brin ON analytics_events USING BRIN (created_at);
```

## Table Partitioning

### Partition High-Volume Tables by Time

```sql
-- Partitioned messages table
CREATE TABLE message_history (
    id UUID NOT NULL,
    session_id UUID NOT NULL,
    tenant_id BIGINT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    -- other columns...
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Monthly partitions
CREATE TABLE message_history_2025_01 PARTITION OF message_history
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE message_history_2025_02 PARTITION OF message_history
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- ... continue for each month

-- Default partition for future data
CREATE TABLE message_history_default PARTITION OF message_history DEFAULT;
```

### Automatic Partition Management

```sql
-- Function to create next month's partition
CREATE OR REPLACE FUNCTION create_monthly_partition(
    table_name TEXT,
    partition_date DATE
) RETURNS VOID AS $$
DECLARE
    partition_name TEXT;
    start_date DATE;
    end_date DATE;
BEGIN
    partition_name := table_name || '_' || to_char(partition_date, 'YYYY_MM');
    start_date := date_trunc('month', partition_date);
    end_date := start_date + INTERVAL '1 month';

    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
        partition_name, table_name, start_date, end_date
    );
END;
$$ LANGUAGE plpgsql;

-- Create partitions for the next 3 months
-- (explicit ::DATE cast so the function call resolves)
SELECT create_monthly_partition('message_history', (NOW() + INTERVAL '1 month')::DATE);
SELECT create_monthly_partition('message_history', (NOW() + INTERVAL '2 months')::DATE);
SELECT create_monthly_partition('message_history', (NOW() + INTERVAL '3 months')::DATE);
```
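
The naming and range scheme used by `create_monthly_partition` can be mirrored outside the database, for example when an operator script needs to know which partition a date lands in. A small illustrative Rust helper (`monthly_partition` is a hypothetical name, not part of the actual codebase):

```rust
// Mirrors the <table>_YYYY_MM naming scheme and the half-open
// [first-of-month, first-of-next-month) range used by the SQL function above.
fn monthly_partition(table: &str, year: i32, month: u32) -> (String, String, String) {
    let name = format!("{}_{:04}_{:02}", table, year, month);
    let start = format!("{:04}-{:02}-01", year, month);
    // Roll over December into January of the next year
    let (ny, nm) = if month == 12 { (year + 1, 1) } else { (year, month + 1) };
    let end = format!("{:04}-{:02}-01", ny, nm);
    (name, start, end)
}

fn main() {
    let (name, start, end) = monthly_partition("message_history", 2025, 12);
    assert_eq!(name, "message_history_2025_12");
    assert_eq!(start, "2025-12-01");
    assert_eq!(end, "2026-01-01"); // year rollover handled
    println!("{name}: [{start}, {end})");
}
```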

## Connection Pooling

### PgBouncer Configuration

```ini
; pgbouncer.ini
[databases]
gb_shard1 = host=shard1.db port=5432 dbname=gb
gb_shard2 = host=shard2.db port=5432 dbname=gb

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Pool settings
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 50
min_pool_size = 10
reserve_pool_size = 25
reserve_pool_timeout = 3

; Timeouts
server_connect_timeout = 3
server_idle_timeout = 600
server_lifetime = 3600
client_idle_timeout = 0
```

### Application Connection Settings

```toml
# config.toml
[database]
max_connections = 100
min_connections = 10
connection_timeout_secs = 5
idle_timeout_secs = 300
max_lifetime_secs = 1800
```

## Query Optimization

### Use EXPLAIN ANALYZE

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM message_history
WHERE session_id = 'abc-123'
ORDER BY message_index;
```

### Avoid N+1 Queries

**Bad:**

```sql
-- 1 query for sessions
SELECT * FROM user_sessions WHERE user_id = 'xyz';
-- N queries for messages (one per session)
SELECT * FROM message_history WHERE session_id = ?;
```

**Good:**

```sql
-- Single query with JOIN
SELECT s.*, m.*
FROM user_sessions s
LEFT JOIN message_history m ON m.session_id = s.id
WHERE s.user_id = 'xyz'
ORDER BY s.created_at DESC, m.message_index;
```

### Use Covering Indexes

```sql
-- Index includes all needed columns (no table lookup)
CREATE INDEX idx_sessions_covering ON user_sessions(user_id, created_at DESC)
    INCLUDE (title, message_count, last_activity_at);
```

## Vacuum and Maintenance

### Aggressive Autovacuum for High-Churn Tables

```sql
ALTER TABLE message_history SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_vacuum_cost_delay = 2
);

ALTER TABLE user_sessions SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);
```

### Regular Maintenance Tasks

```sql
-- Weekly: Reindex bloated indexes
REINDEX INDEX CONCURRENTLY idx_messages_session;

-- Monthly: Update statistics
ANALYZE VERBOSE message_history;

-- Quarterly: Cluster heavily-queried tables (takes an exclusive lock)
CLUSTER message_history USING idx_messages_session;
```

## Monitoring Queries

### Table and Index Size Check

```sql
SELECT
    schemaname || '.' || tablename AS table,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size,
    pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;
```

### Slow Query Log

```ini
# postgresql.conf
log_min_duration_statement = 100   # log queries slower than 100 ms
log_statement = 'none'
log_lock_waits = on
```

### Index Usage Statistics

```sql
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    idx_scan AS scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC
LIMIT 20;
```

## Best Practices Summary

1. **Use SMALLINT for enums** - 2 bytes vs 10-50 bytes per field
2. **Partition time-series tables** - Monthly partitions for messages/analytics
3. **Create partial indexes** - Only index active/relevant rows
4. **Use connection pooling** - PgBouncer with transaction mode
5. **Enable aggressive autovacuum** - For high-churn tables
6. **Monitor query performance** - Log slow queries, check EXPLAIN plans
7. **Use covering indexes** - Include frequently-accessed columns
8. **Avoid cross-shard queries** - Keep tenant data together
9. **Regular maintenance** - Reindex, analyze, cluster as needed
10. **Test at scale** - Use production-like data volumes in staging
309
src/21-scale/sharding.md
Normal file
@ -0,0 +1,309 @@
# Sharding Architecture

This document describes how General Bots distributes data across multiple database shards for horizontal scaling.

## Overview

Sharding enables General Bots to scale beyond single-database limits by distributing data across multiple database instances. Each shard contains a subset of tenants, and data never crosses shard boundaries during normal operations.

## Shard Configuration

### Shard Config Table

The `shard_config` table defines all available shards:

```sql
CREATE TABLE shard_config (
    shard_id SMALLINT PRIMARY KEY,
    region_code CHAR(3) NOT NULL,       -- ISO 3166-1 alpha-3: USA, BRA, DEU
    datacenter VARCHAR(32) NOT NULL,    -- e.g., 'us-east-1', 'eu-west-1'
    connection_string TEXT NOT NULL,    -- Encrypted connection string
    is_primary BOOLEAN DEFAULT false,
    is_active BOOLEAN DEFAULT true,
    min_tenant_id BIGINT NOT NULL,
    max_tenant_id BIGINT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```

### Example Configuration

```sql
-- Americas
INSERT INTO shard_config VALUES
    (1, 'USA', 'us-east-1', 'postgresql://shard1.db:5432/gb', true, true, 1, 1000000),
    (2, 'USA', 'us-west-2', 'postgresql://shard2.db:5432/gb', false, true, 1000001, 2000000),
    (3, 'BRA', 'sa-east-1', 'postgresql://shard3.db:5432/gb', false, true, 2000001, 3000000);

-- Europe
INSERT INTO shard_config VALUES
    (4, 'DEU', 'eu-central-1', 'postgresql://shard4.db:5432/gb', false, true, 3000001, 4000000),
    (5, 'GBR', 'eu-west-2', 'postgresql://shard5.db:5432/gb', false, true, 4000001, 5000000);

-- Asia Pacific
INSERT INTO shard_config VALUES
    (6, 'SGP', 'ap-southeast-1', 'postgresql://shard6.db:5432/gb', false, true, 5000001, 6000000),
    (7, 'JPN', 'ap-northeast-1', 'postgresql://shard7.db:5432/gb', false, true, 6000001, 7000000);
```
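
Given the `min_tenant_id`/`max_tenant_id` ranges above, resolving a tenant to a shard is a simple range lookup. A minimal illustrative sketch (`ShardRange` and `shard_for_tenant` are hypothetical names; a production router would cache this mapping rather than scan on every request):

```rust
// Range-based shard resolution mirroring the shard_config rows above.
struct ShardRange {
    shard_id: i16,
    min_tenant: i64,
    max_tenant: i64, // inclusive, as in shard_config
}

fn shard_for_tenant(ranges: &[ShardRange], tenant_id: i64) -> Option<i16> {
    ranges
        .iter()
        .find(|r| (r.min_tenant..=r.max_tenant).contains(&tenant_id))
        .map(|r| r.shard_id)
}

fn main() {
    // First three rows of the example configuration
    let ranges = vec![
        ShardRange { shard_id: 1, min_tenant: 1, max_tenant: 1_000_000 },
        ShardRange { shard_id: 2, min_tenant: 1_000_001, max_tenant: 2_000_000 },
        ShardRange { shard_id: 3, min_tenant: 2_000_001, max_tenant: 3_000_000 },
    ];
    assert_eq!(shard_for_tenant(&ranges, 1_500_000), Some(2));
    assert_eq!(shard_for_tenant(&ranges, 1), Some(1));
    assert_eq!(shard_for_tenant(&ranges, 9_999_999), None); // unassigned range
}
```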

## Tenant-to-Shard Mapping

### Mapping Table

```sql
CREATE TABLE tenant_shard_map (
    tenant_id BIGINT PRIMARY KEY,
    shard_id SMALLINT NOT NULL REFERENCES shard_config(shard_id),
    region_code CHAR(3) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```

### Routing Logic

When a request comes in, the system:

1. Extracts `tenant_id` from the request context
2. Looks up `shard_id` in `tenant_shard_map`
3. Routes the query to the appropriate database connection

```rust
// Rust routing example
pub fn get_shard_connection(tenant_id: i64) -> Result<DbConnection> {
    let shard_id = SHARD_MAP.get(&tenant_id)
        .ok_or_else(|| Error::TenantNotFound(tenant_id))?;

    CONNECTION_POOLS.get(shard_id)
        .ok_or_else(|| Error::ShardNotAvailable(*shard_id))
}
```

## Data Model Requirements

### Every Table Includes Shard Keys

All tables must include `tenant_id` and `shard_id` columns:

```sql
CREATE TABLE user_sessions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id BIGINT NOT NULL,      -- Required for routing
    shard_id SMALLINT NOT NULL,     -- Denormalized for queries
    user_id UUID NOT NULL,
    bot_id UUID NOT NULL
    -- ... other columns
);
```

### Foreign Keys Within Shard Only

Foreign keys may only reference tables within the same shard:

```sql
-- Good: Same-shard reference
ALTER TABLE message_history
    ADD CONSTRAINT fk_session
    FOREIGN KEY (session_id) REFERENCES user_sessions(id);

-- Bad: Cross-shard reference (never do this)
-- FOREIGN KEY (other_tenant_data) REFERENCES other_shard.table(id)
```

## Snowflake ID Generation

For globally unique, time-sortable IDs across shards:

```sql
-- Requires a sequence: CREATE SEQUENCE global_seq;
CREATE OR REPLACE FUNCTION generate_snowflake_id(p_shard_id SMALLINT)
RETURNS BIGINT AS $$
DECLARE
    epoch BIGINT := 1704067200000;  -- 2024-01-01 00:00:00 UTC, in milliseconds
    ts BIGINT;
    seq BIGINT;
BEGIN
    -- 41 bits: timestamp (milliseconds since epoch)
    ts := (EXTRACT(EPOCH FROM NOW()) * 1000)::BIGINT - epoch;

    -- 10 bits: shard_id (0-1023)
    -- 12 bits: sequence (0-4095)
    seq := nextval('global_seq') & 4095;

    RETURN (ts << 22) | ((p_shard_id & 1023) << 12) | seq;
END;
$$ LANGUAGE plpgsql;
```

### ID Structure

```
64-bit Snowflake ID
┌───────────────────┬───────────────┬─────────────────────┐
│ 41 bits timestamp │ 10 bits shard │ 12 bits sequence    │
│ (69 years range)  │ (1024 shards) │ (4096 IDs/ms/shard) │
└───────────────────┴───────────────┴─────────────────────┘
```
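
The same bit layout can be packed and unpacked in application code; here is a hedged Rust mirror of the SQL function (`make_snowflake` and `decode_snowflake` are illustrative helpers, and sequence allocation is left to the caller):

```rust
// Packs/unpacks the 41/10/12-bit layout described above.
const EPOCH_MS: i64 = 1_704_067_200_000; // 2024-01-01 00:00:00 UTC

fn make_snowflake(ts_ms: i64, shard_id: i64, seq: i64) -> i64 {
    ((ts_ms - EPOCH_MS) << 22) | ((shard_id & 1023) << 12) | (seq & 4095)
}

/// Returns (timestamp_ms, shard_id, sequence).
fn decode_snowflake(id: i64) -> (i64, i64, i64) {
    ((id >> 22) + EPOCH_MS, (id >> 12) & 1023, id & 4095)
}

fn main() {
    let id = make_snowflake(1_704_067_200_123, 7, 42);
    assert_eq!(decode_snowflake(id), (1_704_067_200_123, 7, 42));
    println!("id = {id}");
}
```

Because the timestamp occupies the high bits, IDs sort by creation time, which is what makes them usable as time-ordered primary keys.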

## Shard Operations

### Creating a New Shard

1. Provision a new database instance
2. Run migrations
3. Add it to `shard_config`
4. Update the routing configuration
5. Begin assigning new tenants

```bash
# 1. Run migrations on the new shard
DATABASE_URL="postgresql://new-shard:5432/gb" diesel migration run

# 2. Add shard config
psql -c "INSERT INTO shard_config VALUES (8, 'AUS', 'ap-southeast-2', '...', false, true, 7000001, 8000000);"

# 3. Reload routing
curl -X POST http://localhost:3000/api/admin/reload-shard-config
```

### Tenant Migration Between Shards

Moving a tenant to a different shard (e.g., for data locality):

```sql
-- 1. Set the tenant to read-only mode
UPDATE tenants SET settings = settings || '{"read_only": true}' WHERE id = 12345;

-- 2. Export tenant data (run from a shell; psql \copy is used because
--    pg_dump cannot filter rows by tenant):
--    psql source_db -c "\copy (SELECT * FROM user_sessions WHERE tenant_id = 12345) TO 'tenant_12345_sessions.csv' CSV"
--    psql source_db -c "\copy (SELECT * FROM message_history WHERE tenant_id = 12345) TO 'tenant_12345_messages.csv' CSV"

-- 3. Import into the new shard:
--    psql target_db -c "\copy user_sessions FROM 'tenant_12345_sessions.csv' CSV"
--    psql target_db -c "\copy message_history FROM 'tenant_12345_messages.csv' CSV"

-- 4. Update routing
UPDATE tenant_shard_map SET shard_id = 5, region_code = 'DEU' WHERE tenant_id = 12345;

-- 5. Remove read-only mode
UPDATE tenants SET settings = settings - 'read_only' WHERE id = 12345;

-- 6. Clean up the source shard (after verification)
DELETE FROM user_sessions WHERE tenant_id = 12345;
DELETE FROM message_history WHERE tenant_id = 12345;
```

## Query Patterns

### Single-Tenant Queries (Most Common)

```sql
-- Efficient: Uses shard routing
SELECT * FROM user_sessions
WHERE tenant_id = 12345 AND user_id = 'abc-123';
```

### Cross-Shard Queries (Admin Only)

For global analytics, use a federation layer:

```sql
-- Using postgres_fdw for cross-shard reads
SELECT shard_id, COUNT(*) AS session_count
FROM all_shards.user_sessions
WHERE created_at > NOW() - INTERVAL '1 day'
GROUP BY shard_id;
```

### Scatter-Gather Pattern

For queries that must touch multiple shards:

```rust
async fn get_global_stats() -> Stats {
    let futures: Vec<_> = SHARDS.iter()
        .map(|shard| get_shard_stats(shard.id))
        .collect();

    let results = futures::future::join_all(futures).await;

    results.into_iter().fold(Stats::default(), |acc, s| acc.merge(s))
}
```

## High Availability

### Per-Shard Replication

Each shard should have:

- 1 primary (read/write)
- 1-2 replicas (read-only, failover)
- Async replication with < 1s lag

```
Shard 1 Architecture:
      ┌─────────────┐
      │   Primary   │◄──── Writes
      └──────┬──────┘
             │ Streaming Replication
         ┌───┴───┐
         ▼       ▼
     ┌──────┐ ┌──────┐
     │Rep 1 │ │Rep 2 │◄──── Reads
     └──────┘ └──────┘
```

### Failover Configuration

```text
# config.csv
shard-1-primary,postgresql://shard1-primary:5432/gb
shard-1-replica-1,postgresql://shard1-replica1:5432/gb
shard-1-replica-2,postgresql://shard1-replica2:5432/gb
shard-1-failover-priority,replica-1,replica-2
```

## Monitoring

### Key Metrics Per Shard

| Metric | Warning | Critical |
|--------|---------|----------|
| Connection pool usage | > 70% | > 90% |
| Query latency p99 | > 100ms | > 500ms |
| Replication lag | > 1s | > 10s |
| Disk usage | > 70% | > 85% |
| Tenant count | > 80% capacity | > 95% capacity |

### Shard Health Check

```sql
-- Run on each shard
SELECT
    current_setting('cluster_name') AS shard,
    pg_is_in_recovery() AS is_replica,
    pg_last_wal_receive_lsn() AS wal_position,
    pg_postmaster_start_time() AS uptime_since,
    (SELECT count(*) FROM pg_stat_activity) AS connections,
    (SELECT count(*) FROM tenants) AS tenant_count;
```

## Best Practices

1. **Shard by tenant, not by table** - Keep all tenant data together
2. **Avoid cross-shard transactions** - Design for eventual consistency where needed
3. **Pre-allocate tenant ranges** - Leave room for growth in each shard
4. **Monitor shard hotspots** - Rebalance if one shard gets too busy
5. **Test failover regularly** - Ensure replicas can be promoted
6. **Use connection pooling** - PgBouncer or similar for each shard
7. **Cache shard routing** - Don't query `tenant_shard_map` on every request

## Migration from Single Database

To migrate an existing single-database deployment to a sharded one:

1. Add a `shard_id` column to all tables (default to 1)
2. Deploy shard routing code (disabled)
3. Set up additional shard databases
4. Enable routing for new tenants only
5. Gradually migrate existing tenants during low-traffic windows
6. Decommission the original database when empty

See [Regional Deployment](./regional-deployment.md) for multi-region considerations.
@ -390,6 +390,12 @@
- [Examples](./17-autonomous-tasks/examples.md)
|
||||
- [Designer](./17-autonomous-tasks/designer.md)
|
||||
|
||||
# Part XVII - Scale
|
||||
|
||||
- [Chapter 18: Scale](./21-scale/README.md)
|
||||
- [Sharding Architecture](./21-scale/sharding.md)
|
||||
- [Database Optimization](./21-scale/database-optimization.md)
|
||||
|
||||
# Appendices
|
||||
|
||||
- [Appendix A: Database Model](./15-appendix/README.md)
|
||||
|
|
|