Add Scale chapter with sharding and database optimization docs

New Chapter 18 - Scale:
- README with overview of scaling architecture
- Sharding Architecture: tenant routing, shard config, data model, operations
- Database Optimization: SMALLINT enums, indexing, partitioning, connection pooling

Enum reference tables for all domain values (Channel, Message, Task, etc.)
Best practices for billion-user scale deployments
Rodrigo Rodriguez (Pragmatismo) 2025-12-29 12:55:32 -03:00
parent c7bc1355c2
commit 6bfab51293
4 changed files with 830 additions and 0 deletions

src/21-scale/README.md

@@ -0,0 +1,79 @@
# Scale
This chapter covers horizontal scaling strategies for General Bots to support millions to billions of users across global deployments.
## Overview
General Bots is designed from the ground up to scale horizontally. The architecture supports:
- **Multi-tenancy**: Complete isolation between organizations
- **Regional sharding**: Data locality for compliance and performance
- **Database partitioning**: Efficient handling of high-volume tables
- **Stateless services**: Easy horizontal pod autoscaling
## Chapter Contents
- [Sharding Architecture](./sharding.md) - How data is distributed across shards
- [Database Optimization](./database-optimization.md) - Schema design for billion-scale
- [Regional Deployment](./regional-deployment.md) - Multi-region setup
- [Performance Tuning](./performance-tuning.md) - Optimization strategies
## Key Concepts
### Tenant Isolation
Every piece of data in General Bots is associated with a `tenant_id`. This enables:
1. Complete data isolation between organizations
2. Per-tenant resource limits and quotas
3. Tenant-specific configurations
4. Easy data export/deletion for compliance
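In application code, tenant isolation usually means threading the `tenant_id` through every request. A minimal Rust sketch (the `TenantContext` type and `scoped_filter` helper are illustrative, not the actual General Bots API):

```rust
/// Hypothetical per-request context; every data-access call receives one.
#[derive(Debug, Clone)]
pub struct TenantContext {
    pub tenant_id: i64,
}

impl TenantContext {
    /// Builds a WHERE fragment that scopes a query to this tenant.
    /// tenant_id is a trusted i64, so string formatting is safe here;
    /// user-supplied values would need bound parameters instead.
    pub fn scoped_filter(&self) -> String {
        format!("tenant_id = {}", self.tenant_id)
    }
}

fn main() {
    let ctx = TenantContext { tenant_id: 12345 };
    assert_eq!(ctx.scoped_filter(), "tenant_id = 12345");
}
```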
### Shard Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                        Load Balancer                        │
└─────────────────────────┬───────────────────────────────────┘
        ┌─────────────────┼─────────────────┐
        │                 │                 │
        ▼                 ▼                 ▼
   ┌─────────┐       ┌─────────┐       ┌─────────┐
   │ Region  │       │ Region  │       │ Region  │
   │  USA    │       │  EUR    │       │  APAC   │
   └────┬────┘       └────┬────┘       └────┬────┘
        │                 │                 │
   ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
   │ Shard 1 │       │ Shard 2 │       │ Shard 3 │
   │ Shard 4 │       │ Shard 5 │       │ Shard 6 │
   └─────────┘       └─────────┘       └─────────┘
```
### Database Design Principles
1. **SMALLINT enums** instead of VARCHAR for domain values (2 bytes vs 20+ bytes)
2. **Partitioned tables** for high-volume data (messages, sessions, analytics)
3. **Composite primary keys** including `shard_id` for distributed queries
4. **Snowflake-like IDs** for globally unique, time-sortable identifiers
## When to Scale
| Users | Sessions/day | Messages/day | Recommended Setup |
|-------|--------------|--------------|-------------------|
| < 10K | < 100K | < 1M | Single node |
| 10K-100K | 100K-1M | 1M-10M | 2-3 nodes, single region |
| 100K-1M | 1M-10M | 10M-100M | Multi-node, consider sharding |
| 1M-10M | 10M-100M | 100M-1B | Regional shards |
| > 10M | > 100M | > 1B | Global shards with Citus/CockroachDB |
## Quick Start
To enable sharding in your deployment:
1. Configure shard mapping in `shard_config` table
2. Set `SHARD_ID` environment variable per instance
3. Deploy region-specific instances
4. Configure load balancer routing rules
See [Sharding Architecture](./sharding.md) for detailed setup instructions.
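Step 2 above can be sketched for a single instance as follows (`SHARD_ID` is the variable named above; `REGION_CODE` is an illustrative extra, not a documented setting):

```shell
# Assign this instance to shard 3 (APAC in the example topology).
export SHARD_ID=3
export REGION_CODE=APAC   # illustrative only

# Fail fast if no shard assignment is present.
if [ -z "$SHARD_ID" ]; then
  echo "SHARD_ID is required" >&2
  exit 1
fi

echo "starting instance for shard $SHARD_ID ($REGION_CODE)"
```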

src/21-scale/database-optimization.md

@@ -0,0 +1,436 @@
# Database Optimization
This document covers database schema design and optimization strategies for billion-user scale deployments.
## Schema Design Principles
### Use SMALLINT Enums Instead of VARCHAR
One of the most impactful optimizations is using integer enums instead of string-based status fields.
**Before (inefficient):**
```sql
CREATE TABLE auto_tasks (
id UUID PRIMARY KEY,
status VARCHAR(50) NOT NULL DEFAULT 'pending',
priority VARCHAR(20) NOT NULL DEFAULT 'normal',
execution_mode VARCHAR(50) NOT NULL DEFAULT 'supervised',
CONSTRAINT check_status CHECK (status IN ('pending', 'ready', 'running', 'paused', 'waiting_approval', 'completed', 'failed', 'cancelled'))
);
```
**After (optimized):**
```sql
CREATE TABLE auto_tasks (
id UUID PRIMARY KEY,
status SMALLINT NOT NULL DEFAULT 0, -- 2 bytes
priority SMALLINT NOT NULL DEFAULT 1, -- 2 bytes
execution_mode SMALLINT NOT NULL DEFAULT 1 -- 2 bytes
);
```
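For existing deployments, the VARCHAR column can be converted in place. A sketch of such a migration (assuming the `auto_tasks` table above; the numeric codes follow the Task Status table later in this document, and the old CHECK constraint must be dropped first since its string literals no longer type-check):

```sql
BEGIN;
-- The string-based constraint and default must go before the type change
ALTER TABLE auto_tasks DROP CONSTRAINT check_status;
ALTER TABLE auto_tasks ALTER COLUMN status DROP DEFAULT;
ALTER TABLE auto_tasks
    ALTER COLUMN status TYPE SMALLINT
    USING CASE status
        WHEN 'pending' THEN 0 WHEN 'ready' THEN 1
        WHEN 'running' THEN 2 WHEN 'paused' THEN 3
        WHEN 'waiting_approval' THEN 4 WHEN 'completed' THEN 5
        WHEN 'failed' THEN 6 WHEN 'cancelled' THEN 7
    END;
ALTER TABLE auto_tasks ALTER COLUMN status SET DEFAULT 0;
COMMIT;
```

Note that `ALTER COLUMN TYPE ... USING` rewrites the table, so run it during a maintenance window on large tables.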
### Storage Comparison
| Field Type | Storage | Example Values |
|------------|---------|----------------|
| VARCHAR(50) | 1-51 bytes | 'waiting_approval' = 17 bytes |
| TEXT | 1+ bytes | 'completed' = 10 bytes |
| SMALLINT | 2 bytes | 4 = 2 bytes (always) |
| INTEGER | 4 bytes | 4 = 4 bytes (always) |
**Savings per row with 5 enum fields:**
- VARCHAR: ~50 bytes average
- SMALLINT: 10 bytes fixed
- **Savings: 40 bytes per row = 40GB per billion rows**
## Enum Value Reference
All domain values in General Bots use SMALLINT. Reference table:
### Channel Types
| Value | Name | Description |
|-------|------|-------------|
| 0 | web | Web chat interface |
| 1 | whatsapp | WhatsApp Business |
| 2 | telegram | Telegram Bot |
| 3 | msteams | Microsoft Teams |
| 4 | slack | Slack |
| 5 | email | Email channel |
| 6 | sms | SMS/Text messages |
| 7 | voice | Voice/Phone |
| 8 | instagram | Instagram DM |
| 9 | api | Direct API |
### Message Role
| Value | Name | Description |
|-------|------|-------------|
| 1 | user | User message |
| 2 | assistant | Bot response |
| 3 | system | System prompt |
| 4 | tool | Tool call/result |
| 9 | episodic | Episodic memory summary |
| 10 | compact | Compacted conversation |
### Message Type
| Value | Name | Description |
|-------|------|-------------|
| 0 | text | Plain text |
| 1 | image | Image attachment |
| 2 | audio | Audio file |
| 3 | video | Video file |
| 4 | document | Document/PDF |
| 5 | location | GPS location |
| 6 | contact | Contact card |
| 7 | sticker | Sticker |
| 8 | reaction | Message reaction |
### LLM Provider
| Value | Name | Description |
|-------|------|-------------|
| 0 | openai | OpenAI API |
| 1 | anthropic | Anthropic Claude |
| 2 | azure_openai | Azure OpenAI |
| 3 | azure_claude | Azure Claude |
| 4 | google | Google AI |
| 5 | local | Local llama.cpp |
| 6 | ollama | Ollama |
| 7 | groq | Groq |
| 8 | mistral | Mistral AI |
| 9 | cohere | Cohere |
### Task Status
| Value | Name | Description |
|-------|------|-------------|
| 0 | pending | Waiting to start |
| 1 | ready | Ready to execute |
| 2 | running | Currently executing |
| 3 | paused | Paused by user |
| 4 | waiting_approval | Needs approval |
| 5 | completed | Successfully finished |
| 6 | failed | Failed with error |
| 7 | cancelled | Cancelled by user |
### Task Priority
| Value | Name | Description |
|-------|------|-------------|
| 0 | low | Low priority |
| 1 | normal | Normal priority |
| 2 | high | High priority |
| 3 | urgent | Urgent |
| 4 | critical | Critical |
### Execution Mode
| Value | Name | Description |
|-------|------|-------------|
| 0 | manual | Manual execution only |
| 1 | supervised | Requires approval |
| 2 | autonomous | Fully automatic |
### Risk Level
| Value | Name | Description |
|-------|------|-------------|
| 0 | none | No risk |
| 1 | low | Low risk |
| 2 | medium | Medium risk |
| 3 | high | High risk |
| 4 | critical | Critical risk |
### Approval Status
| Value | Name | Description |
|-------|------|-------------|
| 0 | pending | Awaiting decision |
| 1 | approved | Approved |
| 2 | rejected | Rejected |
| 3 | expired | Timed out |
| 4 | skipped | Skipped |
### Intent Type
| Value | Name | Description |
|-------|------|-------------|
| 0 | unknown | Unclassified |
| 1 | app_create | Create application |
| 2 | todo | Create task/reminder |
| 3 | monitor | Set up monitoring |
| 4 | action | Execute action |
| 5 | schedule | Create schedule |
| 6 | goal | Set goal |
| 7 | tool | Create tool |
| 8 | query | Query/search |
### Memory Type
| Value | Name | Description |
|-------|------|-------------|
| 0 | short | Short-term |
| 1 | long | Long-term |
| 2 | episodic | Episodic |
| 3 | semantic | Semantic |
| 4 | procedural | Procedural |
### Sync Status
| Value | Name | Description |
|-------|------|-------------|
| 0 | synced | Fully synced |
| 1 | pending | Sync pending |
| 2 | conflict | Conflict detected |
| 3 | error | Sync error |
| 4 | deleted | Marked for deletion |
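On the application side these SMALLINT codes map naturally onto integer-repr enums. A minimal Rust sketch for the Task Status table (a hypothetical mirror, not the project's actual types):

```rust
/// Hypothetical mirror of the Task Status table; #[repr(i16)] matches SMALLINT.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i16)]
pub enum TaskStatus {
    Pending = 0,
    Ready = 1,
    Running = 2,
    Paused = 3,
    WaitingApproval = 4,
    Completed = 5,
    Failed = 6,
    Cancelled = 7,
}

impl TryFrom<i16> for TaskStatus {
    type Error = i16;
    /// Decodes a stored SMALLINT, returning the raw value on unknown codes.
    fn try_from(v: i16) -> Result<Self, i16> {
        match v {
            0 => Ok(Self::Pending),
            1 => Ok(Self::Ready),
            2 => Ok(Self::Running),
            3 => Ok(Self::Paused),
            4 => Ok(Self::WaitingApproval),
            5 => Ok(Self::Completed),
            6 => Ok(Self::Failed),
            7 => Ok(Self::Cancelled),
            other => Err(other),
        }
    }
}

fn main() {
    assert_eq!(TaskStatus::try_from(4), Ok(TaskStatus::WaitingApproval));
    assert_eq!(TaskStatus::Completed as i16, 5);
    assert!(TaskStatus::try_from(99).is_err());
}
```

Rejecting unknown codes at the boundary keeps bad rows from silently becoming a default status.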
## Indexing Strategies
### Composite Indexes for Common Queries
```sql
-- Session lookup by user
CREATE INDEX idx_sessions_user ON user_sessions(user_id, created_at DESC);
-- Messages by session (most common query)
CREATE INDEX idx_messages_session ON message_history(session_id, message_index);
-- Active tasks by status and priority
CREATE INDEX idx_tasks_status ON auto_tasks(status, priority) WHERE status < 5;
-- Tenant-scoped queries
CREATE INDEX idx_sessions_tenant ON user_sessions(tenant_id, created_at DESC);
```
### Partial Indexes for Active Records
```sql
-- Only index active bots (saves space)
CREATE INDEX idx_bots_active ON bots(tenant_id, is_active) WHERE is_active = true;
-- Only index pending approvals
CREATE INDEX idx_approvals_pending ON task_approvals(task_id, expires_at) WHERE status = 0;
-- Only index unread messages
CREATE INDEX idx_messages_unread ON message_history(user_id, created_at) WHERE is_read = false;
```
### BRIN Indexes for Time-Series Data
```sql
-- BRIN index for time-ordered data (much smaller than B-tree)
CREATE INDEX idx_messages_created_brin ON message_history USING BRIN (created_at);
CREATE INDEX idx_analytics_date_brin ON analytics_events USING BRIN (created_at);
```
## Table Partitioning
### Partition High-Volume Tables by Time
```sql
-- Partitioned messages table
CREATE TABLE message_history (
id UUID NOT NULL,
session_id UUID NOT NULL,
tenant_id BIGINT NOT NULL,
created_at TIMESTAMPTZ NOT NULL,
-- other columns...
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE message_history_2025_01 PARTITION OF message_history
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE message_history_2025_02 PARTITION OF message_history
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- ... continue for each month
-- Default partition for future data
CREATE TABLE message_history_default PARTITION OF message_history DEFAULT;
```
### Automatic Partition Management
```sql
-- Function to create next month's partition
CREATE OR REPLACE FUNCTION create_monthly_partition(
table_name TEXT,
partition_date DATE
) RETURNS VOID AS $$
DECLARE
partition_name TEXT;
start_date DATE;
end_date DATE;
BEGIN
partition_name := table_name || '_' || to_char(partition_date, 'YYYY_MM');
start_date := date_trunc('month', partition_date);
end_date := start_date + INTERVAL '1 month';
EXECUTE format(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
partition_name, table_name, start_date, end_date
);
END;
$$ LANGUAGE plpgsql;
-- Create partitions for next 3 months
SELECT create_monthly_partition('message_history', (NOW() + INTERVAL '1 month')::DATE);
SELECT create_monthly_partition('message_history', (NOW() + INTERVAL '2 months')::DATE);
SELECT create_monthly_partition('message_history', (NOW() + INTERVAL '3 months')::DATE);
```
## Connection Pooling
### PgBouncer Configuration
```ini
; pgbouncer.ini
[databases]
gb_shard1 = host=shard1.db port=5432 dbname=gb
gb_shard2 = host=shard2.db port=5432 dbname=gb
[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Pool settings
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 50
min_pool_size = 10
reserve_pool_size = 25
reserve_pool_timeout = 3
; Timeouts
server_connect_timeout = 3
server_idle_timeout = 600
server_lifetime = 3600
client_idle_timeout = 0
```
### Application Connection Settings
```toml
# config.toml
[database]
max_connections = 100
min_connections = 10
connection_timeout_secs = 5
idle_timeout_secs = 300
max_lifetime_secs = 1800
```
## Query Optimization
### Use EXPLAIN ANALYZE
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM message_history
WHERE session_id = 'abc-123'
ORDER BY message_index;
```
### Avoid N+1 Queries
**Bad:**
```sql
-- 1 query for sessions
SELECT * FROM user_sessions WHERE user_id = 'xyz';
-- N queries for messages (one per session)
SELECT * FROM message_history WHERE session_id = ?;
```
**Good:**
```sql
-- Single query with JOIN
SELECT s.*, m.*
FROM user_sessions s
LEFT JOIN message_history m ON m.session_id = s.id
WHERE s.user_id = 'xyz'
ORDER BY s.created_at DESC, m.message_index;
```
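The JOIN returns one flat row per message, so the application regroups rows by session. A hedged Rust sketch with simplified row types (field names are illustrative):

```rust
use std::collections::BTreeMap;

/// Simplified flattened row, as produced by a session/message JOIN.
struct Row {
    session_id: u64,
    message: String,
}

/// Regroups the flat JOIN result into one entry per session,
/// preserving the query's ordering within each session.
fn group_by_session(rows: Vec<Row>) -> BTreeMap<u64, Vec<String>> {
    let mut grouped: BTreeMap<u64, Vec<String>> = BTreeMap::new();
    for row in rows {
        grouped.entry(row.session_id).or_default().push(row.message);
    }
    grouped
}

fn main() {
    let rows = vec![
        Row { session_id: 1, message: "hi".into() },
        Row { session_id: 1, message: "there".into() },
        Row { session_id: 2, message: "hello".into() },
    ];
    let grouped = group_by_session(rows);
    assert_eq!(grouped[&1], vec!["hi", "there"]);
    assert_eq!(grouped[&2], vec!["hello"]);
}
```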
### Use Covering Indexes
```sql
-- Index includes all needed columns (no table lookup)
CREATE INDEX idx_sessions_covering ON user_sessions(user_id, created_at DESC)
INCLUDE (title, message_count, last_activity_at);
```
## Vacuum and Maintenance
### Aggressive Autovacuum for High-Churn Tables
```sql
ALTER TABLE message_history SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 2
);
ALTER TABLE user_sessions SET (
autovacuum_vacuum_scale_factor = 0.02,
autovacuum_analyze_scale_factor = 0.01
);
```
### Regular Maintenance Tasks
```sql
-- Weekly: Reindex bloated indexes
REINDEX INDEX CONCURRENTLY idx_messages_session;
-- Monthly: Update statistics
ANALYZE VERBOSE message_history;
-- Quarterly: Cluster heavily-queried tables
CLUSTER message_history USING idx_messages_session;
```
## Monitoring Queries
### Table Bloat Check
```sql
SELECT
schemaname || '.' || tablename AS table,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;
```
### Slow Query Log
```ini
# postgresql.conf
log_min_duration_statement = 100   # log queries slower than 100 ms
log_statement = 'none'
log_lock_waits = on
```
### Index Usage Statistics
```sql
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
idx_scan AS scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC
LIMIT 20;
```
## Best Practices Summary
1. **Use SMALLINT for enums** - 2 bytes vs 10-50 bytes per field
2. **Partition time-series tables** - Monthly partitions for messages/analytics
3. **Create partial indexes** - Only index active/relevant rows
4. **Use connection pooling** - PgBouncer with transaction mode
5. **Enable aggressive autovacuum** - For high-churn tables
6. **Monitor query performance** - Log slow queries, check EXPLAIN plans
7. **Use covering indexes** - Include frequently-accessed columns
8. **Avoid cross-shard queries** - Keep tenant data together
9. **Regular maintenance** - Reindex, analyze, cluster as needed
10. **Test at scale** - Use production-like data volumes in staging

src/21-scale/sharding.md

@@ -0,0 +1,309 @@
# Sharding Architecture
This document describes how General Bots distributes data across multiple database shards for horizontal scaling.
## Overview
Sharding enables General Bots to scale beyond single-database limits by distributing data across multiple database instances. Each shard contains a subset of tenants, and data never crosses shard boundaries during normal operations.
## Shard Configuration
### Shard Config Table
The `shard_config` table defines all available shards:
```sql
CREATE TABLE shard_config (
shard_id SMALLINT PRIMARY KEY,
region_code CHAR(3) NOT NULL, -- ISO 3166-1 alpha-3: USA, BRA, DEU
datacenter VARCHAR(32) NOT NULL, -- e.g., 'us-east-1', 'eu-west-1'
connection_string TEXT NOT NULL, -- Encrypted connection string
is_primary BOOLEAN DEFAULT false,
is_active BOOLEAN DEFAULT true,
min_tenant_id BIGINT NOT NULL,
max_tenant_id BIGINT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
```
### Example Configuration
```sql
-- Americas
INSERT INTO shard_config VALUES
(1, 'USA', 'us-east-1', 'postgresql://shard1.db:5432/gb', true, true, 1, 1000000),
(2, 'USA', 'us-west-2', 'postgresql://shard2.db:5432/gb', false, true, 1000001, 2000000),
(3, 'BRA', 'sa-east-1', 'postgresql://shard3.db:5432/gb', false, true, 2000001, 3000000);
-- Europe
INSERT INTO shard_config VALUES
(4, 'DEU', 'eu-central-1', 'postgresql://shard4.db:5432/gb', false, true, 3000001, 4000000),
(5, 'GBR', 'eu-west-2', 'postgresql://shard5.db:5432/gb', false, true, 4000001, 5000000);
-- Asia Pacific
INSERT INTO shard_config VALUES
(6, 'SGP', 'ap-southeast-1', 'postgresql://shard6.db:5432/gb', false, true, 5000001, 6000000),
(7, 'JPN', 'ap-northeast-1', 'postgresql://shard7.db:5432/gb', false, true, 6000001, 7000000);
```
## Tenant-to-Shard Mapping
### Mapping Table
```sql
CREATE TABLE tenant_shard_map (
tenant_id BIGINT PRIMARY KEY,
shard_id SMALLINT NOT NULL REFERENCES shard_config(shard_id),
region_code CHAR(3) NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
```
### Routing Logic
When a request comes in, the system:
1. Extracts `tenant_id` from the request context
2. Looks up `shard_id` from `tenant_shard_map`
3. Routes the query to the appropriate database connection
```rust
// Rust routing example
pub fn get_shard_connection(tenant_id: i64) -> Result<DbConnection> {
let shard_id = SHARD_MAP.get(&tenant_id)
.ok_or_else(|| Error::TenantNotFound(tenant_id))?;
CONNECTION_POOLS.get(shard_id)
.ok_or_else(|| Error::ShardNotAvailable(*shard_id))
}
```
## Data Model Requirements
### Every Table Includes Shard Keys
All tables must include `tenant_id` and `shard_id` columns:
```sql
CREATE TABLE user_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id BIGINT NOT NULL, -- Required for routing
shard_id SMALLINT NOT NULL, -- Denormalized for queries
user_id UUID NOT NULL,
bot_id UUID NOT NULL,
-- ... other columns
);
```
### Foreign Keys Within Shard Only
Foreign keys only reference tables within the same shard:
```sql
-- Good: Same shard reference
ALTER TABLE message_history
ADD CONSTRAINT fk_session
FOREIGN KEY (session_id) REFERENCES user_sessions(id);
-- Bad: Cross-shard reference (never do this)
-- FOREIGN KEY (other_tenant_data) REFERENCES other_shard.table(id)
```
## Snowflake ID Generation
For globally unique, time-sortable IDs across shards:
```sql
CREATE OR REPLACE FUNCTION generate_snowflake_id(p_shard_id SMALLINT)
RETURNS BIGINT AS $$
DECLARE
epoch BIGINT := 1704067200000; -- 2024-01-01 00:00:00 UTC
ts BIGINT;
seq BIGINT;
BEGIN
-- 41 bits: timestamp (milliseconds since epoch)
ts := (EXTRACT(EPOCH FROM NOW()) * 1000)::BIGINT - epoch;
-- 10 bits: shard_id (0-1023)
-- 12 bits: sequence (0-4095)
seq := nextval('global_seq') & 4095;
RETURN (ts << 22) | ((p_shard_id & 1023) << 12) | seq;
END;
$$ LANGUAGE plpgsql;
```
### ID Structure
```
64-bit Snowflake ID
┌──────────────────────────────┬───────────────┬──────────────────┐
│      41 bits timestamp       │ 10 bits shard │ 12 bits sequence │
│      (69 years range)        │ (1024 shards) │ (4096/ms/shard)  │
└──────────────────────────────┴───────────────┴──────────────────┘
```
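The same layout can be packed and unpacked in application code. A minimal Rust sketch (hypothetical helpers mirroring the SQL function's epoch and bit widths):

```rust
/// Epoch matching the SQL function above: 2024-01-01 00:00:00 UTC, in ms.
const EPOCH_MS: u64 = 1_704_067_200_000;

/// Packs (ms since Unix epoch, shard, sequence) into the 41/10/12-bit layout.
fn encode(ts_ms: u64, shard: u16, seq: u16) -> u64 {
    ((ts_ms - EPOCH_MS) << 22) | ((shard as u64 & 0x3FF) << 12) | (seq as u64 & 0xFFF)
}

/// Splits an ID back into its three fields.
fn decode(id: u64) -> (u64, u16, u16) {
    let ts_ms = (id >> 22) + EPOCH_MS;       // 41-bit timestamp
    let shard = ((id >> 12) & 0x3FF) as u16; // 10-bit shard id
    let seq = (id & 0xFFF) as u16;           // 12-bit sequence
    (ts_ms, shard, seq)
}

fn main() {
    let id = encode(EPOCH_MS + 1_000, 7, 42);
    assert_eq!(decode(id), (EPOCH_MS + 1_000, 7, 42));
}
```

Because the timestamp occupies the high bits, IDs sort by creation time, which keeps B-tree inserts append-mostly.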
## Shard Operations
### Creating a New Shard
1. Provision new database instance
2. Run migrations
3. Add to `shard_config`
4. Update routing configuration
5. Begin assigning new tenants
```bash
# 1. Run migrations on new shard
DATABASE_URL="postgresql://new-shard:5432/gb" diesel migration run
# 2. Add shard config
psql -c "INSERT INTO shard_config VALUES (8, 'AUS', 'ap-southeast-2', '...', false, true, 7000001, 8000000);"
# 3. Reload routing
curl -X POST http://localhost:3000/api/admin/reload-shard-config
```
### Tenant Migration Between Shards
Moving a tenant to a different shard (e.g., for data locality):
```bash
# (source_db / target_db names are placeholders for the shard connections)
# 1. Put the tenant into read-only mode
psql source_db -c "UPDATE tenants SET settings = settings || '{\"read_only\": true}' WHERE id = 12345;"
# 2. Export tenant data (pg_dump has no row filter, so use COPY per table)
psql source_db -c "\copy (SELECT * FROM user_sessions WHERE tenant_id = 12345) TO 'user_sessions_12345.csv' CSV"
psql source_db -c "\copy (SELECT * FROM message_history WHERE tenant_id = 12345) TO 'message_history_12345.csv' CSV"
# 3. Import into the new shard (parents before children, to satisfy foreign keys)
psql target_db -c "\copy user_sessions FROM 'user_sessions_12345.csv' CSV"
psql target_db -c "\copy message_history FROM 'message_history_12345.csv' CSV"
# 4. Update routing
psql source_db -c "UPDATE tenant_shard_map SET shard_id = 5, region_code = 'DEU' WHERE tenant_id = 12345;"
# 5. Remove read-only mode
psql target_db -c "UPDATE tenants SET settings = settings - 'read_only' WHERE id = 12345;"
# 6. Clean up the source shard (children first, only after verification)
psql source_db -c "DELETE FROM message_history WHERE tenant_id = 12345;"
psql source_db -c "DELETE FROM user_sessions WHERE tenant_id = 12345;"
```
## Query Patterns
### Single-Tenant Queries (Most Common)
```sql
-- Efficient: Uses shard routing
SELECT * FROM user_sessions
WHERE tenant_id = 12345 AND user_id = 'abc-123';
```
### Cross-Shard Queries (Admin Only)
For global analytics, use a federation layer:
```sql
-- Using postgres_fdw for cross-shard reads
SELECT shard_id, COUNT(*) as session_count
FROM all_shards.user_sessions
WHERE created_at > NOW() - INTERVAL '1 day'
GROUP BY shard_id;
```
### Scatter-Gather Pattern
For queries that must touch multiple shards:
```rust
async fn get_global_stats() -> Stats {
let futures: Vec<_> = SHARDS.iter()
.map(|shard| get_shard_stats(shard.id))
.collect();
let results = futures::future::join_all(futures).await;
results.into_iter().fold(Stats::default(), |acc, s| acc.merge(s))
}
```
## High Availability
### Per-Shard Replication
Each shard should have:
- 1 Primary (read/write)
- 1-2 Replicas (read-only, failover)
- Async replication with < 1s lag
```
Shard 1 Architecture:
      ┌─────────────┐
      │   Primary   │◄──── Writes
      └──────┬──────┘
             │ Streaming Replication
         ┌───┴───┐
         ▼       ▼
      ┌──────┐ ┌──────┐
      │Rep 1 │ │Rep 2 │◄──── Reads
      └──────┘ └──────┘
```
### Failover Configuration
```csv
# config.csv
shard-1-primary,postgresql://shard1-primary:5432/gb
shard-1-replica-1,postgresql://shard1-replica1:5432/gb
shard-1-replica-2,postgresql://shard1-replica2:5432/gb
shard-1-failover-priority,replica-1,replica-2
```
## Monitoring
### Key Metrics Per Shard
| Metric | Warning | Critical |
|--------|---------|----------|
| Connection pool usage | > 70% | > 90% |
| Query latency p99 | > 100ms | > 500ms |
| Replication lag | > 1s | > 10s |
| Disk usage | > 70% | > 85% |
| Tenant count | > 80% capacity | > 95% capacity |
### Shard Health Check
```sql
-- Run on each shard
SELECT
current_setting('cluster_name') as shard,
pg_is_in_recovery() as is_replica,
pg_last_wal_receive_lsn() as wal_position,
pg_postmaster_start_time() as uptime_since,
(SELECT count(*) FROM pg_stat_activity) as connections,
(SELECT count(*) FROM tenants) as tenant_count;
```
## Best Practices
1. **Shard by tenant, not by table** - Keep all tenant data together
2. **Avoid cross-shard transactions** - Design for eventual consistency where needed
3. **Pre-allocate tenant ranges** - Leave room for growth in each shard
4. **Monitor shard hotspots** - Rebalance if one shard gets too busy
5. **Test failover regularly** - Ensure replicas can be promoted
6. **Use connection pooling** - PgBouncer or similar for each shard
7. **Cache shard routing** - Don't query `tenant_shard_map` on every request
## Migration from Single Database
To migrate an existing single-database deployment to sharded:
1. Add `shard_id` column to all tables (default to 1)
2. Deploy shard routing code (disabled)
3. Set up additional shard databases
4. Enable routing for new tenants only
5. Gradually migrate existing tenants during low-traffic windows
6. Decommission original database when empty
See [Regional Deployment](./regional-deployment.md) for multi-region considerations.


@@ -390,6 +390,12 @@
- [Examples](./17-autonomous-tasks/examples.md)
- [Designer](./17-autonomous-tasks/designer.md)
# Part XVII - Scale
- [Chapter 18: Scale](./21-scale/README.md)
- [Sharding Architecture](./21-scale/sharding.md)
- [Database Optimization](./21-scale/database-optimization.md)
# Appendices
- [Appendix A: Database Model](./15-appendix/README.md)