# Scaling and Load Balancing

General Bots is designed to scale from a single instance to a distributed cluster using LXC containers. This chapter covers auto-scaling, load balancing, sharding strategies, and failover systems.

## Scaling Architecture

General Bots uses a **horizontal scaling** approach with LXC containers:

```
                    ┌─────────────────┐
                    │   Caddy Proxy   │
                    │  (Load Balancer)│
                    └────────┬────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  LXC Container  │ │  LXC Container  │ │  LXC Container  │
│   botserver-1   │ │   botserver-2   │ │   botserver-3   │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
         │                   │                   │
         └───────────────────┼───────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│   PostgreSQL    │ │     Redis       │ │     Qdrant      │
│   (Primary)     │ │   (Cluster)     │ │   (Cluster)     │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```

## Auto-Scaling Configuration

### config.csv Parameters

Configure auto-scaling behavior in your bot's `config.csv`:

```csv
# Auto-scaling settings
scale-enabled,true
scale-min-instances,1
scale-max-instances,10
scale-cpu-threshold,70
scale-memory-threshold,80
scale-request-threshold,1000
scale-cooldown-seconds,300
scale-check-interval,30
```

| Parameter | Description | Default |
|-----------|-------------|---------|
| `scale-enabled` | Enable auto-scaling | `false` |
| `scale-min-instances` | Minimum container count | `1` |
| `scale-max-instances` | Maximum container count | `10` |
| `scale-cpu-threshold` | CPU % to trigger scale-up | `70` |
| `scale-memory-threshold` | Memory % to trigger scale-up | `80` |
| `scale-request-threshold` | Requests/min to trigger scale-up | `1000` |
| `scale-cooldown-seconds` | Wait time between scaling events | `300` |
| `scale-check-interval` | Seconds between metric checks | `30` |
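
The chapter does not spell out how these thresholds combine. As an illustration only, the Rust sketch below assumes that crossing any one of the CPU, memory or request thresholds triggers a scale-up, that nothing happens during the cooldown window or at `scale-max-instances`, and that dropping below `scale-min-instances` always scales up. `ScaleConfig`, `Metrics` and `decide` are hypothetical names, not botserver APIs.

```rust
use std::time::{Duration, Instant};

/// Mirrors the scale-* keys in config.csv (hypothetical struct).
pub struct ScaleConfig {
    pub min_instances: u32,     // scale-min-instances
    pub max_instances: u32,     // scale-max-instances
    pub cpu_threshold: f64,     // scale-cpu-threshold (%)
    pub memory_threshold: f64,  // scale-memory-threshold (%)
    pub request_threshold: u64, // scale-request-threshold (requests/min)
    pub cooldown: Duration,     // scale-cooldown-seconds
}

/// Averages sampled every scale-check-interval seconds.
pub struct Metrics {
    pub cpu: f64,
    pub memory: f64,
    pub requests_per_min: u64,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Hold,
    ScaleUp,
}

pub fn decide(
    cfg: &ScaleConfig,
    m: &Metrics,
    instances: u32,
    last_scale: Option<Instant>,
    now: Instant,
) -> Decision {
    // Restoring the minimum instance count ignores the cooldown.
    if instances < cfg.min_instances {
        return Decision::ScaleUp;
    }
    let cooling_down = last_scale.map_or(false, |t| now.duration_since(t) < cfg.cooldown);
    if cooling_down || instances >= cfg.max_instances {
        return Decision::Hold;
    }
    // Any single overloaded signal is enough to add capacity.
    let overloaded = m.cpu > cfg.cpu_threshold
        || m.memory > cfg.memory_threshold
        || m.requests_per_min > cfg.request_threshold;
    if overloaded {
        Decision::ScaleUp
    } else {
        Decision::Hold
    }
}
```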

### Scaling Rules

Define custom scaling rules:

```csv
# Scale up when average response time exceeds 2 seconds
scale-rule-response-time,2000
scale-rule-response-action,up

# Scale down when CPU drops below 30%
scale-rule-cpu-low,30
scale-rule-cpu-low-action,down

# Scale up on queue depth
scale-rule-queue-depth,100
scale-rule-queue-action,up
```
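
The rules come in value/action pairs. A minimal sketch of evaluating them, assuming `up` rules fire when a metric is above its value, `down` rules fire when it is below, and any firing `up` rule wins over a `down` rule so one overloaded signal is never masked by an idle one. The `Rule` type, the metric names and `evaluate` are hypothetical.

```rust
/// Direction a rule asks for (the scale-rule-*-action values).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Up,
    Down,
}

/// One rule: `Up` fires above `threshold`, `Down` fires below it (assumed semantics).
pub struct Rule {
    pub metric: &'static str,
    pub threshold: f64,
    pub action: Action,
}

/// Returns the winning action, or None when no rule fires.
/// Metrics the lookup does not know about are skipped.
pub fn evaluate(rules: &[Rule], metric: impl Fn(&str) -> Option<f64>) -> Option<Action> {
    let mut result = None;
    for rule in rules {
        let Some(value) = metric(rule.metric) else { continue };
        let fired = match rule.action {
            Action::Up => value > rule.threshold,
            Action::Down => value < rule.threshold,
        };
        if fired {
            if rule.action == Action::Up {
                return Some(Action::Up); // scale-up always wins
            }
            result = Some(Action::Down);
        }
    }
    result
}
```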

## LXC Container Management

### Creating Scaled Instances

```bash
# Create additional botserver containers
for i in {2..5}; do
  lxc launch images:debian/12 botserver-$i
  lxc config device add botserver-$i port-$((8080+i)) proxy \
    listen=tcp:0.0.0.0:$((8080+i)) connect=tcp:127.0.0.1:8080
done
```

### Container Resource Limits

Set resource limits per container:

```bash
# CPU limits (number of cores)
lxc config set botserver-1 limits.cpu 4

# Memory limits
lxc config set botserver-1 limits.memory 8GB

# Disk I/O priority (0-10)
lxc config set botserver-1 limits.disk.priority 5

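# Network bandwidth (ingress/egress). This works when eth0 is defined on the
# container itself; if it is inherited from a profile, use
# `lxc config device override botserver-1 eth0 limits.ingress=100Mbit` instead.
lxc config device set botserver-1 eth0 limits.ingress 100Mbit
lxc config device set botserver-1 eth0 limits.egress 100Mbit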
```

### Auto-Scaling Script

Create `/opt/gbo/scripts/autoscale.sh`:

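```bash
#!/bin/bash

# Configuration (keep in line with the scale-* keys in config.csv)
MIN_INSTANCES=1
MAX_INSTANCES=10
CPU_THRESHOLD=70        # scale up above this load (% of available cores)
CPU_LOW_THRESHOLD=30    # scale down below this load
SCALE_COOLDOWN=300      # seconds between scaling events
CHECK_INTERVAL=30       # seconds between metric checks
LAST_SCALE_FILE="/tmp/last_scale_time"
# Image that already contains /opt/gbo/bin/botserver. "botserver-base" is a
# placeholder alias, e.g. one created with `lxc publish` from a configured container.
BASE_IMAGE="botserver-base"

list_instances() {
    lxc list -c n --format csv | grep "^botserver-"
}

get_instance_count() {
    lxc list -c n --format csv | grep -c "^botserver-"
}

# First IPv4 address of a container. The anchored filter keeps
# botserver-1 from also matching botserver-10.
get_ip() {
    lxc list "^$1\$" -c 4 --format csv | head -n1 | cut -d' ' -f1 | tr -d '"'
}

# Approximate CPU pressure as 1-minute load average / core count, in percent.
# /proc/loadavg is only per-container when LXCFS load-average
# virtualization is enabled; otherwise it reports the host's load.
get_avg_cpu() {
    local total=0 count=0 load cores
    for container in $(list_instances); do
        load=$(lxc exec "$container" -- cat /proc/loadavg | awk '{print $1}')
        cores=$(lxc exec "$container" -- nproc)
        total=$(echo "$total + $load / $cores" | bc -l)
        count=$((count + 1))
    done
    if [ "$count" -eq 0 ]; then
        echo 0
        return
    fi
    echo "scale=2; $total * 100 / $count" | bc -l
}

can_scale() {
    if [ ! -f "$LAST_SCALE_FILE" ]; then
        return 0
    fi
    local last_scale now
    last_scale=$(cat "$LAST_SCALE_FILE")
    now=$(date +%s)
    [ $((now - last_scale)) -gt "$SCALE_COOLDOWN" ]
}

# Regenerate the upstream snippet, optionally leaving one container out.
# The Caddyfile has no "upstream" block, so this writes a reverse_proxy
# directive that the Caddyfile pulls in with "import upstream.conf".
update_load_balancer() {
    local exclude="${1:-}" upstreams="" ip
    for container in $(list_instances); do
        [ "$container" = "$exclude" ] && continue
        ip=$(get_ip "$container")
        [ -n "$ip" ] && upstreams="$upstreams $ip:8080"
    done

    cat > /opt/gbo/conf/caddy/upstream.conf << EOF
reverse_proxy$upstreams {
    lb_policy round_robin
    health_uri /api/health
    health_interval 10s
}
EOF

    # Copy the snippet into the proxy container and reload
    # (caddy reload talks to the admin API, so it must not be disabled)
    lxc file push /opt/gbo/conf/caddy/upstream.conf proxy-1/etc/caddy/upstream.conf
    lxc exec proxy-1 -- caddy reload --config /etc/caddy/Caddyfile
}

scale_up() {
    local current name
    current=$(get_instance_count)
    if [ "$current" -ge "$MAX_INSTANCES" ]; then
        echo "Already at max instances ($MAX_INSTANCES)"
        return 1
    fi

    name="botserver-$((current + 1))"
    echo "Scaling up: creating $name"

    lxc launch "$BASE_IMAGE" "$name" -c limits.cpu=4 -c limits.memory=8GB

    # Wait for an IPv4 address before routing traffic to the container
    until [ -n "$(get_ip "$name")" ]; do sleep 2; done

    # Copy configuration (-p creates missing parent directories)
    lxc file push -p /opt/gbo/conf/botserver.env "$name/opt/gbo/conf/"

    # Start botserver detached from this exec session
    # (a systemd unit baked into the image is a sturdier alternative)
    lxc exec "$name" -- sh -c 'nohup /opt/gbo/bin/botserver >/var/log/botserver.log 2>&1 </dev/null &'

    update_load_balancer

    date +%s > "$LAST_SCALE_FILE"
    echo "Scale up complete"
}

scale_down() {
    local current target
    current=$(get_instance_count)
    if [ "$current" -le "$MIN_INSTANCES" ]; then
        echo "Already at min instances ($MIN_INSTANCES)"
        return 1
    fi

    # Remove highest numbered instance
    target="botserver-$current"
    echo "Scaling down: removing $target"

    # Take it out of rotation first so no new requests reach it
    update_load_balancer "$target"

    # Drain existing connections
    lxc exec "$target" -- /opt/gbo/bin/botserver drain
    sleep 30

    # Stop and delete
    lxc stop "$target"
    lxc delete "$target"

    date +%s > "$LAST_SCALE_FILE"
    echo "Scale down complete"
}

# Main loop
while true; do
    avg_cpu=$(get_avg_cpu)
    echo "Average CPU load: $avg_cpu%"

    if can_scale; then
        if (( $(echo "$avg_cpu > $CPU_THRESHOLD" | bc -l) )); then
            scale_up
        elif (( $(echo "$avg_cpu < $CPU_LOW_THRESHOLD" | bc -l) )); then
            scale_down
        fi
    fi

    sleep "$CHECK_INTERVAL"
done
```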

## Load Balancing

### Caddy Configuration

Primary load balancer configuration (`/opt/gbo/conf/caddy/Caddyfile`):

```caddyfile
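{
    # Keep the admin API (default localhost:2019): `caddy reload` needs it.
    # Automatic HTTPS is on by default; `auto_https` only accepts values that
    # turn parts of it off, so it is omitted here.
    admin localhost:2019
}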

(common) {
    encode gzip zstd
    header {
        -Server
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
        Referrer-Policy "strict-origin-when-cross-origin"
    }
}

bot.example.com {
    import common
    
    # Health check endpoint (no load balancing)
    handle /api/health {
        reverse_proxy localhost:8080
    }
    
    # WebSocket connections (sticky sessions)
    handle /ws* {
        reverse_proxy botserver-1:8080 botserver-2:8080 botserver-3:8080 {
            lb_policy cookie
            lb_try_duration 5s
            health_uri /api/health
            health_interval 10s
            health_timeout 5s
        }
    }
    
    # API requests (round robin)
    handle /api/* {
        reverse_proxy botserver-1:8080 botserver-2:8080 botserver-3:8080 {
            lb_policy round_robin
            lb_try_duration 5s
            health_uri /api/health
            health_interval 10s
            fail_duration 30s
        }
    }
    
    # Static files (any instance)
    handle {
        reverse_proxy botserver-1:8080 botserver-2:8080 botserver-3:8080 {
            lb_policy first
        }
    }
}
```

### Load Balancing Policies

| Policy | Description | Use Case |
|--------|-------------|----------|
| `round_robin` | Rotate through backends | General API requests |
| `first` | Use first available | Static content |
| `least_conn` | Fewest active connections | Long-running requests |
| `ip_hash` | Consistent by client IP | Session affinity |
| `cookie` | Sticky sessions via cookie | WebSocket, stateful |
| `random` | Random selection | Testing |
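
To make the table concrete, here is a rough sketch of three of these policies over a list of backends with health flags. It illustrates the selection logic only; it is not Caddy's implementation, and `Backend` and `Balancer` are hypothetical types.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::IpAddr;

pub struct Backend {
    pub addr: &'static str,
    pub healthy: bool,
    pub active_conns: u32,
}

pub struct Balancer {
    backends: Vec<Backend>,
    next: usize,
}

impl Balancer {
    pub fn new(backends: Vec<Backend>) -> Self {
        Self { backends, next: 0 }
    }

    /// round_robin: rotate through the list, skipping unhealthy backends.
    pub fn round_robin(&mut self) -> Option<&Backend> {
        let n = self.backends.len();
        for _ in 0..n {
            let i = self.next % n;
            self.next = self.next.wrapping_add(1);
            if self.backends[i].healthy {
                return Some(&self.backends[i]);
            }
        }
        None
    }

    /// least_conn: healthy backend with the fewest active connections
    /// (ties go to the first one in the list).
    pub fn least_conn(&self) -> Option<&Backend> {
        self.backends.iter().filter(|b| b.healthy).min_by_key(|b| b.active_conns)
    }

    /// ip_hash: the same client IP picks the same backend while the
    /// healthy set does not change.
    pub fn ip_hash(&self, client: IpAddr) -> Option<&Backend> {
        let healthy: Vec<&Backend> = self.backends.iter().filter(|b| b.healthy).collect();
        if healthy.is_empty() {
            return None;
        }
        let mut h = DefaultHasher::new();
        client.hash(&mut h);
        Some(healthy[(h.finish() % healthy.len() as u64) as usize])
    }
}
```

In this sketch `ip_hash` hashes over the healthy set only, so clients are remapped whenever a backend goes down or comes back; that is one reason cookie-based affinity is often preferred for browser sessions.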

### Rate Limiting

Configure rate limits in `config.csv`:

```csv
# Rate limiting
rate-limit-enabled,true
rate-limit-requests,100
rate-limit-window,60
rate-limit-burst,20
rate-limit-by,ip

# Per-endpoint limits
rate-limit-api-chat,30
rate-limit-api-files,50
rate-limit-api-auth,10
```
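
The settings above do not state which algorithm enforces them. One common choice is a token bucket per client key; the sketch below assumes `rate-limit-requests` / `rate-limit-window` is the refill rate and `rate-limit-burst` is the bucket capacity. Both that interpretation and the `RateLimiter` type are assumptions for illustration. Time is passed in as seconds so the behaviour is easy to test.

```rust
use std::collections::HashMap;

/// Token bucket per client key (e.g. the client IP for rate-limit-by,ip).
pub struct RateLimiter {
    rate_per_sec: f64,                    // rate-limit-requests / rate-limit-window
    capacity: f64,                        // assumed to be rate-limit-burst
    buckets: HashMap<String, (f64, f64)>, // key -> (tokens, last refill time in secs)
}

impl RateLimiter {
    pub fn new(requests: u32, window_secs: u32, burst: u32) -> Self {
        Self {
            rate_per_sec: requests as f64 / window_secs as f64,
            capacity: burst as f64,
            buckets: HashMap::new(),
        }
    }

    /// Returns true if a request from `key` at time `now` (seconds) is allowed.
    pub fn allow(&mut self, key: &str, now: f64) -> bool {
        // New clients start with a full bucket.
        let entry = self.buckets.entry(key.to_string()).or_insert((self.capacity, now));
        // Refill for the time elapsed since the last request, up to capacity.
        let elapsed = (now - entry.1).max(0.0);
        entry.0 = (entry.0 + elapsed * self.rate_per_sec).min(self.capacity);
        entry.1 = now;
        if entry.0 >= 1.0 {
            entry.0 -= 1.0;
            true
        } else {
            false
        }
    }
}
```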

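Rate limiting in Caddy: Caddy has no built-in rate limiter. The `rate_limit` directive comes from the third-party [caddy-ratelimit](https://github.com/mholt/caddy-ratelimit) module, so Caddy must be built with it (for example `xcaddy build --with github.com/mholt/caddy-ratelimit`). Because it is not a standard directive, it also needs an `order` global option (or must be placed inside a `route` block):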

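```caddyfile
{
    # Non-standard directives need an explicit position in the handler chain
    order rate_limit before basic_auth
}

bot.example.com {
    # Global rate limit
    rate_limit {
        zone global {
            key {remote_host}
            events 100
            window 1m
        }
    }
    
    # Stricter limit for auth endpoints
    handle /api/auth/* {
        rate_limit {
            zone auth {
                key {remote_host}
                events 10
                window 1m
            }
        }
        reverse_proxy botserver-1:8080 botserver-2:8080 botserver-3:8080
    }

    # Everything else
    handle {
        reverse_proxy botserver-1:8080 botserver-2:8080 botserver-3:8080
    }
}
```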

## Sharding Strategies

### Database Sharding Options

#### Option 1: Tenant-Based Sharding

Each tenant gets their own database:

```
┌─────────────────┐
│   Router/Proxy  │
└────────┬────────┘
         │
    ┌────┴────┬──────────┐
    │         │          │
    ▼         ▼          ▼
┌───────┐ ┌───────┐ ┌───────┐
│Tenant1│ │Tenant2│ │Tenant3│
│  DB   │ │  DB   │ │  DB   │
└───────┘ └───────┘ └───────┘
```

Configuration:

```csv
# Tenant sharding
shard-strategy,tenant
shard-tenant-db-prefix,gb_tenant_
shard-auto-create,true
```
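
With `shard-auto-create` enabled, each tenant ID has to become a valid database name. A minimal sketch, assuming tenant IDs are free-form strings: it lowercases the ID, replaces anything outside `[a-z0-9]` with `_`, and rejects names longer than PostgreSQL's 63-byte identifier limit. `tenant_db_name` is a hypothetical helper.

```rust
/// Build a per-tenant database name from shard-tenant-db-prefix.
/// The result is a safe unquoted PostgreSQL identifier, or None if the
/// tenant ID is empty or the name would exceed 63 bytes.
pub fn tenant_db_name(prefix: &str, tenant_id: &str) -> Option<String> {
    let cleaned: String = tenant_id
        .chars()
        .map(|c| c.to_ascii_lowercase())
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect();
    let name = format!("{prefix}{cleaned}");
    if cleaned.is_empty() || name.len() > 63 {
        None
    } else {
        Some(name)
    }
}
```

The mapping is lossy (`Acme-Corp` and `acme_corp` collide), so a production router would store the tenant-to-database mapping rather than recompute it on every request.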

#### Option 2: Hash-Based Sharding

Distribute data by hash of primary key:

```
User ID: 12345
Hash: 12345 % 4 = 1
Shard: shard-1
```

Configuration:

```csv
# Hash sharding
shard-strategy,hash
shard-count,4
shard-key,user_id
shard-algorithm,modulo
```
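
For integer keys the worked example is plain modulo. String keys need a hash that stays identical across processes and compiler releases, which Rust's `DefaultHasher` does not promise. A sketch using 64-bit FNV-1a, chosen here for simplicity rather than taken from botserver; `shard_for_id`, `fnv1a64` and `shard_for_key` are hypothetical names.

```rust
/// Integer keys: plain modulo, as in the worked example (12345 % 4 = 1).
pub fn shard_for_id(id: u64, shard_count: u64) -> u64 {
    id % shard_count
}

/// 64-bit FNV-1a: a small, stable, non-cryptographic hash.
pub fn fnv1a64(key: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in key.bytes() {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// String keys: stable hash, then modulo.
pub fn shard_for_key(key: &str, shard_count: u64) -> u64 {
    fnv1a64(key) % shard_count
}
```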

#### Option 3: Range-Based Sharding

Partition by ID ranges:

```csv
# Range sharding
shard-strategy,range
shard-ranges,0-999999:shard1,1000000-1999999:shard2,2000000-:shard3
```
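
A sketch of parsing the `shard-ranges` value and looking up an ID, assuming ranges are inclusive on both ends and that an empty upper bound (`2000000-`) means unbounded. `ShardRange`, `parse_ranges` and `shard_for` are hypothetical.

```rust
/// One entry of shard-ranges: start..=end, where end = None means unbounded.
#[derive(Debug)]
pub struct ShardRange {
    pub start: u64,
    pub end: Option<u64>,
    pub shard: String,
}

/// Parse values like "0-999999:shard1,1000000-1999999:shard2,2000000-:shard3".
pub fn parse_ranges(spec: &str) -> Result<Vec<ShardRange>, String> {
    spec.split(',')
        .map(|entry| -> Result<ShardRange, String> {
            let (bounds, shard) = entry
                .split_once(':')
                .ok_or_else(|| format!("missing ':' in {entry:?}"))?;
            let (start, end) = bounds
                .split_once('-')
                .ok_or_else(|| format!("missing '-' in {entry:?}"))?;
            let start = start.trim().parse::<u64>().map_err(|e| e.to_string())?;
            let end = match end.trim() {
                "" => None,
                s => Some(s.parse::<u64>().map_err(|e| e.to_string())?),
            };
            Ok(ShardRange { start, end, shard: shard.trim().to_string() })
        })
        .collect()
}

/// Return the shard whose range contains `id`, if any.
pub fn shard_for(ranges: &[ShardRange], id: u64) -> Option<&str> {
    ranges
        .iter()
        .find(|r| id >= r.start && r.end.map_or(true, |e| id <= e))
        .map(|r| r.shard.as_str())
}
```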

#### Option 4: Geographic Sharding

Route by user location:

```csv
# Geographic sharding
shard-strategy,geo
shard-geo-us,postgres-us.example.com
shard-geo-eu,postgres-eu.example.com
shard-geo-asia,postgres-asia.example.com
shard-default,postgres-us.example.com
```
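
Routing then reduces to a lookup with a fallback. A sketch that builds the table from the `shard-geo-*` and `shard-default` keys; how a user's region code is determined (GeoIP, a profile setting) is outside this example, and `GeoRouter` is a hypothetical type.

```rust
use std::collections::HashMap;

pub struct GeoRouter {
    regions: HashMap<String, String>, // region code -> database host
    default: String,                  // shard-default
}

impl GeoRouter {
    /// Build from config.csv pairs; returns None if shard-default is missing.
    pub fn from_config(pairs: &[(&str, &str)]) -> Option<Self> {
        let mut regions = HashMap::new();
        let mut default = None;
        for (key, value) in pairs {
            if let Some(region) = key.strip_prefix("shard-geo-") {
                regions.insert(region.to_string(), value.to_string());
            } else if *key == "shard-default" {
                default = Some(value.to_string());
            }
        }
        Some(Self { regions, default: default? })
    }

    /// Host for a region; unknown regions go to shard-default.
    pub fn host_for(&self, region: &str) -> &str {
        self.regions
            .get(region)
            .map(String::as_str)
            .unwrap_or(self.default.as_str())
    }
}
```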

### Vector Database Sharding (Qdrant)

Qdrant supports automatic sharding:

```csv
# Qdrant sharding
qdrant-shard-count,4
qdrant-replication-factor,2
qdrant-write-consistency,majority
```

Collection creation with sharding:

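```rust
// In vectordb code (qdrant-client crate; `bot_id` is assumed to be in scope)
use qdrant_client::qdrant::{
    vectors_config::Config, CreateCollection, Distance, VectorParams, VectorsConfig,
};

let collection_config = CreateCollection {
    collection_name: format!("kb_{}", bot_id),
    vectors_config: Some(VectorsConfig {
        config: Some(Config::Params(VectorParams {
            size: 384,                         // embedding dimension
            distance: Distance::Cosine.into(), // the enum is stored as i32
            ..Default::default()
        })),
    }),
    shard_number: Some(4),             // qdrant-shard-count
    replication_factor: Some(2),       // qdrant-replication-factor
    write_consistency_factor: Some(1), // replicas that must acknowledge a write
    ..Default::default()
};
```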

### Redis Cluster

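For high-availability caching, run Redis in cluster mode. Redis Cluster requires at least three primary nodes; with one replica per primary (`cache-replicas,1`), a full deployment runs at least six Redis instances: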

```csv
# Redis cluster
cache-mode,cluster
cache-nodes,redis-1:6379,redis-2:6379,redis-3:6379
cache-replicas,1
```
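
Redis Cluster assigns every key to one of 16384 hash slots using CRC16 of the key, and when a key contains a non-empty `{...}` hash tag only that part is hashed. Multi-key commands need all keys in one slot, so hash tags such as `{bot42}:session` and `{bot42}:context` (hypothetical key names) keep related keys together. The functions below follow the algorithm in the Redis Cluster specification; client libraries already do this, so treat it as a sketch for reasoning about key layout.

```rust
/// CRC16-CCITT (XMODEM variant: poly 0x1021, init 0), as used by Redis Cluster.
pub fn crc16(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &b in data {
        crc ^= (b as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ 0x1021 } else { crc << 1 };
        }
    }
    crc
}

/// Hash slot for a key. If the key has a '{' followed later by a '}' with at
/// least one byte between them, only those bytes are hashed.
pub fn key_slot(key: &str) -> u16 {
    let bytes = key.as_bytes();
    let hashed = match bytes.iter().position(|&b| b == b'{') {
        Some(open) => match bytes[open + 1..].iter().position(|&b| b == b'}') {
            Some(len) if len > 0 => &bytes[open + 1..open + 1 + len],
            _ => bytes,
        },
        None => bytes,
    };
    crc16(hashed) & 0x3FFF // 16384 slots
}
```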

## Failover Systems

### Health Checks

Configure health check endpoints:

```csv
# Health check configuration
health-enabled,true
health-endpoint,/api/health
health-interval,10
health-timeout,5
health-retries,3
```
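
How `health-retries` is applied is not specified here. A common reading, assumed in this sketch, is that an instance is marked down after that many consecutive failed probes (a probe slower than `health-timeout` counts as failed) and back up after one success. `HealthTracker` is a hypothetical type.

```rust
/// Tracks one backend's health from periodic probes (every health-interval seconds).
pub struct HealthTracker {
    retries: u32, // health-retries
    consecutive_failures: u32,
    pub healthy: bool,
}

impl HealthTracker {
    pub fn new(retries: u32) -> Self {
        Self { retries, consecutive_failures: 0, healthy: true }
    }

    /// Record one probe result and return the resulting health state.
    pub fn record(&mut self, ok: bool) -> bool {
        if ok {
            // Assumed recovery rule: a single success restores the instance.
            self.consecutive_failures = 0;
            self.healthy = true;
        } else {
            self.consecutive_failures = self.consecutive_failures.saturating_add(1);
            if self.consecutive_failures >= self.retries {
                self.healthy = false;
            }
        }
        self.healthy
    }
}
```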

Health check response:

```json
{
  "status": "healthy",
  "version": "6.1.0",
  "uptime": 86400,
  "checks": {
    "database": "ok",
    "cache": "ok",
    "vectordb": "ok",
    "llm": "ok"
  },
  "metrics": {
    "cpu": 45.2,
    "memory": 62.1,
    "connections": 150
  }
}
```

### Automatic Failover

#### Database Failover (PostgreSQL)

Using Patroni for PostgreSQL HA:

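```yaml
# patroni.yml
scope: botserver-cluster
name: postgres-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: postgres-1:8008

# etcd3 uses the etcd v3 API; the "etcd" key uses the v2 API,
# which etcd 3.4 and later disable by default
etcd3:
  hosts: etcd-1:2379,etcd-2:2379,etcd-3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # bytes (1 MiB)
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 200
        shared_buffers: 2GB

postgresql:
  listen: 0.0.0.0:5432
  connect_address: postgres-1:5432
  data_dir: /var/lib/postgresql/data
  # Replicas must be allowed to connect; narrow the CIDR to your container network
  pg_hba:
    - host replication replicator 0.0.0.0/0 scram-sha-256
    - host all all 0.0.0.0/0 scram-sha-256
  # Patroni does not expand ${VAR} placeholders in this file. Supply the
  # passwords through the PATRONI_SUPERUSER_PASSWORD and
  # PATRONI_REPLICATION_PASSWORD environment variables instead.
  authentication:
    superuser:
      username: postgres
    replication:
      username: replicator
```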

#### Cache Failover (Redis Sentinel)

```csv
# Redis Sentinel configuration
cache-mode,sentinel
cache-sentinel-master,mymaster
cache-sentinel-nodes,sentinel-1:26379,sentinel-2:26379,sentinel-3:26379
```

### Circuit Breaker

Prevent cascade failures:

```csv
# Circuit breaker settings
circuit-breaker-enabled,true
circuit-breaker-threshold,5
circuit-breaker-timeout,30
circuit-breaker-half-open-requests,3
```

States:
- **Closed**: Normal operation
- **Open**: Failing, reject requests immediately
- **Half-Open**: Testing if service recovered
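
A sketch of that state machine, mapping `circuit-breaker-threshold` to consecutive failures before opening, `circuit-breaker-timeout` to seconds spent open, and `circuit-breaker-half-open-requests` to the number of trial calls allowed while half-open. Those mappings, and the rule that one success closes the circuit while one half-open failure reopens it, are assumptions; `CircuitBreaker` is hypothetical, and time is passed in explicitly so tests are deterministic.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

pub struct CircuitBreaker {
    threshold: u32,     // circuit-breaker-threshold
    timeout: Duration,  // circuit-breaker-timeout
    half_open_max: u32, // circuit-breaker-half-open-requests
    state: State,
    failures: u32,
    opened_at: Option<Instant>,
    trial_calls: u32,
}

impl CircuitBreaker {
    pub fn new(threshold: u32, timeout: Duration, half_open_max: u32) -> Self {
        Self { threshold, timeout, half_open_max, state: State::Closed, failures: 0, opened_at: None, trial_calls: 0 }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Call before each request; false means fail fast without calling the service.
    pub fn allow(&mut self, now: Instant) -> bool {
        match self.state {
            State::Closed => true,
            State::Open => {
                let expired = self.opened_at.map_or(true, |t| now.duration_since(t) >= self.timeout);
                if expired {
                    self.state = State::HalfOpen;
                    self.trial_calls = 1; // this call is the first trial
                }
                expired
            }
            State::HalfOpen if self.trial_calls < self.half_open_max => {
                self.trial_calls += 1;
                true
            }
            State::HalfOpen => false,
        }
    }

    pub fn on_success(&mut self) {
        self.failures = 0;
        if self.state == State::HalfOpen {
            self.state = State::Closed;
            self.trial_calls = 0;
        }
    }

    pub fn on_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.state == State::HalfOpen || self.failures >= self.threshold {
            self.state = State::Open;
            self.opened_at = Some(now);
            self.failures = 0;
            self.trial_calls = 0;
        }
    }
}
```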

### Graceful Degradation

Configure fallback behavior:

```csv
# Fallback configuration
fallback-llm-enabled,true
fallback-llm-provider,local
fallback-llm-model,DeepSeek-R1-Distill-Qwen-1.5B

fallback-cache-enabled,true
fallback-cache-mode,memory

fallback-vectordb-enabled,true
fallback-vectordb-mode,keyword-search
```
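
The fallback pattern itself is small: call the primary, and only if it fails and the matching `fallback-*-enabled` flag is set, call the degraded alternative while recording which one answered. A generic sketch with closures standing in for the real LLM, cache or vector-search clients; `with_fallback` and `Source` are hypothetical.

```rust
use std::fmt::Display;

/// Which provider produced the answer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Source {
    Primary,
    Fallback,
}

pub fn with_fallback<T, E: Display>(
    fallback_enabled: bool,
    primary: impl FnOnce() -> Result<T, E>,
    fallback: impl FnOnce() -> Result<T, E>,
) -> Result<(T, Source), E> {
    match primary() {
        Ok(value) => Ok((value, Source::Primary)),
        Err(err) if fallback_enabled => {
            eprintln!("primary failed ({err}); using fallback");
            fallback().map(|value| (value, Source::Fallback))
        }
        Err(err) => Err(err),
    }
}
```

Returning the `Source` alongside the value lets callers tag degraded answers, for example to avoid caching a small local model's reply as if it came from the primary LLM.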

## Monitoring Scaling

### Metrics Collection

Key metrics to monitor:

```csv
# Scaling metrics
metrics-scaling-enabled,true
metrics-container-count,true
metrics-scaling-events,true
metrics-load-distribution,true
```

### Alerting Rules

Configure alerts for scaling issues:

```yaml
# alerting-rules.yml
groups:
  - name: scaling
    rules:
      - alert: HighCPUUsage
        expr: avg(cpu_usage) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          
      - alert: MaxInstancesReached
        expr: container_count >= max_instances
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Maximum instances reached, cannot scale up"
          
      - alert: ScalingFailed
        expr: scaling_errors > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Scaling operation failed"
```

## Best Practices

### Scaling

1. **Start small** - Begin with auto-scaling disabled, monitor patterns first
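2. **Set appropriate thresholds** - Scale-up thresholds that are too low add instances needlessly, scale-up and scale-down thresholds set too close together cause thrashing, and thresholds that are too high let performance degrade before new capacity arrives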
3. **Use cooldown periods** - Prevent rapid scale up/down cycles
4. **Test failover** - Regularly test your failover procedures
5. **Monitor costs** - More instances = higher infrastructure costs

### Load Balancing

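1. **Use sticky sessions for WebSockets** - A WebSocket stays on one backend for its lifetime, but stickiness sends reconnecting clients back to the instance that holds their session state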
2. **Enable health checks** - Remove unhealthy instances automatically
3. **Configure timeouts** - Prevent hanging connections
4. **Use connection pooling** - Reduce connection overhead

### Sharding

1. **Choose the right strategy** - Tenant-based is simplest for SaaS
2. **Plan for rebalancing** - Have procedures to move data between shards
3. **Avoid cross-shard queries** - Design to minimize these
4. **Monitor shard balance** - Uneven distribution causes hotspots
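
Rebalancing cost depends heavily on the placement function. With modulo placement, growing from 4 to 5 shards moves most keys, because an ID stays put only when `id % 4 == id % 5`. Jump consistent hashing (Lamping and Veach) moves only about 1/5 of keys in the same change, and every key that moves goes to the new shard. A sketch of both, offered as a comparison rather than as what botserver implements:

```rust
/// Number of IDs in 0..n whose shard changes when modulo placement
/// goes from `old` to `new` shards.
pub fn moved_keys(n: u64, old: u64, new: u64) -> u64 {
    (0..n).filter(|id| id % old != id % new).count() as u64
}

/// Jump consistent hash (Lamping & Veach, 2014): maps `key` to a bucket in
/// 0..buckets; adding a bucket only moves keys into the new bucket.
pub fn jump_hash(mut key: u64, buckets: i32) -> i32 {
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < buckets as i64 {
        b = j;
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as i32
}
```

For IDs 0 to 9999, modulo placement moves 8000 of 10000 keys when going from 4 to 5 shards (only IDs with `id % 20 < 4` keep their shard).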

## Next Steps

- [Container Deployment](./containers.md) - LXC container basics
- [Architecture Overview](./architecture.md) - System design
- [Monitoring Dashboard](../chapter-04-gbui/monitoring.md) - Observe your cluster