# NVIDIA GPU Setup for LXC Containers

This guide covers setting up NVIDIA GPU passthrough for botserver running in LXC containers, enabling hardware acceleration for local LLM inference.

## Prerequisites

- NVIDIA GPU (RTX 3060 or better with 12GB+ VRAM recommended)
- NVIDIA drivers installed on the host system
- LXD/LXC installed
- CUDA-capable GPU

## LXD Configuration (Interactive Setup)

When initializing LXD, use these settings:

```bash
sudo lxd init
```

Answer the prompts as follows:
- **Would you like to use LXD clustering?** → `no`
- **Do you want to configure a new storage pool?** → `no` (will create `/generalbots` later)
- **Would you like to connect to a MAAS server?** → `no`
- **Would you like to create a new local network bridge?** → `yes`
- **What should the new bridge be called?** → `lxdbr0`
- **What IPv4 address should be used?** → `auto`
- **What IPv6 address should be used?** → `auto`
- **Would you like the LXD server to be available over the network?** → `no`
- **Would you like stale cached images to be updated automatically?** → `no`
- **Would you like a YAML "lxd init" preseed to be printed?** → `no`

### Storage Configuration
- **Storage backend name:** → `default`
- **Storage backend driver:** → `zfs`
- **Create a new ZFS pool?** → `yes`

## NVIDIA GPU Configuration

### On the Host System

Create a GPU profile and attach it to your container:

```bash
# Create GPU profile
lxc profile create gpu

# Add GPU device to profile
lxc profile device add gpu gpu gpu gputype=physical

# Apply GPU profile to your container
lxc profile add gb-system gpu
```

### Inside the Container

Configure NVIDIA driver version pinning and install drivers:

1. **Pin NVIDIA driver versions** to ensure stability:

```bash
cat > /etc/apt/preferences.d/nvidia-drivers << 'EOF'
Package: *nvidia*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: cuda-drivers*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libcuda*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libxnvctrl* 
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libnv*
Pin: version 560.35.05-1
Pin-Priority: 1001
EOF
```

2. **Install NVIDIA drivers and CUDA toolkit:**

```bash
# Update package lists
apt update

# Install NVIDIA driver and nvidia-smi
apt install -y nvidia-driver nvidia-smi

# Add CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb

# Install CUDA toolkit
apt-get update
apt-get -y install cuda-toolkit-12-8
apt-get install -y cuda-drivers
```

## Verify GPU Access

After installation, verify GPU is accessible:

```bash
# Check GPU is visible
nvidia-smi

# Should show your GPU with driver version 560.35.05
```

## Configure botserver for GPU

Update your bot's `config.csv` to use GPU acceleration:

```csv
name,value
llm-server-gpu-layers,35
```

The number of layers depends on your GPU memory:
- **RTX 3060 (12GB):** 20-35 layers
- **RTX 3070 (8GB):** 15-25 layers
- **RTX 4070 (12GB):** 30-40 layers
- **RTX 4090 (24GB):** 50-99 layers

## Troubleshooting

### GPU Not Detected

If `nvidia-smi` doesn't show the GPU:

1. Check host GPU drivers:
   ```bash
   # On host
   nvidia-smi
   lxc config device list gb-system
   ```

2. Verify GPU passthrough:
   ```bash
   # Inside container
   ls -la /dev/nvidia*
   ```

3. Check kernel modules:
   ```bash
   lsmod | grep nvidia
   ```

### Driver Version Mismatch

If you encounter driver version conflicts:

1. Ensure host and container use the same driver version
2. Remove the version pinning file and install matching drivers:
   ```bash
   rm /etc/apt/preferences.d/nvidia-drivers
   apt update
   apt install nvidia-driver-560
   ```

### CUDA Library Issues

If CUDA libraries aren't found:

```bash
# Add CUDA to library path
echo '/usr/local/cuda/lib64' >> /etc/ld.so.conf.d/cuda.conf
ldconfig

# Add to PATH
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
```

## Custom llama.cpp Compilation

If you need custom CPU/GPU optimizations or specific hardware support, compile llama.cpp from source:

### Prerequisites

```bash
sudo apt update
sudo apt install build-essential cmake git
```

### Compilation Steps

```bash
# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Create build directory
mkdir build
cd build

# Configure with CUDA support
cmake .. -DLLAMA_CUDA=ON -DLLAMA_CURL=OFF

# Compile using all available cores
make -j$(nproc)
```

### Compilation Options

For different hardware configurations:

```bash
# CPU-only build (no GPU)
cmake .. -DLLAMA_CURL=OFF

# CUDA with specific compute capability
cmake .. -DLLAMA_CUDA=ON -DLLAMA_CUDA_FORCE_COMPUTE=75

# ROCm for AMD GPUs
cmake .. -DLLAMA_HIPBLAS=ON

# Metal for Apple Silicon
cmake .. -DLLAMA_METAL=ON

# AVX2 optimizations for modern CPUs
cmake .. -DLLAMA_AVX2=ON

# F16C for half-precision support
cmake .. -DLLAMA_F16C=ON
```

### After Compilation

```bash
# Copy compiled binary to botserver
cp bin/llama-server /path/to/botserver-stack/bin/llm/

# Update config.csv to use custom build
llm-server-path,/path/to/botserver-stack/bin/llm/
```

### Benefits of Custom Compilation

- **Hardware-specific optimizations** for your exact CPU/GPU
- **Custom CUDA compute capabilities** for newer GPUs
- **AVX/AVX2/AVX512** instructions for faster CPU inference
- **Reduced binary size** by excluding unused features
- **Support for experimental features** not in releases

## Performance Optimization

### Memory Settings

For optimal LLM performance with GPU:

```csv
name,value
llm-server-gpu-layers,35
llm-server-mlock,true
llm-server-no-mmap,false
llm-server-ctx-size,4096
```

### Multiple GPUs

For systems with multiple GPUs, specify which GPU to use:

```bash
# List available GPUs
lxc profile device add gpu gpu0 gpu gputype=physical id=0
lxc profile device add gpu gpu1 gpu gputype=physical id=1
```

## Benefits of GPU Acceleration

With GPU acceleration enabled:
- **5-10x faster** inference compared to CPU
- **Higher context sizes** possible (8K-32K tokens)
- **Real-time responses** even with large models
- **Lower CPU usage** for other tasks
- **Support for larger models** (13B, 30B parameters)

## Next Steps

- [Installation Guide](./installation.md) - Complete botserver setup
- [Quick Start](./quick-start.md) - Create your first bot
- [Configuration Reference](../chapter-02/gbot.md) - All GPU-related parameters