# NVIDIA GPU Setup for LXC Containers
This guide covers setting up NVIDIA GPU passthrough for botserver running in LXC containers, enabling hardware acceleration for local LLM inference.
## Prerequisites
- CUDA-capable NVIDIA GPU (RTX 3060 or better with 12GB+ VRAM recommended)
- NVIDIA drivers installed on the host system
- LXD/LXC installed
## LXD Configuration (Interactive Setup)

When initializing LXD, use these settings:

```bash
sudo lxd init
```

Answer the prompts as follows:

- **Would you like to use LXD clustering?** → `no`
- **Do you want to configure a new storage pool?** → `no` (will create `/generalbots` later)
- **Would you like to connect to a MAAS server?** → `no`
- **Would you like to create a new local network bridge?** → `yes`
- **What should the new bridge be called?** → `lxdbr0`
- **What IPv4 address should be used?** → `auto`
- **What IPv6 address should be used?** → `auto`
- **Would you like the LXD server to be available over the network?** → `no`
- **Would you like stale cached images to be updated automatically?** → `no`
- **Would you like a YAML "lxd init" preseed to be printed?** → `no`

### Storage Configuration

- **Storage backend name:** → `default`
- **Storage backend driver:** → `zfs`
- **Create a new ZFS pool?** → `yes`
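
For repeatable setups, the same answers can also be supplied non-interactively. Below is a minimal preseed sketch mirroring the choices above; adjust the bridge name, storage driver, and pool name to your environment:

```bash
# Feed a preseed file to lxd init instead of answering prompts interactively
cat <<'EOF' | sudo lxd init --preseed
networks:
- name: lxdbr0
  type: bridge
  config:
    ipv4.address: auto
    ipv6.address: auto
storage_pools:
- name: default
  driver: zfs
profiles:
- name: default
  devices:
    eth0:
      name: eth0
      network: lxdbr0
      type: nic
    root:
      path: /
      pool: default
      type: disk
EOF
```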
## NVIDIA GPU Configuration

### On the Host System

Create a GPU profile and attach it to your container:

```bash
# Create GPU profile
lxc profile create gpu

# Add GPU device to profile (profile "gpu", device name "gpu", device type "gpu")
lxc profile device add gpu gpu gpu gputype=physical

# Apply GPU profile to your container
lxc profile add gb-system gpu
```
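
To confirm the attachment before moving on, a quick sanity check (`gb-system` is the container name used throughout this guide):

```bash
# The profile should list the gpu device
lxc profile show gpu

# The container should now include the gpu profile
lxc config show gb-system | grep -A 3 profiles
```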

### Inside the Container

Configure NVIDIA driver version pinning and install drivers:

1. **Pin NVIDIA driver versions** to ensure stability:
```bash
cat > /etc/apt/preferences.d/nvidia-drivers << 'EOF'
Package: *nvidia*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: cuda-drivers*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libcuda*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libxnvctrl*
Pin: version 560.35.05-1
Pin-Priority: 1001

Package: libnv*
Pin: version 560.35.05-1
Pin-Priority: 1001
EOF
```
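
To verify the pin took effect, `apt-cache policy` should report the pinned version with priority 1001:

```bash
# The candidate version should be 560.35.05-1 with priority 1001
apt-cache policy nvidia-driver
```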

2. **Install NVIDIA drivers and CUDA toolkit:**

```bash
# Update package lists
apt update

# Install NVIDIA driver and nvidia-smi
apt install -y nvidia-driver nvidia-smi

# Add CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb

# Install CUDA toolkit and matching driver packages
apt-get update
apt-get -y install cuda-toolkit-12-8
apt-get install -y cuda-drivers
```

## Verify GPU Access

After installation, verify that the GPU is accessible:

```bash
# Check GPU is visible
nvidia-smi

# Should show your GPU with driver version 560.35.05
```
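
If the CUDA toolkit was installed, the compiler should be present as well (the path assumes the default CUDA install location):

```bash
# Confirm the CUDA toolkit version
/usr/local/cuda/bin/nvcc --version
```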

## Configure botserver for GPU

Update your bot's `config.csv` to use GPU acceleration:

```csv
name,value
llm-server-gpu-layers,35
```

The number of layers you can offload depends on your GPU memory:

- **RTX 3060 (12GB):** 20-35 layers
- **RTX 3070 (8GB):** 15-25 layers
- **RTX 4070 (12GB):** 30-40 layers
- **RTX 4090 (24GB):** 50-99 layers
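
A practical way to tune this value is to start with a candidate layer count and watch VRAM usage while a prompt runs; if memory is nearly full, lower the count. A rough sketch:

```bash
# Watch GPU memory while the model loads and answers a prompt
watch -n1 nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```
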
## Troubleshooting

### GPU Not Detected

If `nvidia-smi` doesn't show the GPU:

1. Check the host GPU drivers and the container's device list:

```bash
# On host
nvidia-smi
lxc config device list gb-system
```

2. Verify GPU passthrough:

```bash
# Inside container
ls -la /dev/nvidia*
```

3. Check kernel modules:

```bash
lsmod | grep nvidia
```

### Driver Version Mismatch

If you encounter driver version conflicts, remember that LXC containers share the host kernel: the driver libraries inside the container must match the kernel module version loaded on the host.

1. Ensure host and container use the same driver version
2. Remove the version pinning file and install matching drivers:

```bash
rm /etc/apt/preferences.d/nvidia-drivers
apt update
apt install nvidia-driver-560
```
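
To find the exact version to match, query the host first:

```bash
# On the host: print the loaded driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```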

### CUDA Library Issues

If CUDA libraries aren't found:

```bash
# Add CUDA to library path
echo '/usr/local/cuda/lib64' >> /etc/ld.so.conf.d/cuda.conf
ldconfig

# Add to PATH
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
```
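
A quick way to confirm the loader now resolves the libraries:

```bash
# libcuda/libcudart should appear in the loader cache
ldconfig -p | grep -i cuda
```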

## Custom llama.cpp Compilation

If you need custom CPU/GPU optimizations or specific hardware support, compile llama.cpp from source:

### Prerequisites

```bash
sudo apt update
sudo apt install build-essential cmake git
```

### Compilation Steps

```bash
# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Create build directory
mkdir build
cd build

# Configure with CUDA support
# (recent llama.cpp revisions renamed this flag to -DGGML_CUDA=ON)
cmake .. -DLLAMA_CUDA=ON -DLLAMA_CURL=OFF

# Compile using all available cores
make -j$(nproc)
```

### Compilation Options

For different hardware configurations:

```bash
# CPU-only build (no GPU)
cmake .. -DLLAMA_CURL=OFF

# CUDA targeting a specific compute capability (7.5 here)
cmake .. -DLLAMA_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=75

# ROCm for AMD GPUs
cmake .. -DLLAMA_HIPBLAS=ON

# Metal for Apple Silicon
cmake .. -DLLAMA_METAL=ON

# AVX2 optimizations for modern CPUs
cmake .. -DLLAMA_AVX2=ON

# F16C for half-precision support
cmake .. -DLLAMA_F16C=ON
```
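
If you're unsure of your GPU's compute capability, recent drivers can report it directly:

```bash
# Prints e.g. "8.6" for an RTX 3060
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```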

### After Compilation

```bash
# Copy compiled binary to botserver
cp bin/llama-server /path/to/botserver-stack/bin/llm/

# Then point config.csv at the custom build:
#   llm-server-path,/path/to/botserver-stack/bin/llm/
```
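
A quick smoke test of the freshly built binary (run from the `build` directory; flags may vary slightly between llama.cpp revisions):

```bash
# Should print the server's usage text without errors
./bin/llama-server --help
```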

### Benefits of Custom Compilation

- **Hardware-specific optimizations** for your exact CPU/GPU
- **Custom CUDA compute capabilities** for newer GPUs
- **AVX/AVX2/AVX512** instructions for faster CPU inference
- **Reduced binary size** by excluding unused features
- **Support for experimental features** not in releases

## Performance Optimization

### Memory Settings

For optimal LLM performance with GPU:

```csv
name,value
llm-server-gpu-layers,35
llm-server-mlock,true
llm-server-no-mmap,false
llm-server-ctx-size,4096
```
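
These map onto standard llama.cpp server flags: `mlock` pins model weights in RAM to prevent swapping, `no-mmap` controls memory-mapped loading, and `ctx-size` sets the context window. If botserver launches the bundled `llama-server`, the resulting invocation is roughly equivalent to this sketch (`model.gguf` is a placeholder):

```bash
# Offload 35 layers to the GPU, lock memory, use a 4K-token context window
llama-server -m model.gguf --n-gpu-layers 35 --mlock --ctx-size 4096
```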
### Multiple GPUs

For systems with multiple GPUs, pass each one through as a separate device:

```bash
# Pass through GPU 0 and GPU 1 as separate devices
lxc profile device add gpu gpu0 gpu gputype=physical id=0
lxc profile device add gpu gpu1 gpu gputype=physical id=1
```
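
Inside the container you can then restrict which GPU the LLM server uses at runtime; `CUDA_VISIBLE_DEVICES` is honored by any CUDA application (paths here are illustrative):

```bash
# Use only the second GPU (index 1) for this process
CUDA_VISIBLE_DEVICES=1 ./llama-server -m model.gguf --n-gpu-layers 35
```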

## Benefits of GPU Acceleration

With GPU acceleration enabled:

- **5-10x faster** inference compared to CPU
- **Higher context sizes** possible (8K-32K tokens)
- **Real-time responses** even with large models
- **Lower CPU usage** for other tasks
- **Support for larger models** (13B, 30B parameters)

## Next Steps

- [Installation Guide](./installation.md) - Complete botserver setup
- [Quick Start](./quick-start.md) - Create your first bot
- [Configuration Reference](../chapter-02/gbot.md) - All GPU-related parameters