# Multimodal Configuration Guide

This document describes how to configure botserver to use the botmodels service for image, video, and audio generation, as well as vision/captioning.

## Overview

The multimodal feature connects botserver to botmodels, a Python-based service that plays a role similar to llama.cpp but for multimodal AI tasks. This enables BASIC scripts to generate images, videos, and audio, and to analyze visual content.

## Configuration Keys

Add the following configuration to your bot's `config.csv` file:

### Image Generator Settings

| Key | Default | Description |
|-----|---------|-------------|
| `image-generator-model` | - | Path to the image generation model (e.g., `../../../../data/diffusion/sd_turbo_f16.gguf`) |
| `image-generator-steps` | `4` | Number of inference steps for image generation |
| `image-generator-width` | `512` | Output image width in pixels |
| `image-generator-height` | `512` | Output image height in pixels |
| `image-generator-gpu-layers` | `20` | Number of layers to offload to GPU |
| `image-generator-batch-size` | `1` | Batch size for generation |

### Video Generator Settings

| Key | Default | Description |
|-----|---------|-------------|
| `video-generator-model` | - | Path to the video generation model (e.g., `../../../../data/diffusion/zeroscope_v2_576w`) |
| `video-generator-frames` | `24` | Number of frames to generate |
| `video-generator-fps` | `8` | Frames per second for output video |
| `video-generator-width` | `320` | Output video width in pixels |
| `video-generator-height` | `576` | Output video height in pixels |
| `video-generator-gpu-layers` | `15` | Number of layers to offload to GPU |
| `video-generator-batch-size` | `1` | Batch size for generation |

### BotModels Service Settings

| Key | Default | Description |
|-----|---------|-------------|
| `botmodels-enabled` | `false` | Enable/disable botmodels integration |
| `botmodels-host` | `0.0.0.0` | Host address for botmodels service |
| `botmodels-port` | `8085` | Port for botmodels service |
| `botmodels-api-key` | - | API key for authentication with botmodels |
| `botmodels-https` | `false` | Use HTTPS for connection to botmodels |

## Example config.csv

```csv
key,value
image-generator-model,../../../../data/diffusion/sd_turbo_f16.gguf
image-generator-steps,4
image-generator-width,512
image-generator-height,512
image-generator-gpu-layers,20
image-generator-batch-size,1
video-generator-model,../../../../data/diffusion/zeroscope_v2_576w
video-generator-frames,24
video-generator-fps,8
video-generator-width,320
video-generator-height,576
video-generator-gpu-layers,15
video-generator-batch-size,1
botmodels-enabled,true
botmodels-host,0.0.0.0
botmodels-port,8085
botmodels-api-key,your-secret-key
botmodels-https,false
```

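
To make the relationship between these keys concrete, here is a minimal Python sketch of how a client might read the `key,value` rows and derive the botmodels base URL. It is an illustration only, not botserver's actual loader (botserver is written in Rust): the function names `load_config` and `botmodels_base_url` are hypothetical, and the whitespace trimming and case-insensitive `true` check are assumptions.

```python
import csv
import io

# Defaults copied from the BotModels Service Settings table in this guide.
DEFAULTS = {
    "botmodels-enabled": "false",
    "botmodels-host": "0.0.0.0",
    "botmodels-port": "8085",
    "botmodels-https": "false",
}


def load_config(csv_text: str) -> dict:
    """Parse key,value rows and layer them over the documented defaults."""
    settings = dict(DEFAULTS)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        key = (row.get("key") or "").strip()
        if key:
            settings[key] = (row.get("value") or "").strip()
    return settings


def botmodels_base_url(settings: dict) -> str:
    """Build the service URL from the host, port, and HTTPS settings."""
    # Assumption: any casing of "true" enables HTTPS.
    scheme = "https" if settings["botmodels-https"].lower() == "true" else "http"
    return f"{scheme}://{settings['botmodels-host']}:{settings['botmodels-port']}"


sample = """key,value
botmodels-enabled,true
botmodels-host,0.0.0.0
botmodels-port,8085
botmodels-https,false
"""
settings = load_config(sample)
print(botmodels_base_url(settings))  # http://0.0.0.0:8085
```

Keys missing from the file fall back to the defaults in the table, so a `config.csv` that only sets `botmodels-enabled` still resolves to port `8085` over plain HTTP.
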
## BASIC Keywords

Once configured, the following keywords become available in BASIC scripts:

### IMAGE

Generate an image from a text prompt.

```basic
file = IMAGE "a cute cat playing with yarn"
SEND FILE TO user, file
```

### VIDEO

Generate a video from a text prompt.

```basic
file = VIDEO "a rocket launching into space"
SEND FILE TO user, file
```

### AUDIO

Generate speech audio from text.

```basic
file = AUDIO "Hello, welcome to our service!"
SEND FILE TO user, file
```

### SEE

Get a caption/description of an image or video file.

```basic
caption = SEE "/path/to/image.jpg"
TALK caption

// Also works with video files
description = SEE "/path/to/video.mp4"
TALK description
```

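
This guide does not spell out which botmodels endpoint each keyword calls. The Python sketch below shows one plausible routing, purely as an assumption: the generation keywords map to the matching `/generate` endpoints, and `SEE` picks between the image and video description endpoints by file extension. The function `endpoint_for` and the extension list are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed mapping from BASIC keywords to botmodels endpoints; the source
# document does not confirm which endpoint each keyword uses.
GENERATION_ENDPOINTS = {
    "IMAGE": "/api/image/generate",
    "VIDEO": "/api/video/generate",
    "AUDIO": "/api/speech/generate",
}

# Hypothetical extension list used to tell videos from images for SEE.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi", ".mkv"}


def endpoint_for(keyword: str, argument: str) -> str:
    """Return the endpoint path a keyword call would be routed to."""
    keyword = keyword.upper()
    if keyword == "SEE":
        suffix = PurePosixPath(argument).suffix.lower()
        if suffix in VIDEO_EXTENSIONS:
            return "/api/vision/describe_video"
        return "/api/vision/describe"
    return GENERATION_ENDPOINTS[keyword]


print(endpoint_for("IMAGE", "a cute cat playing with yarn"))
print(endpoint_for("SEE", "/path/to/video.mp4"))
```
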
## Starting BotModels Service

Before using multimodal features, start the botmodels service:

```bash
cd botmodels
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085
```

Or with HTTPS:

```bash
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --ssl-keyfile key.pem --ssl-certfile cert.pem
```

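
Once the service is started, a quick way to confirm it is reachable is to call the `/api/health` endpoint. The following Python sketch uses only the standard library; the helper names are hypothetical, and treating HTTP 200 as "healthy" is an assumption, since the response format is not documented here.

```python
import urllib.error
import urllib.request


def health_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the health endpoint with the API key header."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/health")
    request.add_header("X-API-Key", api_key)
    return request


def is_healthy(base_url: str, api_key: str, timeout: float = 5.0) -> bool:
    """Return True if the service answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(
            health_request(base_url, api_key), timeout=timeout
        ) as response:
            # Assumption: a healthy service responds with status 200.
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, TLS errors, and HTTP errors.
        return False


request = health_request("http://127.0.0.1:8085", "your-secret-key")
print(request.get_method(), request.full_url)
```

Calling `is_healthy("http://127.0.0.1:8085", "your-secret-key")` on the machine running botmodels should return `True` if the service is up and the key is accepted.
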
## API Endpoints (BotModels)

The botmodels service exposes these REST endpoints:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/image/generate` | POST | Generate image from prompt |
| `/api/video/generate` | POST | Generate video from prompt |
| `/api/speech/generate` | POST | Generate speech from text |
| `/api/speech/totext` | POST | Convert audio to text |
| `/api/vision/describe` | POST | Get description of an image |
| `/api/vision/describe_video` | POST | Get description of a video |
| `/api/vision/vqa` | POST | Visual question answering |
| `/api/health` | GET | Health check |

All endpoints require the `X-API-Key` header for authentication.

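
As an illustration of the authentication requirement, the Python sketch below builds a POST request for one of the generation endpoints. The request schema is not documented in this guide, so the JSON body with a single `prompt` field is an assumption, and `generation_request` is a hypothetical helper name.

```python
import json
import urllib.request


def generation_request(
    base_url: str, endpoint: str, api_key: str, prompt: str
) -> urllib.request.Request:
    """Build an authenticated POST request carrying a text prompt."""
    # Assumption: the endpoint accepts a JSON object with a "prompt" field.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
    )


request = generation_request(
    "http://127.0.0.1:8085",
    "/api/image/generate",
    "your-secret-key",
    "a cute cat playing with yarn",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a request without the `X-API-Key` header is expected to be rejected.
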
## Architecture

```
┌─────────────┐     HTTPS      ┌─────────────┐
│  botserver  │ ────────────▶  │  botmodels  │
│   (Rust)    │                │  (Python)   │
└─────────────┘                └─────────────┘
       │                              │
       │ BASIC Keywords               │ AI Models
       │ - IMAGE                      │ - Stable Diffusion
       │ - VIDEO                      │ - Zeroscope
       │ - AUDIO                      │ - TTS/Whisper
       │ - SEE                        │ - BLIP2
       ▼                              ▼
┌─────────────┐                ┌─────────────┐
│   config    │                │   outputs   │
│    .csv     │                │   (files)   │
└─────────────┘                └─────────────┘
```

## Troubleshooting

### "BotModels is not enabled"

Set `botmodels-enabled` to `true` in your `config.csv` (that is, add the row `botmodels-enabled,true`).

### Connection refused

1. Ensure the botmodels service is running.
2. Check that `botmodels-host` and `botmodels-port` match the address the service listens on.
3. Verify firewall settings.

### Authentication failed

Ensure `botmodels-api-key` in `config.csv` matches the `API_KEY` environment variable set for the botmodels service.

### Model not found

Verify model paths are correct and models are downloaded to the expected locations.

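
The host and port checks listed under "Connection refused" can be scripted. The Python sketch below attempts a plain TCP connection to the configured address; `can_connect` is a hypothetical helper, and it only tells you whether something accepts connections on that port, not whether it is botmodels.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, DNS failure, or timeout.
        return False


if __name__ == "__main__":
    # Replace with the values of botmodels-host and botmodels-port.
    print(can_connect("127.0.0.1", 8085))
```

If this returns `False` on the botmodels machine itself, the service is probably not running; if it succeeds locally but fails from the botserver machine, a firewall or bind-address issue is more likely.
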
## Security Notes

1. Always use HTTPS in production (set `botmodels-https` to `true`).
2. Use strong, unique API keys.
3. Restrict network access to the botmodels service.
4. Consider running botmodels on a separate GPU server.

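
For the API key recommendation above, one way to produce a strong key is Python's standard `secrets` module. This is only a suggestion; `new_api_key` is a hypothetical helper, and the 32-byte length is an arbitrary choice. The generated value uses only URL-safe characters, so it contains no commas and can be placed directly in a `config.csv` row.

```python
import secrets


def new_api_key(num_bytes: int = 32) -> str:
    """Generate a random URL-safe token suitable for use as an API key."""
    return secrets.token_urlsafe(num_bytes)


key = new_api_key()
# Print a ready-to-paste config.csv row; set the same value as API_KEY
# in the botmodels environment.
print(f"botmodels-api-key,{key}")
```
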