Add multimodal module for botmodels integration

Introduces IMAGE, VIDEO, AUDIO, and SEE keywords for BASIC scripts that
connect to the botmodels service for AI-powered media generation and
vision/captioning capabilities.

- Add BotModelsClient for HTTP communication with botmodels service
- Implement BASIC keywords: IMAGE, VIDEO, AUDIO (generation), SEE
  (captioning)
- Support model and service configuration via config.csv
Rodrigo Rodriguez (Pragmatismo) 2025-11-29 20:40:08 -03:00
parent d761a076aa
commit a21292daa3
7 changed files with 1200 additions and 0 deletions

docs/multimodal-config.md Normal file

@@ -0,0 +1,191 @@
# Multimodal Configuration Guide
This document describes how to configure botserver to use the botmodels service for image, video, and audio generation, as well as vision/captioning.
## Overview
The multimodal feature connects botserver to botmodels, a Python-based service that plays a role similar to llama.cpp but for multimodal AI tasks. It lets BASIC scripts generate images, videos, and audio, and analyze visual content.
## Configuration Keys
Add the following configuration to your bot's `config.csv` file:
### Image Generator Settings
| Key | Default | Description |
|-----|---------|-------------|
| `image-generator-model` | - | Path to the image generation model (e.g., `../../../../data/diffusion/sd_turbo_f16.gguf`) |
| `image-generator-steps` | `4` | Number of inference steps for image generation |
| `image-generator-width` | `512` | Output image width in pixels |
| `image-generator-height` | `512` | Output image height in pixels |
| `image-generator-gpu-layers` | `20` | Number of layers to offload to GPU |
| `image-generator-batch-size` | `1` | Batch size for generation |
### Video Generator Settings
| Key | Default | Description |
|-----|---------|-------------|
| `video-generator-model` | - | Path to the video generation model (e.g., `../../../../data/diffusion/zeroscope_v2_576w`) |
| `video-generator-frames` | `24` | Number of frames to generate |
| `video-generator-fps` | `8` | Frames per second for output video |
| `video-generator-width` | `320` | Output video width in pixels |
| `video-generator-height` | `576` | Output video height in pixels |
| `video-generator-gpu-layers` | `15` | Number of layers to offload to GPU |
| `video-generator-batch-size` | `1` | Batch size for generation |
### BotModels Service Settings
| Key | Default | Description |
|-----|---------|-------------|
| `botmodels-enabled` | `false` | Enable/disable botmodels integration |
| `botmodels-host` | `0.0.0.0` | Host address for botmodels service |
| `botmodels-port` | `8085` | Port for botmodels service |
| `botmodels-api-key` | - | API key for authentication with botmodels |
| `botmodels-https` | `false` | Use HTTPS for connection to botmodels |
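These three connection keys are combined into the base URL that botserver uses for every request. A minimal sketch of the derivation, mirroring `BotModelsConfig::base_url` from this commit:
```rust
// Sketch: how botmodels-https, botmodels-host and botmodels-port form the base URL.
fn base_url(use_https: bool, host: &str, port: u16) -> String {
    let protocol = if use_https { "https" } else { "http" };
    format!("{}://{}:{}", protocol, host, port)
}

fn main() {
    // With the defaults above: http://0.0.0.0:8085
    println!("{}", base_url(false, "0.0.0.0", 8085));
}
```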
## Example config.csv
```csv
key,value
image-generator-model,../../../../data/diffusion/sd_turbo_f16.gguf
image-generator-steps,4
image-generator-width,512
image-generator-height,512
image-generator-gpu-layers,20
image-generator-batch-size,1
video-generator-model,../../../../data/diffusion/zeroscope_v2_576w
video-generator-frames,24
video-generator-fps,8
video-generator-width,320
video-generator-height,576
video-generator-gpu-layers,15
video-generator-batch-size,1
botmodels-enabled,true
botmodels-host,0.0.0.0
botmodels-port,8085
botmodels-api-key,your-secret-key
botmodels-https,false
```
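On the Rust side, these values are read per bot and wrapped in a client. A hedged sketch of driving the `BotModelsClient` added in this commit (it only compiles inside the botserver crate, where `AppState` and the bot's `Uuid` are available):
```rust
use std::sync::Arc;

use crate::multimodal::BotModelsClient;
use crate::shared::state::AppState;

// Sketch: generate an image from a prompt, then caption it via the vision endpoint.
async fn image_roundtrip(
    state: Arc<AppState>,
    bot_id: uuid::Uuid,
) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let client = BotModelsClient::from_state(&state, &bot_id);
    if !client.is_enabled() {
        return Err("set botmodels-enabled=true in config.csv".into());
    }
    let image_url = client.generate_image("a cute cat playing with yarn").await?;
    let caption = client.describe_image(&image_url).await?;
    println!("{} -> {}", image_url, caption);
    Ok(())
}
```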
## BASIC Keywords
Once configured, the following keywords become available in BASIC scripts:
### IMAGE
Generate an image from a text prompt.
```basic
file = IMAGE "a cute cat playing with yarn"
SEND FILE TO user, file
```
### VIDEO
Generate a video from a text prompt.
```basic
file = VIDEO "a rocket launching into space"
SEND FILE TO user, file
```
### AUDIO
Generate speech audio from text.
```basic
file = AUDIO "Hello, welcome to our service!"
SEND FILE TO user, file
```
### SEE
Get a caption/description of an image or video file.
```basic
caption = SEE "/path/to/image.jpg"
TALK caption
// Also works with video files
description = SEE "/path/to/video.mp4"
TALK description
```
## Starting BotModels Service
Before using multimodal features, start the botmodels service:
```bash
cd botmodels
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085
```
Or with HTTPS:
```bash
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --ssl-keyfile key.pem --ssl-certfile cert.pem
```
## API Endpoints (BotModels)
The botmodels service exposes these REST endpoints:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/image/generate` | POST | Generate image from prompt |
| `/api/video/generate` | POST | Generate video from prompt |
| `/api/speech/generate` | POST | Generate speech from text |
| `/api/speech/totext` | POST | Convert audio to text |
| `/api/vision/describe` | POST | Get description of an image |
| `/api/vision/describe_video` | POST | Get description of a video |
| `/api/vision/vqa` | POST | Visual question answering |
| `/api/health` | GET | Health check |
All endpoints require the `X-API-Key` header for authentication.
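For reference, a hedged sketch of calling one of these endpoints directly from Rust with `reqwest` (the same HTTP client botserver uses); the base URL, API key, and prompt are placeholders, and the `json` feature of `reqwest` plus a `tokio` runtime are assumed:
```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let resp = client
        .post("http://127.0.0.1:8085/api/image/generate") // placeholder host/port
        .header("X-API-Key", "your-secret-key") // placeholder key
        .json(&json!({ "prompt": "a cute cat playing with yarn", "steps": 4 }))
        .send()
        .await?;
    println!("HTTP {}", resp.status());
    println!("{}", resp.text().await?);
    Ok(())
}
```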
## Architecture
```
┌─────────────┐ HTTPS ┌─────────────┐
│ botserver │ ────────────▶ │ botmodels │
│ (Rust) │ │ (Python) │
└─────────────┘ └─────────────┘
│ │
│ BASIC Keywords │ AI Models
│ - IMAGE │ - Stable Diffusion
│ - VIDEO │ - Zeroscope
│ - AUDIO │ - TTS/Whisper
│ - SEE │ - BLIP2
▼ ▼
┌─────────────┐ ┌─────────────┐
│ config │ │ outputs │
│ .csv │ │ (files) │
└─────────────┘ └─────────────┘
```
## Troubleshooting
### "BotModels is not enabled"
Set `botmodels-enabled=true` in your config.csv.
### Connection refused
1. Ensure the botmodels service is running (see the health-check sketch below)
2. Check host/port configuration
3. Verify firewall settings
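To check the first two items programmatically, probe the `/api/health` endpoint. A minimal sketch (the URL is a placeholder; substitute your `botmodels-host` and `botmodels-port`):
```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "http://127.0.0.1:8085/api/health"; // placeholder host/port
    match reqwest::Client::new().get(url).send().await {
        Ok(resp) if resp.status().is_success() => println!("botmodels is reachable"),
        Ok(resp) => println!("botmodels responded with HTTP {}", resp.status()),
        Err(e) => println!("connection failed: {e}"),
    }
    Ok(())
}
```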
### Authentication failed
Ensure `botmodels-api-key` in config.csv matches the `API_KEY` environment variable in botmodels.
### Model not found
Verify model paths are correct and models are downloaded to the expected locations.
## Security Notes
1. Always use HTTPS in production (`botmodels-https=true`)
2. Use strong, unique API keys
3. Restrict network access to botmodels service
4. Consider running botmodels on a separate GPU server


@@ -15,6 +15,7 @@ pub mod get;
pub mod hear_talk;
pub mod last;
pub mod llm_keyword;
pub mod multimodal;
pub mod on;
pub mod print;
pub mod remember;


@@ -0,0 +1,323 @@
//! Multimodal keywords for image, video, audio generation and vision/captioning
//!
//! Provides BASIC keywords:
//! - IMAGE "prompt" -> generates image, returns file URL
//! - VIDEO "prompt" -> generates video, returns file URL
//! - AUDIO "text" -> generates speech audio, returns file URL
//! - SEE file -> gets caption/description of image or video
use crate::multimodal::BotModelsClient;
use crate::shared::models::UserSession;
use crate::shared::state::AppState;
use log::{error, trace};
use rhai::{Dynamic, Engine};
use std::sync::Arc;
use std::time::Duration;
/// Register all multimodal keywords
pub fn register_multimodal_keywords(state: Arc<AppState>, user: UserSession, engine: &mut Engine) {
image_keyword(state.clone(), user.clone(), engine);
video_keyword(state.clone(), user.clone(), engine);
audio_keyword(state.clone(), user.clone(), engine);
see_keyword(state.clone(), user.clone(), engine);
}
/// IMAGE "prompt" - Generate an image from text prompt
/// Returns the URL/path to the generated image file
pub fn image_keyword(state: Arc<AppState>, user: UserSession, engine: &mut Engine) {
let state_clone = Arc::clone(&state);
let user_clone = user.clone();
engine
.register_custom_syntax(&["IMAGE", "$expr$"], false, move |context, inputs| {
let prompt = context.eval_expression_tree(&inputs[0])?.to_string();
trace!("IMAGE keyword: generating image for prompt: {}", prompt);
let state_for_thread = Arc::clone(&state_clone);
let bot_id = user_clone.bot_id;
let (tx, rx) = std::sync::mpsc::channel();
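// Rhai custom-syntax callbacks are synchronous, so the async HTTP call to botmodels
// runs on a dedicated thread with its own Tokio runtime; the result is sent back over
// this channel and awaited below with a timeout. VIDEO, AUDIO and SEE use the same pattern.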
std::thread::spawn(move || {
let rt = tokio::runtime::Builder::new_multi_thread()
.worker_threads(2)
.enable_all()
.build();
let send_err = if let Ok(rt) = rt {
let result = rt.block_on(async move {
execute_image_generation(state_for_thread, bot_id, prompt).await
});
tx.send(result).err()
} else {
tx.send(Err("Failed to build tokio runtime".into())).err()
};
if send_err.is_some() {
error!("Failed to send IMAGE result");
}
});
match rx.recv_timeout(Duration::from_secs(300)) {
Ok(Ok(result)) => Ok(Dynamic::from(result)),
Ok(Err(e)) => Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
e.to_string().into(),
rhai::Position::NONE,
))),
Err(std::sync::mpsc::RecvTimeoutError::Timeout) => {
Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
"Image generation timed out".into(),
rhai::Position::NONE,
)))
}
Err(e) => Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
format!("IMAGE thread failed: {}", e).into(),
rhai::Position::NONE,
))),
}
})
.unwrap();
}
async fn execute_image_generation(
state: Arc<AppState>,
bot_id: uuid::Uuid,
prompt: String,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
let client = BotModelsClient::from_state(&state, &bot_id);
if !client.is_enabled() {
return Err("BotModels is not enabled. Set botmodels-enabled=true in config.csv".into());
}
client.generate_image(&prompt).await
}
/// VIDEO "prompt" - Generate a video from text prompt
/// Returns the URL/path to the generated video file
pub fn video_keyword(state: Arc<AppState>, user: UserSession, engine: &mut Engine) {
let state_clone = Arc::clone(&state);
let user_clone = user.clone();
engine
.register_custom_syntax(&["VIDEO", "$expr$"], false, move |context, inputs| {
let prompt = context.eval_expression_tree(&inputs[0])?.to_string();
trace!("VIDEO keyword: generating video for prompt: {}", prompt);
let state_for_thread = Arc::clone(&state_clone);
let bot_id = user_clone.bot_id;
let (tx, rx) = std::sync::mpsc::channel();
std::thread::spawn(move || {
let rt = tokio::runtime::Builder::new_multi_thread()
.worker_threads(2)
.enable_all()
.build();
let send_err = if let Ok(rt) = rt {
let result = rt.block_on(async move {
execute_video_generation(state_for_thread, bot_id, prompt).await
});
tx.send(result).err()
} else {
tx.send(Err("Failed to build tokio runtime".into())).err()
};
if send_err.is_some() {
error!("Failed to send VIDEO result");
}
});
// Video generation can take longer
match rx.recv_timeout(Duration::from_secs(600)) {
Ok(Ok(result)) => Ok(Dynamic::from(result)),
Ok(Err(e)) => Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
e.to_string().into(),
rhai::Position::NONE,
))),
Err(std::sync::mpsc::RecvTimeoutError::Timeout) => {
Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
"Video generation timed out".into(),
rhai::Position::NONE,
)))
}
Err(e) => Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
format!("VIDEO thread failed: {}", e).into(),
rhai::Position::NONE,
))),
}
})
.unwrap();
}
async fn execute_video_generation(
state: Arc<AppState>,
bot_id: uuid::Uuid,
prompt: String,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
let client = BotModelsClient::from_state(&state, &bot_id);
if !client.is_enabled() {
return Err("BotModels is not enabled. Set botmodels-enabled=true in config.csv".into());
}
client.generate_video(&prompt).await
}
/// AUDIO "text" - Generate speech audio from text
/// Returns the URL/path to the generated audio file
pub fn audio_keyword(state: Arc<AppState>, user: UserSession, engine: &mut Engine) {
let state_clone = Arc::clone(&state);
let user_clone = user.clone();
engine
.register_custom_syntax(&["AUDIO", "$expr$"], false, move |context, inputs| {
let text = context.eval_expression_tree(&inputs[0])?.to_string();
trace!("AUDIO keyword: generating speech for text: {}", text);
let state_for_thread = Arc::clone(&state_clone);
let bot_id = user_clone.bot_id;
let (tx, rx) = std::sync::mpsc::channel();
std::thread::spawn(move || {
let rt = tokio::runtime::Builder::new_multi_thread()
.worker_threads(2)
.enable_all()
.build();
let send_err = if let Ok(rt) = rt {
let result = rt.block_on(async move {
execute_audio_generation(state_for_thread, bot_id, text).await
});
tx.send(result).err()
} else {
tx.send(Err("Failed to build tokio runtime".into())).err()
};
if send_err.is_some() {
error!("Failed to send AUDIO result");
}
});
match rx.recv_timeout(Duration::from_secs(120)) {
Ok(Ok(result)) => Ok(Dynamic::from(result)),
Ok(Err(e)) => Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
e.to_string().into(),
rhai::Position::NONE,
))),
Err(std::sync::mpsc::RecvTimeoutError::Timeout) => {
Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
"Audio generation timed out".into(),
rhai::Position::NONE,
)))
}
Err(e) => Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
format!("AUDIO thread failed: {}", e).into(),
rhai::Position::NONE,
))),
}
})
.unwrap();
}
async fn execute_audio_generation(
state: Arc<AppState>,
bot_id: uuid::Uuid,
text: String,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
let client = BotModelsClient::from_state(&state, &bot_id);
if !client.is_enabled() {
return Err("BotModels is not enabled. Set botmodels-enabled=true in config.csv".into());
}
client.generate_audio(&text, None, None).await
}
/// SEE file - Get caption/description of an image or video file
/// Returns the text description of the visual content
pub fn see_keyword(state: Arc<AppState>, user: UserSession, engine: &mut Engine) {
let state_clone = Arc::clone(&state);
let user_clone = user.clone();
engine
.register_custom_syntax(&["SEE", "$expr$"], false, move |context, inputs| {
let file_path = context.eval_expression_tree(&inputs[0])?.to_string();
trace!("SEE keyword: getting caption for file: {}", file_path);
let state_for_thread = Arc::clone(&state_clone);
let bot_id = user_clone.bot_id;
let (tx, rx) = std::sync::mpsc::channel();
std::thread::spawn(move || {
let rt = tokio::runtime::Builder::new_multi_thread()
.worker_threads(2)
.enable_all()
.build();
let send_err = if let Ok(rt) = rt {
let result = rt.block_on(async move {
execute_see_caption(state_for_thread, bot_id, file_path).await
});
tx.send(result).err()
} else {
tx.send(Err("Failed to build tokio runtime".into())).err()
};
if send_err.is_some() {
error!("Failed to send SEE result");
}
});
match rx.recv_timeout(Duration::from_secs(60)) {
Ok(Ok(result)) => Ok(Dynamic::from(result)),
Ok(Err(e)) => Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
e.to_string().into(),
rhai::Position::NONE,
))),
Err(std::sync::mpsc::RecvTimeoutError::Timeout) => {
Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
"Vision/caption timed out".into(),
rhai::Position::NONE,
)))
}
Err(e) => Err(Box::new(rhai::EvalAltResult::ErrorRuntime(
format!("SEE thread failed: {}", e).into(),
rhai::Position::NONE,
))),
}
})
.unwrap();
}
async fn execute_see_caption(
state: Arc<AppState>,
bot_id: uuid::Uuid,
file_path: String,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
let client = BotModelsClient::from_state(&state, &bot_id);
if !client.is_enabled() {
return Err("BotModels is not enabled. Set botmodels-enabled=true in config.csv".into());
}
// Determine if it's a video or image based on extension
let lower_path = file_path.to_lowercase();
if lower_path.ends_with(".mp4")
|| lower_path.ends_with(".avi")
|| lower_path.ends_with(".mov")
|| lower_path.ends_with(".webm")
|| lower_path.ends_with(".mkv")
{
client.describe_video(&file_path).await
} else {
client.describe_image(&file_path).await
}
}


@@ -23,6 +23,7 @@ use self::keywords::format::format_keyword;
use self::keywords::get::get_keyword;
use self::keywords::hear_talk::{hear_keyword, talk_keyword};
use self::keywords::last::last_keyword;
use self::keywords::multimodal::register_multimodal_keywords;
use self::keywords::remember::remember_keyword;
use self::keywords::save_from_unstructured::save_from_unstructured_keyword;
use self::keywords::send_mail::send_mail_keyword;
@@ -92,6 +93,10 @@ impl ScriptService {
&mut engine,
);
// Register multimodal keywords (IMAGE, VIDEO, AUDIO, SEE)
// These connect to botmodels for image/video/audio generation and vision/captioning
register_multimodal_keywords(state.clone(), user.clone(), &mut engine);
ScriptService { engine }
}
fn preprocess_basic_script(&self, script: &str) -> String {
@@ -158,6 +163,11 @@ impl ScriptService {
"SET USER",
"GET BOT MEMORY",
"SET BOT MEMORY",
"IMAGE",
"VIDEO",
"AUDIO",
"SEE",
"SEND FILE",
];
let is_basic_command = basic_commands.iter().any(|&cmd| trimmed.starts_with(cmd));
let is_control_flow = trimmed.starts_with("IF")


@@ -1,6 +1,7 @@
// Core modules (always included)
pub mod basic;
pub mod core;
pub mod multimodal;
pub mod security;
pub mod web;

src/multimodal/mod.rs Normal file

@@ -0,0 +1,652 @@
//! Multimodal module for botmodels integration
//! Provides client for image, video, audio generation and vision/captioning
use crate::config::ConfigManager;
use crate::shared::state::AppState;
use log::{error, info, trace};
use reqwest::Client;
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use uuid::Uuid;
/// Configuration for botmodels connection
#[derive(Debug, Clone)]
pub struct BotModelsConfig {
pub enabled: bool,
pub host: String,
pub port: u16,
pub api_key: String,
pub use_https: bool,
}
impl BotModelsConfig {
pub fn from_database(config_manager: &ConfigManager, bot_id: &Uuid) -> Self {
let enabled = config_manager
.get_config(bot_id, "botmodels-enabled", Some("false"))
.unwrap_or_default()
.to_lowercase()
== "true";
let host = config_manager
.get_config(bot_id, "botmodels-host", Some("0.0.0.0"))
.unwrap_or_else(|_| "0.0.0.0".to_string());
let port = config_manager
.get_config(bot_id, "botmodels-port", Some("8085"))
.unwrap_or_else(|_| "8085".to_string())
.parse()
.unwrap_or(8085);
let api_key = config_manager
.get_config(bot_id, "botmodels-api-key", Some(""))
.unwrap_or_default();
let use_https = config_manager
.get_config(bot_id, "botmodels-https", Some("false"))
.unwrap_or_default()
.to_lowercase()
== "true";
Self {
enabled,
host,
port,
api_key,
use_https,
}
}
pub fn base_url(&self) -> String {
let protocol = if self.use_https { "https" } else { "http" };
format!("{}://{}:{}", protocol, self.host, self.port)
}
}
/// Image generation configuration
#[derive(Debug, Clone)]
pub struct ImageGeneratorConfig {
pub model: String,
pub steps: u32,
pub width: u32,
pub height: u32,
pub gpu_layers: u32,
pub batch_size: u32,
}
impl ImageGeneratorConfig {
pub fn from_database(config_manager: &ConfigManager, bot_id: &Uuid) -> Self {
Self {
model: config_manager
.get_config(bot_id, "image-generator-model", None)
.unwrap_or_default(),
steps: config_manager
.get_config(bot_id, "image-generator-steps", Some("4"))
.unwrap_or_else(|_| "4".to_string())
.parse()
.unwrap_or(4),
width: config_manager
.get_config(bot_id, "image-generator-width", Some("512"))
.unwrap_or_else(|_| "512".to_string())
.parse()
.unwrap_or(512),
height: config_manager
.get_config(bot_id, "image-generator-height", Some("512"))
.unwrap_or_else(|_| "512".to_string())
.parse()
.unwrap_or(512),
gpu_layers: config_manager
.get_config(bot_id, "image-generator-gpu-layers", Some("20"))
.unwrap_or_else(|_| "20".to_string())
.parse()
.unwrap_or(20),
batch_size: config_manager
.get_config(bot_id, "image-generator-batch-size", Some("1"))
.unwrap_or_else(|_| "1".to_string())
.parse()
.unwrap_or(1),
}
}
}
/// Video generation configuration
#[derive(Debug, Clone)]
pub struct VideoGeneratorConfig {
pub model: String,
pub frames: u32,
pub fps: u32,
pub width: u32,
pub height: u32,
pub gpu_layers: u32,
pub batch_size: u32,
}
impl VideoGeneratorConfig {
pub fn from_database(config_manager: &ConfigManager, bot_id: &Uuid) -> Self {
Self {
model: config_manager
.get_config(bot_id, "video-generator-model", None)
.unwrap_or_default(),
frames: config_manager
.get_config(bot_id, "video-generator-frames", Some("24"))
.unwrap_or_else(|_| "24".to_string())
.parse()
.unwrap_or(24),
fps: config_manager
.get_config(bot_id, "video-generator-fps", Some("8"))
.unwrap_or_else(|_| "8".to_string())
.parse()
.unwrap_or(8),
width: config_manager
.get_config(bot_id, "video-generator-width", Some("320"))
.unwrap_or_else(|_| "320".to_string())
.parse()
.unwrap_or(320),
height: config_manager
.get_config(bot_id, "video-generator-height", Some("576"))
.unwrap_or_else(|_| "576".to_string())
.parse()
.unwrap_or(576),
gpu_layers: config_manager
.get_config(bot_id, "video-generator-gpu-layers", Some("15"))
.unwrap_or_else(|_| "15".to_string())
.parse()
.unwrap_or(15),
batch_size: config_manager
.get_config(bot_id, "video-generator-batch-size", Some("1"))
.unwrap_or_else(|_| "1".to_string())
.parse()
.unwrap_or(1),
}
}
}
// API Request/Response types
#[derive(Debug, Serialize)]
pub struct ImageGenerateRequest {
pub prompt: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub steps: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub width: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub height: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub guidance_scale: Option<f32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub seed: Option<i64>,
}
#[derive(Debug, Serialize)]
pub struct VideoGenerateRequest {
pub prompt: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub num_frames: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub fps: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub steps: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
pub seed: Option<i64>,
}
#[derive(Debug, Serialize)]
pub struct SpeechGenerateRequest {
pub prompt: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub voice: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub language: Option<String>,
}
#[derive(Debug, Deserialize)]
pub struct GenerationResponse {
pub status: String,
pub file_path: Option<String>,
pub generation_time: Option<f64>,
pub error: Option<String>,
}
#[derive(Debug, Deserialize)]
pub struct DescribeResponse {
pub description: String,
pub confidence: Option<f64>,
}
#[derive(Debug, Deserialize)]
pub struct VideoDescribeResponse {
pub description: String,
pub frame_count: Option<u32>,
}
#[derive(Debug, Deserialize)]
pub struct SpeechToTextResponse {
pub text: String,
pub language: Option<String>,
pub confidence: Option<f64>,
}
/// BotModels client for multimodal operations
pub struct BotModelsClient {
client: Client,
config: BotModelsConfig,
image_config: ImageGeneratorConfig,
video_config: VideoGeneratorConfig,
}
impl BotModelsClient {
pub fn new(
config: BotModelsConfig,
image_config: ImageGeneratorConfig,
video_config: VideoGeneratorConfig,
) -> Self {
let client = Client::builder()
.danger_accept_invalid_certs(true) // For self-signed certs in dev
.timeout(std::time::Duration::from_secs(300)) // 5 min timeout for generation
.build()
.unwrap_or_else(|_| Client::new());
Self {
client,
config,
image_config,
video_config,
}
}
pub fn from_state(state: &AppState, bot_id: &Uuid) -> Self {
let config_manager = ConfigManager::new(state.conn.clone());
let config = BotModelsConfig::from_database(&config_manager, bot_id);
let image_config = ImageGeneratorConfig::from_database(&config_manager, bot_id);
let video_config = VideoGeneratorConfig::from_database(&config_manager, bot_id);
Self::new(config, image_config, video_config)
}
pub fn is_enabled(&self) -> bool {
self.config.enabled
}
/// Generate an image from a text prompt
pub async fn generate_image(
&self,
prompt: &str,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
if !self.config.enabled {
return Err("BotModels is not enabled".into());
}
let url = format!("{}/api/image/generate", self.config.base_url());
trace!("Generating image at {}: {}", url, prompt);
let request = ImageGenerateRequest {
prompt: prompt.to_string(),
steps: Some(self.image_config.steps),
width: Some(self.image_config.width),
height: Some(self.image_config.height),
guidance_scale: Some(7.5),
seed: None,
};
let response = self
.client
.post(&url)
.header("X-API-Key", &self.config.api_key)
.json(&request)
.send()
.await?;
if !response.status().is_success() {
let error_text = response.text().await.unwrap_or_default();
error!("Image generation failed: {}", error_text);
return Err(format!("Image generation failed: {}", error_text).into());
}
let result: GenerationResponse = response.json().await?;
if result.status == "completed" {
if let Some(file_path) = result.file_path {
let full_url = format!("{}{}", self.config.base_url(), file_path);
info!("Image generated: {}", full_url);
return Ok(full_url);
}
}
Err(result
.error
.unwrap_or_else(|| "Unknown error".to_string())
.into())
}
/// Generate a video from a text prompt
pub async fn generate_video(
&self,
prompt: &str,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
if !self.config.enabled {
return Err("BotModels is not enabled".into());
}
let url = format!("{}/api/video/generate", self.config.base_url());
trace!("Generating video at {}: {}", url, prompt);
let request = VideoGenerateRequest {
prompt: prompt.to_string(),
num_frames: Some(self.video_config.frames),
fps: Some(self.video_config.fps),
steps: Some(50),
seed: None,
};
let response = self
.client
.post(&url)
.header("X-API-Key", &self.config.api_key)
.json(&request)
.send()
.await?;
if !response.status().is_success() {
let error_text = response.text().await.unwrap_or_default();
error!("Video generation failed: {}", error_text);
return Err(format!("Video generation failed: {}", error_text).into());
}
let result: GenerationResponse = response.json().await?;
if result.status == "completed" {
if let Some(file_path) = result.file_path {
let full_url = format!("{}{}", self.config.base_url(), file_path);
info!("Video generated: {}", full_url);
return Ok(full_url);
}
}
Err(result
.error
.unwrap_or_else(|| "Unknown error".to_string())
.into())
}
/// Generate audio/speech from text
pub async fn generate_audio(
&self,
text: &str,
voice: Option<&str>,
language: Option<&str>,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
if !self.config.enabled {
return Err("BotModels is not enabled".into());
}
let url = format!("{}/api/speech/generate", self.config.base_url());
trace!("Generating audio at {}: {}", url, text);
let request = SpeechGenerateRequest {
prompt: text.to_string(),
voice: voice.map(String::from),
language: language.map(String::from),
};
let response = self
.client
.post(&url)
.header("X-API-Key", &self.config.api_key)
.json(&request)
.send()
.await?;
if !response.status().is_success() {
let error_text = response.text().await.unwrap_or_default();
error!("Audio generation failed: {}", error_text);
return Err(format!("Audio generation failed: {}", error_text).into());
}
let result: GenerationResponse = response.json().await?;
if result.status == "completed" {
if let Some(file_path) = result.file_path {
let full_url = format!("{}{}", self.config.base_url(), file_path);
info!("Audio generated: {}", full_url);
return Ok(full_url);
}
}
Err(result
.error
.unwrap_or_else(|| "Unknown error".to_string())
.into())
}
/// Get caption/description for an image
pub async fn describe_image(
&self,
image_url_or_path: &str,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
if !self.config.enabled {
return Err("BotModels is not enabled".into());
}
let url = format!("{}/api/vision/describe", self.config.base_url());
trace!("Describing image at {}: {}", url, image_url_or_path);
// If it's a URL, download the image first
let image_data = if image_url_or_path.starts_with("http") {
let response = self.client.get(image_url_or_path).send().await?;
response.bytes().await?.to_vec()
} else {
tokio::fs::read(image_url_or_path).await?
};
let form = reqwest::multipart::Form::new().part(
"file",
reqwest::multipart::Part::bytes(image_data)
.file_name("image.png")
.mime_str("image/png")?,
);
let response = self
.client
.post(&url)
.header("X-API-Key", &self.config.api_key)
.multipart(form)
.send()
.await?;
if !response.status().is_success() {
let error_text = response.text().await.unwrap_or_default();
error!("Image description failed: {}", error_text);
return Err(format!("Image description failed: {}", error_text).into());
}
let result: DescribeResponse = response.json().await?;
info!("Image described: {}", result.description);
Ok(result.description)
}
/// Get caption/description for a video
pub async fn describe_video(
&self,
video_url_or_path: &str,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
if !self.config.enabled {
return Err("BotModels is not enabled".into());
}
let url = format!("{}/api/vision/describe_video", self.config.base_url());
trace!("Describing video at {}: {}", url, video_url_or_path);
let video_data = if video_url_or_path.starts_with("http") {
let response = self.client.get(video_url_or_path).send().await?;
response.bytes().await?.to_vec()
} else {
tokio::fs::read(video_url_or_path).await?
};
let form = reqwest::multipart::Form::new().part(
"file",
reqwest::multipart::Part::bytes(video_data)
.file_name("video.mp4")
.mime_str("video/mp4")?,
);
let response = self
.client
.post(&url)
.header("X-API-Key", &self.config.api_key)
.multipart(form)
.send()
.await?;
if !response.status().is_success() {
let error_text = response.text().await.unwrap_or_default();
error!("Video description failed: {}", error_text);
return Err(format!("Video description failed: {}", error_text).into());
}
let result: VideoDescribeResponse = response.json().await?;
info!("Video described: {}", result.description);
Ok(result.description)
}
/// Convert speech to text
pub async fn speech_to_text(
&self,
audio_url_or_path: &str,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
if !self.config.enabled {
return Err("BotModels is not enabled".into());
}
let url = format!("{}/api/speech/totext", self.config.base_url());
trace!(
"Converting speech to text at {}: {}",
url,
audio_url_or_path
);
let audio_data = if audio_url_or_path.starts_with("http") {
let response = self.client.get(audio_url_or_path).send().await?;
response.bytes().await?.to_vec()
} else {
tokio::fs::read(audio_url_or_path).await?
};
let form = reqwest::multipart::Form::new().part(
"file",
reqwest::multipart::Part::bytes(audio_data)
.file_name("audio.wav")
.mime_str("audio/wav")?,
);
let response = self
.client
.post(&url)
.header("X-API-Key", &self.config.api_key)
.multipart(form)
.send()
.await?;
if !response.status().is_success() {
let error_text = response.text().await.unwrap_or_default();
error!("Speech to text failed: {}", error_text);
return Err(format!("Speech to text failed: {}", error_text).into());
}
let result: SpeechToTextResponse = response.json().await?;
info!("Speech converted: {}", result.text);
Ok(result.text)
}
/// Check if botmodels server is healthy
pub async fn health_check(&self) -> bool {
if !self.config.enabled {
return false;
}
let url = format!("{}/api/health", self.config.base_url());
match self.client.get(&url).send().await {
Ok(response) => response.status().is_success(),
Err(_) => false,
}
}
/// Download generated file to local path
pub async fn download_file(
&self,
url: &str,
local_path: &str,
) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
let response = self.client.get(url).send().await?;
let bytes = response.bytes().await?;
tokio::fs::write(local_path, bytes).await?;
Ok(())
}
}
/// Ensure botmodels server is running (similar to llama.cpp startup)
pub async fn ensure_botmodels_running(
app_state: Arc<AppState>,
) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
use crate::shared::models::schema::bots::dsl::*;
use diesel::prelude::*;
let config_values = {
let conn_arc = app_state.conn.clone();
let default_bot_id = tokio::task::spawn_blocking(move || {
let mut conn = conn_arc.get().unwrap();
bots.filter(name.eq("default"))
.select(id)
.first::<uuid::Uuid>(&mut *conn)
.unwrap_or_else(|_| uuid::Uuid::nil())
})
.await?;
let config_manager = ConfigManager::new(app_state.conn.clone());
let config = BotModelsConfig::from_database(&config_manager, &default_bot_id);
config
};
if !config_values.enabled {
info!("BotModels is disabled, skipping startup");
return Ok(());
}
info!("Checking BotModels server status...");
info!(" Host: {}", config_values.host);
info!(" Port: {}", config_values.port);
let client = BotModelsClient::new(
config_values.clone(),
ImageGeneratorConfig {
model: String::new(),
steps: 4,
width: 512,
height: 512,
gpu_layers: 20,
batch_size: 1,
},
VideoGeneratorConfig {
model: String::new(),
frames: 24,
fps: 8,
width: 320,
height: 576,
gpu_layers: 15,
batch_size: 1,
},
);
// Check if already running
if client.health_check().await {
info!("BotModels server is already running and healthy");
return Ok(());
}
info!("BotModels server not responding, it should be started externally");
info!(
"Start botmodels with: cd botmodels && python -m uvicorn src.main:app --host {} --port {}",
config_values.host, config_values.port
);
Ok(())
}


@@ -49,3 +49,25 @@ custom-password,
website-expires,1d
website-max-depth,3
website-max-pages,100
image-generator-model,../../../../data/diffusion/sd_turbo_f16.gguf
image-generator-steps,4
image-generator-width,512
image-generator-height,512
image-generator-gpu-layers,20
image-generator-batch-size,1
video-generator-model,../../../../data/diffusion/zeroscope_v2_576w
video-generator-frames,24
video-generator-fps,8
video-generator-width,320
video-generator-height,576
video-generator-gpu-layers,15
video-generator-batch-size,1
botmodels-enabled,true
botmodels-host,0.0.0.0
botmodels-port,8085
default-generator,all
