# BotModels - AI Inference Service
**Version:** 1.0.0
**Purpose:** Multimodal AI inference service for General Bots
---
## Overview
BotModels is a Python-based AI inference service that provides multimodal capabilities to the General Bots platform. It serves as a companion to botserver (Rust), specializing in cutting-edge AI/ML models from the Python ecosystem including image generation, video creation, speech synthesis, and vision/captioning.
While botserver handles business logic, networking, and systems-level operations, BotModels exists solely to leverage the extensive Python AI/ML ecosystem for inference tasks that are impractical to implement in Rust.
For comprehensive documentation, see **[docs.pragmatismo.com.br](https://docs.pragmatismo.com.br)** or the **[BotBook](../botbook)** for detailed guides, API references, and tutorials.
---
## Features
- **Image Generation**: Generate images from text prompts using Stable Diffusion
- **Video Generation**: Create short videos from text descriptions using Zeroscope
- **Speech Synthesis**: Text-to-speech using Coqui TTS
- **Speech Recognition**: Audio transcription using OpenAI Whisper
- **Vision/Captioning**: Image and video description using BLIP2
---
## Quick Start
### Installation
```bash
# Clone the repository, then enter it
cd botmodels
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
.\venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
```
### Configuration
Copy the example environment file and configure:
```bash
cp .env.example .env
```
Edit `.env` with your settings:
```env
HOST=0.0.0.0
PORT=8085
API_KEY=your-secret-key
DEVICE=cuda
IMAGE_MODEL_PATH=./models/stable-diffusion-v1-5
VIDEO_MODEL_PATH=./models/zeroscope-v2
VISION_MODEL_PATH=./models/blip2
```
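These settings can be loaded into a typed object at startup. The sketch below uses only the standard library; the real service may well use a library such as `pydantic-settings` instead, and the `Settings` class here is an illustration, not the service's actual code.

```python
# Minimal sketch of turning the .env values above into a typed settings
# object. Field names mirror the example .env; defaults are assumptions.
from dataclasses import dataclass


@dataclass
class Settings:
    host: str = "0.0.0.0"
    port: int = 8085
    api_key: str = ""
    device: str = "cpu"


def load_settings(env: dict) -> Settings:
    """Build Settings from an environment-style mapping (e.g. os.environ)."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8085")),
        api_key=env.get("API_KEY", ""),
        device=env.get("DEVICE", "cpu"),
    )


settings = load_settings({"PORT": "8085", "API_KEY": "secret", "DEVICE": "cuda"})
```

Keeping parsing in one place means a bad `PORT` value fails fast at startup rather than deep inside a request handler.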
### Running the Server
```bash
# Development mode
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --reload
# Production mode
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --workers 4
# With HTTPS (production)
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --ssl-keyfile key.pem --ssl-certfile cert.pem
```
---
## 🐍 Philosophy & Scope
### Why Python?
- **Rust vs. Python Rule**:
- If logic is deterministic, systems-level, or performance-critical: **Do it in Rust (botserver)**
- If logic requires cutting-edge ML models, rapid experimentation with HuggingFace, or specific Python-only libraries: **Do it here**
### Architecture Principles
- **Inference Only**: This service should NOT hold business state. It accepts inputs, runs inference, and returns predictions.
- **Stateless**: Treated as a sidecar to `botserver`.
- **API First**: Exposes strict HTTP/REST endpoints consumed by `botserver`.
---
## 🛠 Technology Stack
- **Runtime**: Python 3.10+
- **Web Framework**: FastAPI (preferred over Flask for async/performance)
- **ML Frameworks**: PyTorch, HuggingFace Transformers, Diffusers
- **Quality**: `ruff` (linting), `black` (formatting), `mypy` (typing)
---
## 📡 API Endpoints
All endpoints require the `X-API-Key` header for authentication.
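The header check itself is a small comparison. A minimal sketch, assuming the expected key comes from configuration (in FastAPI this would typically live in a dependency that raises a 401 on failure):

```python
import hmac

# Assumed to come from configuration; hard-coded here for illustration only.
EXPECTED_KEY = "your-secret-key"


def check_api_key(headers: dict) -> bool:
    """Return True when the request carries the correct X-API-Key header."""
    supplied = headers.get("X-API-Key", "")
    # Constant-time comparison avoids leaking key length/content via timing.
    return hmac.compare_digest(supplied, EXPECTED_KEY)
```

`hmac.compare_digest` is preferred over `==` so that rejection time does not depend on how many leading characters matched.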
### Image Generation
```http
POST /api/image/generate
Content-Type: application/json
X-API-Key: your-api-key
{
"prompt": "a cute cat playing with yarn",
"steps": 30,
"width": 512,
"height": 512,
"guidance_scale": 7.5,
"seed": 42
}
```
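A request like the one above can be built from Python with the standard library alone. The host, port, and default parameters below follow the examples in this README; the helper function itself is illustrative, not part of the service:

```python
import json
import urllib.request


def build_image_request(prompt: str, api_key: str,
                        base: str = "http://localhost:8085") -> urllib.request.Request:
    """Construct (but do not send) a POST to the image generation endpoint."""
    body = json.dumps({
        "prompt": prompt,
        "steps": 30,
        "width": 512,
        "height": 512,
        "guidance_scale": 7.5,
    }).encode()
    return urllib.request.Request(
        f"{base}/api/image/generate",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_image_request("a cute cat playing with yarn", "your-secret-key")
# urllib.request.urlopen(req) would send it once the server is running
```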
### Video Generation
```http
POST /api/video/generate
Content-Type: application/json
X-API-Key: your-api-key
{
"prompt": "a rocket launching into space",
"num_frames": 24,
"fps": 8,
"steps": 50
}
```
### Speech Generation (TTS)
```http
POST /api/speech/generate
Content-Type: application/json
X-API-Key: your-api-key
{
"prompt": "Hello, welcome to our service!",
"voice": "default",
"language": "en"
}
```
### Speech to Text
```http
POST /api/speech/totext
Content-Type: multipart/form-data
X-API-Key: your-api-key
file: <audio_file>
```
### Image Description
```http
POST /api/vision/describe
Content-Type: multipart/form-data
X-API-Key: your-api-key
file: <image_file>
prompt: "What is in this image?" (optional)
```
### Video Description
```http
POST /api/vision/describe_video
Content-Type: multipart/form-data
X-API-Key: your-api-key
file: <video_file>
num_frames: 8 (optional)
```
### Visual Question Answering
```http
POST /api/vision/vqa
Content-Type: multipart/form-data
X-API-Key: your-api-key
file: <image_file>
question: "How many people are in this image?"
```
### Health Check
```http
GET /api/health
```
Interactive API documentation:
- Swagger UI: `http://localhost:8085/api/docs`
- ReDoc: `http://localhost:8085/api/redoc`
---
## 🔗 Integration with BotServer
### Configuration (config.csv)
```csv
key,value
botmodels-enabled,true
botmodels-host,0.0.0.0
botmodels-port,8085
botmodels-api-key,your-secret-key
botmodels-https,false
image-generator-model,../../../../data/diffusion/sd_turbo_f16.gguf
image-generator-steps,4
image-generator-width,512
image-generator-height,512
video-generator-model,../../../../data/diffusion/zeroscope_v2_576w
video-generator-frames,24
video-generator-fps,8
```
### BASIC Script Keywords
```basic
// Generate an image
file = IMAGE "a beautiful sunset over mountains"
SEND FILE TO user, file
// Generate a video
video = VIDEO "waves crashing on a beach"
SEND FILE TO user, video
// Generate speech
audio = AUDIO "Welcome to General Bots!"
SEND FILE TO user, audio
// Get image/video description
caption = SEE "/path/to/image.jpg"
TALK caption
```
---
## 🏗️ Architecture
```
┌─────────────┐ HTTPS ┌─────────────┐
│ botserver │ ────────────▶ │ botmodels │
│ (Rust) │ │ (Python) │
└─────────────┘ └─────────────┘
│ │
│ BASIC Keywords │ AI Models
│ - IMAGE │ - Stable Diffusion
│ - VIDEO │ - Zeroscope
│ - AUDIO │ - TTS/Whisper
│ - SEE │ - BLIP2
▼ ▼
┌─────────────┐ ┌─────────────┐
│ config │ │ outputs │
│ .csv │ │ (files) │
└─────────────┘ └─────────────┘
```
---
## ⚡️ Development Guidelines
### Modern Model Usage
- **Deprecate Legacy**: Move away from outdated libraries (e.g., old `allennlp`) in favor of **HuggingFace Transformers** and **Diffusers**
- **Quantization**: Always consider quantized models (bitsandbytes, GGUF) to reduce VRAM usage
### Performance & Loading
- **Lazy Loading**: Do NOT load 10GB models at module import time. Load them during the startup lifecycle or on first request, guarded by a lock
- **GPU Handling**: Robustly detect CUDA/MPS (Mac) and fallback to CPU gracefully
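The two guidelines above can be sketched together. The loader below is a hedged illustration: `torch` is imported defensively so the module still imports on CPU-only machines, and the placeholder stands in for whatever real load call (e.g. a Diffusers pipeline) the service uses:

```python
import threading


def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"


_model = None
_model_lock = threading.Lock()


def get_model():
    """Load the model on first use, never at import time."""
    global _model
    if _model is None:
        with _model_lock:
            # Double-checked locking: re-test inside the lock so only one
            # thread performs the (expensive) load.
            if _model is None:
                _model = object()  # placeholder for the real load call
    return _model
```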
### Code Quality
- **Type Hints**: All functions MUST have type hints
- **Error Handling**: No bare `except:`. Catch precise exceptions and return structured JSON errors to `botserver`
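A minimal sketch of the error-handling guideline, assuming an illustrative payload shape (`ok`/`error`/`detail` are assumptions, not the service's documented schema):

```python
import json


def run_inference(prompt: str) -> dict:
    """Stand-in for a real inference call; raises on bad input."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"ok": True, "result": f"generated for: {prompt}"}


def safe_inference(prompt: str) -> str:
    """Wrap inference so botserver always receives structured JSON."""
    try:
        return json.dumps(run_inference(prompt))
    except ValueError as exc:  # precise exception, never bare `except:`
        return json.dumps(
            {"ok": False, "error": "invalid_request", "detail": str(exc)}
        )
```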
### Project Structure
```
botmodels/
├── src/
│ ├── api/
│ │ ├── v1/
│ │ │ └── endpoints/
│ │ │ ├── image.py
│ │ │ ├── video.py
│ │ │ ├── speech.py
│ │ │ └── vision.py
│ │ └── dependencies.py
│ ├── core/
│ │ ├── config.py
│ │ └── logging.py
│ ├── schemas/
│ │ └── generation.py
│ ├── services/
│ │ ├── image_service.py
│ │ ├── video_service.py
│ │ ├── speech_service.py
│ │ └── vision_service.py
│ └── main.py
├── outputs/
├── models/
├── tests/
├── requirements.txt
└── README.md
```
---
## 🧪 Testing
```bash
pytest tests/
```
---
## 🔒 Security
1. **Always use HTTPS in production**
2. Use strong, unique API keys
3. Restrict network access to the service
4. Consider running on a separate GPU server
5. Monitor resource usage and set appropriate limits
---
## 📚 Documentation
For complete documentation, guides, and API references:
- **[docs.pragmatismo.com.br](https://docs.pragmatismo.com.br)** - Full online documentation
- **[BotBook](../botbook)** - Local comprehensive guide with tutorials and examples
- **[General Bots Repository](https://github.com/GeneralBots/BotServer)** - Main project repository
---
## 📦 Requirements
- Python 3.10+
- CUDA-capable GPU (recommended, 8GB+ VRAM)
- 16GB+ RAM
---
## 🔗 Resources
### Education
- [Computer Vision Course](https://pjreddie.com/courses/computer-vision/)
- [Adversarial VQA Paper](https://arxiv.org/abs/2106.00245)
- [LLM Visualization](https://bbycroft.net/llm)
### References
- [VizWiz VQA PyTorch](https://github.com/DenisDsh/VizWiz-VQA-PyTorch)
- [Diffusers Library](https://github.com/huggingface/diffusers)
- [OpenAI Whisper](https://github.com/openai/whisper)
- [BLIP2](https://huggingface.co/Salesforce/blip2-opt-2.7b)
### Community
- [AI for Mankind](https://github.com/aiformankind)
- [ManaAI](https://manaai.cn/)
---
## 🔑 Remember
- **Inference Only**: No business state, just predictions
- **Modern Models**: Use HuggingFace Transformers, Diffusers
- **Type Safety**: All functions must have type hints
- **Lazy Loading**: Don't load models at import time
- **GPU Detection**: Graceful fallback to CPU
- **Version 1.0.0** - Do not change without approval
- **GIT WORKFLOW** - ALWAYS push to ALL repositories (github, pragmatismo)
---
## 📄 License
See LICENSE file for details.