BotModels - AI Inference Service

Version: 1.0.0
Purpose: Multimodal AI inference service for General Bots


Overview

BotModels is a Python-based AI inference service that provides multimodal capabilities to the General Bots platform. It is a companion to botserver (Rust) and specializes in running cutting-edge AI/ML models from the Python ecosystem for image generation, video creation, speech synthesis and recognition, and vision/captioning.

While botserver handles business logic, networking, and systems-level operations, BotModels exists solely to leverage the extensive Python AI/ML ecosystem for inference tasks that are impractical to implement in Rust.

For comprehensive documentation, see docs.pragmatismo.com.br or the BotBook for detailed guides, API references, and tutorials.


Features

  • Image Generation: Generate images from text prompts using Stable Diffusion
  • Video Generation: Create short videos from text descriptions using Zeroscope
  • Speech Synthesis: Text-to-speech using Coqui TTS
  • Speech Recognition: Audio transcription using OpenAI Whisper
  • Vision/Captioning: Image and video description using BLIP2

Quick Start

Installation

# Clone the repository, then enter its directory
cd botmodels

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
.\venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

Configuration

Copy the example environment file:

cp .env.example .env

Edit .env with your settings:

HOST=0.0.0.0
PORT=8085
API_KEY=your-secret-key
DEVICE=cuda
IMAGE_MODEL_PATH=./models/stable-diffusion-v1-5
VIDEO_MODEL_PATH=./models/zeroscope-v2
VISION_MODEL_PATH=./models/blip2
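The settings are read by src/core/config.py, which this README does not reproduce. Below is a minimal standard-library sketch of how the keys above could be loaded; `Settings` and `load_settings` are hypothetical names, and the fallback defaults (including `cpu` when DEVICE is unset) are assumptions, not the service's documented behavior.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


# Hypothetical settings object mirroring the .env keys above; not the real src/core/config.py.
@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    api_key: str
    device: str
    image_model_path: str
    video_model_path: str
    vision_model_path: str


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from environment variables, falling back to assumed defaults."""
    source: Mapping[str, str] = os.environ if env is None else env
    api_key = source.get("API_KEY", "")
    if not api_key:
        # Fail fast: the endpoints expect an X-API-Key header to check against.
        raise ValueError("API_KEY must be set")
    return Settings(
        host=source.get("HOST", "0.0.0.0"),
        port=int(source.get("PORT", "8085")),
        api_key=api_key,
        device=source.get("DEVICE", "cpu"),  # assumption: CPU when DEVICE is unset
        image_model_path=source.get("IMAGE_MODEL_PATH", "./models/stable-diffusion-v1-5"),
        video_model_path=source.get("VIDEO_MODEL_PATH", "./models/zeroscope-v2"),
        vision_model_path=source.get("VISION_MODEL_PATH", "./models/blip2"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.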

Running the Server

# Development mode
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --reload

# Production mode
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --workers 4

# With HTTPS (production)
python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --ssl-keyfile key.pem --ssl-certfile cert.pem

🐍 Philosophy & Scope

Why Python?

  • Rust vs. Python rule:
    • If logic is deterministic, systems-level, or performance-critical, do it in Rust (botserver)
    • If logic requires cutting-edge ML models, rapid experimentation with HuggingFace, or specific Python-only libraries, do it here

Architecture Principles

  • Inference Only: This service should NOT hold business state. It accepts inputs, runs inference, and returns predictions.
  • Stateless: Deployed as a sidecar to botserver.
  • API First: Exposes strict HTTP/REST endpoints consumed by botserver.

🛠 Technology Stack

  • Runtime: Python 3.10+
  • Web Framework: FastAPI (preferred over Flask for async/performance)
  • ML Frameworks: PyTorch, HuggingFace Transformers, Diffusers
  • Quality: ruff (linting), black (formatting), mypy (typing)

📡 API Endpoints

All endpoints require the X-API-Key header for authentication.

Image Generation

POST /api/image/generate
Content-Type: application/json
X-API-Key: your-api-key

{
  "prompt": "a cute cat playing with yarn",
  "steps": 30,
  "width": 512,
  "height": 512,
  "guidance_scale": 7.5,
  "seed": 42
}
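To illustrate calling this endpoint from Python, the sketch below builds the request with the standard library's `urllib.request`. `build_image_request` is a hypothetical helper. Because this README does not document the response format, the sketch stops at constructing the request; send it with `urllib.request.urlopen(req)` and inspect what comes back.

```python
from __future__ import annotations

import json
import urllib.request


def build_image_request(
    base_url: str,
    api_key: str,
    prompt: str,
    steps: int = 30,
    width: int = 512,
    height: int = 512,
    guidance_scale: float = 7.5,
    seed: int | None = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/image/generate. Hypothetical helper."""
    payload: dict[str, object] = {
        "prompt": prompt,
        "steps": steps,
        "width": width,
        "height": height,
        "guidance_scale": guidance_scale,
    }
    if seed is not None:
        payload["seed"] = seed  # fixed seed to make results repeatable
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/image/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_image_request("http://localhost:8085", "your-api-key", "a cute cat playing with yarn", seed=42)
# To send it (generation can be slow, so allow a generous timeout):
# urllib.request.urlopen(req, timeout=300)
```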

Video Generation

POST /api/video/generate
Content-Type: application/json
X-API-Key: your-api-key

{
  "prompt": "a rocket launching into space",
  "num_frames": 24,
  "fps": 8,
  "steps": 50
}
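`num_frames` and `fps` together set the clip length. Assuming every generated frame is played back at the requested rate, the example request yields 24 / 8 = 3 seconds of video. The hypothetical helpers below make that arithmetic explicit.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip: frame count divided by frames per second."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


def frames_for_duration(seconds: float, fps: int) -> int:
    """Inverse: how many frames to request for a target length (rounded)."""
    return round(seconds * fps)


print(clip_duration_seconds(24, 8))  # 3.0, the request above
print(frames_for_duration(5, 8))     # 40
```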

Speech Generation (TTS)

POST /api/speech/generate
Content-Type: application/json
X-API-Key: your-api-key

{
  "prompt": "Hello, welcome to our service!",
  "voice": "default",
  "language": "en"
}

Speech to Text

POST /api/speech/totext
Content-Type: multipart/form-data
X-API-Key: your-api-key

file: <audio_file>
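The multipart endpoints (`/api/speech/totext`, `/api/vision/describe`, `/api/vision/describe_video`, `/api/vision/vqa`) all take a `file` part plus optional text fields. Below is a standard-library sketch of encoding such a body; `encode_multipart` is a hypothetical helper, and a client library such as `requests` (its `files=` argument) can do the same job.

```python
from __future__ import annotations

import uuid


def encode_multipart(
    fields: dict[str, str],
    file_field: str,
    filename: str,
    file_bytes: bytes,
    content_type: str = "application/octet-stream",
) -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns (body, value for the Content-Type header). Hypothetical helper.
    """
    boundary = uuid.uuid4().hex  # random, so it is very unlikely to occur in the payload
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",  # blank line separates part headers from part body
            value.encode("utf-8"),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"
```

For VQA, for example, pass `{"question": "How many people are in this image?"}` as `fields` and send `body` with the returned string as its `Content-Type` header.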

Image Description

POST /api/vision/describe
Content-Type: multipart/form-data
X-API-Key: your-api-key

file: <image_file>
prompt: "What is in this image?" (optional)

Video Description

POST /api/vision/describe_video
Content-Type: multipart/form-data
X-API-Key: your-api-key

file: <video_file>
num_frames: 8 (optional)
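The README does not say how the `num_frames` frames are chosen from the uploaded video. One common approach, shown here as an assumption, is to split the clip into equal segments, take the middle frame of each, and caption each frame with the image model. `sample_frame_indices` is a hypothetical helper; integer arithmetic avoids floating-point rounding at segment boundaries.

```python
def sample_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Return num_frames indices: the middle frame of each equal segment of the clip."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames >= total_frames:
        return list(range(total_frames))  # short clip: use every frame
    # Middle of segment i is (i + 0.5) * total / num; integer math avoids float drift.
    return [(2 * i + 1) * total_frames // (2 * num_frames) for i in range(num_frames)]


print(sample_frame_indices(80))  # [5, 15, 25, 35, 45, 55, 65, 75]
```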

Visual Question Answering

POST /api/vision/vqa
Content-Type: multipart/form-data
X-API-Key: your-api-key

file: <image_file>
question: "How many people are in this image?"

Health Check

GET /api/health
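A readiness probe for this endpoint might look like the sketch below. It assumes only that a healthy service answers `GET /api/health` with HTTP 200; the response body is not documented here, so it is ignored. `is_healthy` is a hypothetical name.

```python
import http.client
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """True if GET {base_url}/api/health returns HTTP 200 within the timeout."""
    url = f"{base_url.rstrip('/')}/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx), timeouts and refused connections are all OSError subclasses.
        return False
```

For example, `is_healthy("http://localhost:8085")` returns False until the server has started and answers.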

Interactive API documentation:

  • Swagger UI: http://localhost:8085/api/docs
  • ReDoc: http://localhost:8085/api/redoc

🔗 Integration with BotServer

Configuration (config.csv)

key,value
botmodels-enabled,true
botmodels-host,0.0.0.0
botmodels-port,8085
botmodels-api-key,your-secret-key
botmodels-https,false
image-generator-model,../../../../data/diffusion/sd_turbo_f16.gguf
image-generator-steps,4
image-generator-width,512
image-generator-height,512
video-generator-model,../../../../data/diffusion/zeroscope_v2_576w
video-generator-frames,24
video-generator-fps,8
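botserver reads these keys itself, in Rust; the Python sketch below only illustrates how they could combine into the base URL that requests go to. `parse_config_csv` and `botmodels_base_url` are hypothetical helpers, and treating `botmodels-enabled` and `botmodels-https` as case-insensitive `true`/`false` strings is an assumption.

```python
from __future__ import annotations

import csv
import io


def parse_config_csv(text: str) -> dict[str, str]:
    """Read a two-column key,value CSV (with a header row) into a dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["key"].strip(): (row.get("value") or "").strip()
        for row in reader
        if row.get("key")
    }


def botmodels_base_url(config: dict[str, str]) -> str | None:
    """Base URL for BotModels, or None when botmodels-enabled is not true."""
    if config.get("botmodels-enabled", "false").lower() != "true":
        return None
    scheme = "https" if config.get("botmodels-https", "false").lower() == "true" else "http"
    return f"{scheme}://{config['botmodels-host']}:{config['botmodels-port']}"
```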

BASIC Script Keywords

// Generate an image
file = IMAGE "a beautiful sunset over mountains"
SEND FILE TO user, file

// Generate a video
video = VIDEO "waves crashing on a beach"
SEND FILE TO user, video

// Generate speech
audio = AUDIO "Welcome to General Bots!"
SEND FILE TO user, audio

// Get image/video description
caption = SEE "/path/to/image.jpg"
TALK caption

🏗️ Architecture

┌─────────────┐     HTTPS      ┌─────────────┐
│  botserver  │ ────────────▶  │  botmodels  │
│   (Rust)    │                │  (Python)   │
└─────────────┘                └─────────────┘
      │                              │
      │ BASIC Keywords               │ AI Models
      │ - IMAGE                      │ - Stable Diffusion
      │ - VIDEO                      │ - Zeroscope
      │ - AUDIO                      │ - TTS/Whisper
      │ - SEE                        │ - BLIP2
      ▼                              ▼
┌─────────────┐                ┌─────────────┐
│   config    │                │   outputs   │
│   .csv      │                │  (files)    │
└─────────────┘                └─────────────┘

Development Guidelines

Modern Model Usage

  • Deprecate Legacy: Move away from outdated libraries (e.g., old allennlp) in favor of HuggingFace Transformers and Diffusers
  • Quantization: Always consider quantized models (bitsandbytes, GGUF) to reduce VRAM usage

Performance & Loading

  • Lazy Loading: Do NOT load 10GB models at module import time. Load them during the application's startup lifecycle or on the first request, guarded by a lock
  • GPU Handling: Robustly detect CUDA/MPS (Mac) and fall back to CPU gracefully
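The lazy-loading and device-detection guidelines can be sketched with the standard library plus an optional PyTorch import. `detect_device` and `LazyModel` are hypothetical names; `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are the PyTorch checks used, and the loader passed to `LazyModel` stands in for something like a Diffusers `from_pretrained` call. The double-checked lock means concurrent first requests trigger only one load.

```python
from __future__ import annotations

import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


def detect_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU; also works when torch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


class LazyModel(Generic[T]):
    """Load a model on first use; a lock ensures concurrent callers trigger one load."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._model: T | None = None
        self._lock = threading.Lock()

    def get(self) -> T:
        model = self._model
        if model is None:  # fast path skips the lock once loaded
            with self._lock:
                model = self._model  # re-check: another thread may have finished loading
                if model is None:
                    model = self._loader()
                    self._model = model
        return model
```

To load at startup instead of on the first request, call `.get()` from the FastAPI lifespan handler.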

Code Quality

  • Type Hints: All functions MUST have type hints
  • Error Handling: No bare except:. Catch precise exceptions and return structured JSON errors to botserver
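One way to honor the no-bare-`except` rule is to map each expected exception type to an HTTP status and a JSON body, letting anything unexpected propagate to the server's own error handling. The error shape, the status codes, and `ModelNotLoadedError` below are assumptions for illustration, not the service's actual contract; in FastAPI the same mapping would normally live in exception handlers.

```python
from __future__ import annotations

from typing import Any, Callable


class ModelNotLoadedError(RuntimeError):
    """Hypothetical: raised when a request arrives before its model is available."""


def run_inference(fn: Callable[[], Any]) -> tuple[int, dict[str, Any]]:
    """Run fn and map known failures to (HTTP status, JSON-serializable body)."""
    try:
        return 200, {"ok": True, "result": fn()}
    except ValueError as exc:  # bad input, e.g. an invalid width or height
        return 422, {"ok": False, "error": {"type": "invalid_input", "message": str(exc)}}
    except ModelNotLoadedError as exc:
        return 503, {"ok": False, "error": {"type": "model_unavailable", "message": str(exc)}}
    # Any other exception propagates and is handled (and logged) by the server.
```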

Project Structure

botmodels/
├── src/
│   ├── api/
│   │   ├── v1/
│   │   │   └── endpoints/
│   │   │       ├── image.py
│   │   │       ├── video.py
│   │   │       ├── speech.py
│   │   │       └── vision.py
│   │   └── dependencies.py
│   ├── core/
│   │   ├── config.py
│   │   └── logging.py
│   ├── schemas/
│   │   └── generation.py
│   ├── services/
│   │   ├── image_service.py
│   │   ├── video_service.py
│   │   ├── speech_service.py
│   │   └── vision_service.py
│   └── main.py
├── outputs/
├── models/
├── tests/
├── requirements.txt
└── README.md

🧪 Testing

pytest tests/

🔒 Security

  1. Always use HTTPS in production
  2. Use strong, unique API keys
  3. Restrict network access to the service
  4. Consider running on a separate GPU server
  5. Monitor resource usage and set appropriate limits
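For the API-key points above, here is a sketch of generating a strong key and checking the X-API-Key header value. It uses `secrets.token_urlsafe` and `hmac.compare_digest` from the standard library; `generate_api_key` and `api_key_matches` are hypothetical names, and this does not describe how BotModels currently validates the header.

```python
from __future__ import annotations

import hmac
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Create a random URL-safe key suitable for API_KEY / botmodels-api-key."""
    return secrets.token_urlsafe(num_bytes)


def api_key_matches(provided: str | None, expected: str) -> bool:
    """Compare keys in constant time so response timing does not leak the key."""
    if not provided or not expected:
        return False
    # compare_digest on bytes also handles non-ASCII input without raising.
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```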

📚 Documentation

For complete documentation, guides, and API references, see docs.pragmatismo.com.br and the BotBook.


📦 Requirements

  • Python 3.10+
  • CUDA-capable GPU (recommended, 8GB+ VRAM)
  • 16GB+ RAM

🔑 Remember

  • Inference Only: No business state, just predictions
  • Modern Models: Use HuggingFace Transformers, Diffusers
  • Type Safety: All functions must have type hints
  • Lazy Loading: Don't load models at import time
  • GPU Detection: Graceful fallback to CPU
  • Version: 1.0.0; do not change without approval
  • Git Workflow: Always push to all repositories (github, pragmatismo)

📄 License

See the LICENSE file for details.