From 22a1954fac2f87a0a13b5e599771273172afc73a Mon Sep 17 00:00:00 2001
From: "Rodrigo Rodriguez (Pragmatismo)"
Date: Wed, 4 Feb 2026 13:54:25 -0300
Subject: [PATCH] Update: delete PROMPT.md and update README.md

---
 PROMPT.md |  80 --------------------------------
 README.md | 133 +++++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 102 insertions(+), 111 deletions(-)
 delete mode 100644 PROMPT.md

diff --git a/PROMPT.md b/PROMPT.md
deleted file mode 100644
index 9706485..0000000
--- a/PROMPT.md
+++ /dev/null
@@ -1,80 +0,0 @@
-# General Bots Models (Python) - Project Guidelines
-
-**Version:** 1.0.0
-**Role:** AI Inference Service for BotServer
-**Primary Directive:** Provide access to the latest open-source AI models (Python ecosystem) that are impractical to implement in Rust.
-
----
-
-## 🐍 PHILOSOPHY & SCOPE
-
-### Why Python?
-While `botserver` (Rust) handles the heavy lifting, networking, and business logic, `botmodels` exists solely to leverage the extensive **Python AI/ML ecosystem**.
-
-- **Rust vs. Python Rule**:
-  - If logic is deterministic, systems-level, or performance-critical logic: **Do it in Rust (botserver)**.
-  - If logic requires cutting-edge ML models, rapid experimentation with HuggingFace, or specific Python-only libraries: **Do it here**.
-
-### Architecture
-- **Inference Only**: This service should NOT hold business state. It accepts inputs, runs inference, and returns predictions.
-- **Stateless**: Treated as a sidecar to `botserver`.
-- **API First**: Exposes strict HTTP/REST endpoints (or gRPC) consumed by `botserver`.
-
----
-
-## 🛠 TECHNOLOGY STACK
-
-- **Runtime**: Python 3.10+
-- **Web Framework**: FastAPI (preferred over Flask for async/performance) or Flask (legacy support).
-- **ML Frameworks**: PyTorch, HuggingFace Transformers, raw ONNX (if speed needed).
-- **Quality**: `ruff` (linting), `black` (formatting), `mypy` (typing).
-
----
-
-## ⚡️ IMPERATIVES
-
-### 1. Modern Model Usage
-- **Deprecate Legacy**: Move away from outdated libs (e.g., old `allennlp` if superseded) in favor of **HuggingFace Transformers** and **Diffusers**.
-- **Quantization**: Always consider quantized models (bitsandbytes, GGUF) to reduce VRAM usage given the "consumer/prosumer" target of General Bots.
-
-### 2. Performance & Loading
-- **Lazy Loading**: Do NOT load 10GB models at module import time. Load on startup lifecycle or first request with locking.
-- **GPU Handling**: robustly detect CUDA/MPS (Mac) and fallback to CPU gracefully.
-
-### 3. Code Quality
-- **Type Hints**: All functions MUST have type hints.
-- **Error Handling**: No bare check `except:`. Catch precise exceptions and return structured JSON errors to `botserver`.
-
----
-
-## 📝 DEVELOPMENT WORKFLOW
-
-1. **Environment**: Always use a `venv`.
-   ```bash
-   python3 -m venv venv
-   source venv/bin/activate
-   pip install -r requirements.txt
-   ```
-2. **Running**:
-   ```bash
-   python app.py
-   # OR if migrated to FastAPI
-   uvicorn src.main:app --port 8089 --reload
-   ```
-
----
-
-## 🔗 INTEGRATION WITH BOTSERVER
-
-- **Port**: Defaults to `8089` (internal).
-- **Security**: Must implement the shared secret handshake (HMAC/API Key) validated against `botserver`.
-- **Keep-Alive**: `botserver` manages the lifecycle of this process.
-
----
-
-## ✅ CONTINUATION PROMPT
-
-When working in `botmodels`:
-1. **Prioritize Ecosystem**: If a new SOTA model drops (e.g., Llama 3, Mistral), enable it here immediately.
-2. **Optimize**: Ensure dependencies are minimized. Don't install `tensorflow` if `torch` suffices.
-3. **Strict Typing**: Ensure all input/outputs match the `botserver` expectations perfectly.
diff --git a/README.md b/README.md
index 5b17b63..da75b3e 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,19 @@
-# BotModels
+# BotModels - AI Inference Service

-A multimodal AI service for General Bots providing image, video, audio generation, and vision/captioning capabilities. Works as a companion service to botserver, similar to how llama.cpp provides LLM capabilities.
+**Version:** 1.0.0
+**Purpose:** Multimodal AI inference service for General Bots

-![General Bots Models Services](https://raw.githubusercontent.com/GeneralBots/BotModels/master/BotModels.png)
+---
+
+## Overview
+
+BotModels is a Python-based AI inference service that provides multimodal capabilities to the General Bots platform. It serves as a companion to botserver (Rust), specializing in cutting-edge AI/ML models from the Python ecosystem, including image generation, video creation, speech synthesis, and vision/captioning.
+
+While botserver handles business logic, networking, and systems-level operations, BotModels exists solely to leverage the extensive Python AI/ML ecosystem for inference tasks that are impractical to implement in Rust.
+
+For comprehensive documentation, see **[docs.pragmatismo.com.br](https://docs.pragmatismo.com.br)** or the **[BotBook](../botbook)** for detailed guides, API references, and tutorials.
+
+---

 ## Features

@@ -12,6 +23,8 @@ A multimodal AI service for General Bots providing image, video, audio generatio
 - **Speech Recognition**: Audio transcription using OpenAI Whisper
 - **Vision/Captioning**: Image and video description using BLIP2

+---
+
 ## Quick Start

 ### Installation

@@ -63,7 +76,34 @@ python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --workers 4
 python -m uvicorn src.main:app --host 0.0.0.0 --port 8085 --ssl-keyfile key.pem --ssl-certfile cert.pem
 ```

-## API Endpoints
+---
+
+## 🐍 Philosophy & Scope
+
+### Why Python?
+
+- **Rust vs. Python Rule**:
+  - If logic is deterministic, systems-level, or performance-critical: **Do it in Rust (botserver)**
+  - If logic requires cutting-edge ML models, rapid experimentation with HuggingFace, or specific Python-only libraries: **Do it here**
+
+### Architecture Principles
+
+- **Inference Only**: This service should NOT hold business state. It accepts inputs, runs inference, and returns predictions.
+- **Stateless**: Treated as a sidecar to `botserver`.
+- **API First**: Exposes strict HTTP/REST endpoints consumed by `botserver`.
+
+---
+
+## 🛠 Technology Stack
+
+- **Runtime**: Python 3.10+
+- **Web Framework**: FastAPI (preferred over Flask for async/performance)
+- **ML Frameworks**: PyTorch, HuggingFace Transformers, Diffusers
+- **Quality**: `ruff` (linting), `black` (formatting), `mypy` (typing)
+
+---
+
+## 📡 API Endpoints

 All endpoints require the `X-API-Key` header for authentication.

@@ -162,11 +202,15 @@ question: "How many people are in this image?"
 GET /api/health
 ```

-## Integration with botserver
+Interactive API documentation:
+- Swagger UI: `http://localhost:8085/api/docs`
+- ReDoc: `http://localhost:8085/api/redoc`

-BotModels integrates with botserver through HTTPS, providing multimodal capabilities to BASIC scripts.
+---

-### botserver Configuration (config.csv)
+## 🔗 Integration with BotServer
+
+### Configuration (config.csv)

 ```csv
 key,value
@@ -186,8 +230,6 @@ video-generator-fps,8

 ### BASIC Script Keywords

-Once configured, these keywords are available in BASIC:
-
 ```basic
 // Generate an image
 file = IMAGE "a beautiful sunset over mountains"
 caption = SEE "/path/to/image.jpg"
 TALK caption
 ```

-## Architecture
+
+---
+
+## 🏗️ Architecture

 ```
 ┌─────────────┐    HTTPS    ┌─────────────┐
@@ -226,29 +270,24 @@ TALK caption
 └─────────────┘             └─────────────┘
 ```

-## Model Downloads
+---

-Models are downloaded automatically on first use, or you can pre-download them:
+## ⚡️ Development Guidelines

-```bash
-# Stable Diffusion
-python -c "from diffusers import StableDiffusionPipeline; StableDiffusionPipeline.from_pretrained('runwayml/stable-diffusion-v1-5')"
+### Modern Model Usage

-# BLIP2 (Vision)
-python -c "from transformers import Blip2Processor, Blip2ForConditionalGeneration; Blip2Processor.from_pretrained('Salesforce/blip2-opt-2.7b'); Blip2ForConditionalGeneration.from_pretrained('Salesforce/blip2-opt-2.7b')"
+- **Deprecate Legacy**: Move away from outdated libs (e.g., old `allennlp`) in favor of **HuggingFace Transformers** and **Diffusers**
+- **Quantization**: Always consider quantized models (bitsandbytes, GGUF) to reduce VRAM usage

-# Whisper (Speech-to-Text)
-python -c "import whisper; whisper.load_model('base')"
-```
+### Performance & Loading

-## API Documentation
+- **Lazy Loading**: Do NOT load 10GB models at module import time. Load during the startup lifecycle or on first request with locking
+- **GPU Handling**: Robustly detect CUDA/MPS (Mac) and fall back to CPU gracefully

-Interactive API documentation is available at:
+### Code Quality

-- Swagger UI: `http://localhost:8085/api/docs`
-- ReDoc: `http://localhost:8085/api/redoc`
-
-## Development
+- **Type Hints**: All functions MUST have type hints
+- **Error Handling**: No bare `except:`. Catch precise exceptions and return structured JSON errors to `botserver`

 ### Project Structure

@@ -281,13 +320,17 @@ botmodels/
 └── README.md
 ```

-### Running Tests
+---
+
+## 🧪 Testing

 ```bash
 pytest tests/
 ```

-## Security Notes
+---
+
+## 🔒 Security

 1. **Always use HTTPS in production**
 2. Use strong, unique API keys
@@ -295,13 +338,27 @@ pytest tests/
 4. Consider running on a separate GPU server
 5. Monitor resource usage and set appropriate limits

-## Requirements
+---
+
+## 📚 Documentation
+
+For complete documentation, guides, and API references:
+
+- **[docs.pragmatismo.com.br](https://docs.pragmatismo.com.br)** - Full online documentation
+- **[BotBook](../botbook)** - Local comprehensive guide with tutorials and examples
+- **[General Bots Repository](https://github.com/GeneralBots/BotServer)** - Main project repository
+
+---
+
+## 📦 Requirements

 - Python 3.10+
 - CUDA-capable GPU (recommended, 8GB+ VRAM)
 - 16GB+ RAM

-## Resources
+---
+
+## 🔗 Resources

 ### Education

@@ -321,6 +378,20 @@ pytest tests/
 - [AI for Mankind](https://github.com/aiformankind)
 - [ManaAI](https://manaai.cn/)

-## License
+---
+
+## 🔑 Remember
+
+- **Inference Only**: No business state, just predictions
+- **Modern Models**: Use HuggingFace Transformers, Diffusers
+- **Type Safety**: All functions must have type hints
+- **Lazy Loading**: Don't load models at import time
+- **GPU Detection**: Graceful fallback to CPU
+- **Version 1.0.0** - Do not change without approval
+- **GIT WORKFLOW** - ALWAYS push to ALL repositories (github, pragmatismo)
+
+---
+
+## 📄 License

 See LICENSE file for details.
\ No newline at end of file
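
The lazy-loading and GPU-handling guidelines this patch adds to the README ("load on first request with locking", "detect CUDA/MPS and fall back to CPU") can be sketched as follows. This is an illustrative pattern only, not code from the repository: `detect_device`, `get_model`, and `_load_model` are hypothetical names, and `_load_model` stands in for an expensive call such as `StableDiffusionPipeline.from_pretrained(...)`.

```python
# Illustrative sketch (not part of the patch): lazy, thread-safe model loading
# with graceful device detection, assuming hypothetical helper names.
import threading

_lock = threading.Lock()
_model = None


def detect_device() -> str:
    """Prefer CUDA, then Apple MPS, and fall back to CPU gracefully."""
    try:
        import torch  # optional dependency; absence simply means CPU

        if torch.cuda.is_available():
            return "cuda"
        if torch.backends.mps.is_available():
            return "mps"
    except (ImportError, AttributeError):
        pass
    return "cpu"


def _load_model(device: str) -> dict:
    # Stand-in for an expensive load, e.g.
    # StableDiffusionPipeline.from_pretrained(...).to(device)
    return {"device": device}


def get_model() -> dict:
    """Load the model once, on first request, under a lock (double-checked)."""
    global _model
    if _model is None:
        with _lock:
            if _model is None:  # re-check after acquiring the lock
                _model = _load_model(detect_device())
    return _model
```

Because nothing heavy runs at import time, the service starts instantly; the first request pays the load cost once, and the double-checked lock keeps concurrent first requests from loading the model twice.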