From 462a6dfa51b12f22e87712e613a559f66f9013cb Mon Sep 17 00:00:00 2001
From: "Rodrigo Rodriguez (Pragmatismo)"
Date: Sun, 25 Jan 2026 08:42:39 -0300
Subject: [PATCH] docs: add PROMPT.md standards

---
 PROMPT.md | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 PROMPT.md

diff --git a/PROMPT.md b/PROMPT.md
new file mode 100644
index 0000000..9706485
--- /dev/null
+++ b/PROMPT.md
@@ -0,0 +1,80 @@
+# General Bots Models (Python) - Project Guidelines
+
+**Version:** 1.0.0
+**Role:** AI Inference Service for BotServer
+**Primary Directive:** Provide access to the latest open-source AI models (Python ecosystem) that are impractical to implement in Rust.
+
+---
+
+## 🐍 PHILOSOPHY & SCOPE
+
+### Why Python?
+While `botserver` (Rust) handles the heavy lifting, networking, and business logic, `botmodels` exists solely to leverage the extensive **Python AI/ML ecosystem**.
+
+- **Rust vs. Python Rule**:
+  - If the logic is deterministic, systems-level, or performance-critical: **do it in Rust (botserver)**.
+  - If the logic requires cutting-edge ML models, rapid experimentation with HuggingFace, or Python-only libraries: **do it here**.
+
+### Architecture
+- **Inference Only**: This service should NOT hold business state. It accepts inputs, runs inference, and returns predictions.
+- **Stateless**: Treated as a sidecar to `botserver`.
+- **API First**: Exposes strict HTTP/REST endpoints (or gRPC) consumed by `botserver`.
+
+---
+
+## 🛠 TECHNOLOGY STACK
+
+- **Runtime**: Python 3.10+
+- **Web Framework**: FastAPI (preferred over Flask for async support and performance) or Flask (legacy support).
+- **ML Frameworks**: PyTorch, HuggingFace Transformers, raw ONNX (when speed is needed).
+- **Quality**: `ruff` (linting), `black` (formatting), `mypy` (typing).
+
+---
+
+## ⚡️ IMPERATIVES
+
+### 1. Modern Model Usage
+- **Deprecate Legacy**: Move away from outdated libraries (e.g., old `allennlp` where superseded) in favor of **HuggingFace Transformers** and **Diffusers**.
+- **Quantization**: Always consider quantized models (bitsandbytes, GGUF) to reduce VRAM usage, given the "consumer/prosumer" target of General Bots.
+
+### 2. Performance & Loading
+- **Lazy Loading**: Do NOT load 10 GB models at module import time. Load them during the startup lifecycle or on first request, guarded by a lock.
+- **GPU Handling**: Robustly detect CUDA/MPS (Mac) and fall back to CPU gracefully.
+
+### 3. Code Quality
+- **Type Hints**: All functions MUST have type hints.
+- **Error Handling**: No bare `except:` clauses. Catch precise exceptions and return structured JSON errors to `botserver`.
+
+---
+
+## 📝 DEVELOPMENT WORKFLOW
+
+1. **Environment**: Always use a `venv`.
+   ```bash
+   python3 -m venv venv
+   source venv/bin/activate
+   pip install -r requirements.txt
+   ```
+2. **Running**:
+   ```bash
+   python app.py
+   # OR, if migrated to FastAPI
+   uvicorn src.main:app --port 8089 --reload
+   ```
+
+---
+
+## 🔗 INTEGRATION WITH BOTSERVER
+
+- **Port**: Defaults to `8089` (internal).
+- **Security**: Must implement the shared-secret handshake (HMAC/API key) validated against `botserver`.
+- **Keep-Alive**: `botserver` manages the lifecycle of this process.
+
+---
+
+## ✅ CONTINUATION PROMPT
+
+When working in `botmodels`:
+1. **Prioritize Ecosystem**: If a new SOTA model drops (e.g., Llama 3, Mistral), enable it here immediately.
+2. **Optimize**: Keep dependencies minimal. Don't install `tensorflow` if `torch` suffices.
+3. **Strict Typing**: Ensure all inputs and outputs match what `botserver` expects exactly (see the sketch below).
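+
+---
+
+## 🧪 REFERENCE SKETCH (NON-NORMATIVE)
+
+A minimal FastAPI sketch of the imperatives above: lazy model loading behind a lock, CUDA/MPS/CPU detection, typed request/response models, and structured JSON errors. The module path, endpoint name, model id, and schema fields are illustrative assumptions, not part of the `botserver` contract.
+
+```python
+# src/main.py - illustrative sketch; names and the model id are placeholders.
+import threading
+
+import torch
+from fastapi import FastAPI, Request
+from fastapi.responses import JSONResponse
+from pydantic import BaseModel, Field
+from transformers import pipeline
+
+MODEL_ID = "distilgpt2"  # placeholder; swap for whatever botmodels actually serves
+
+_lock = threading.Lock()
+_generator = None  # loaded lazily, never at module import time
+
+
+def pick_device() -> str:
+    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
+    if torch.cuda.is_available():
+        return "cuda"
+    if torch.backends.mps.is_available():
+        return "mps"
+    return "cpu"
+
+
+def get_generator():
+    """Load the pipeline on first use, locked so concurrent requests don't double-load."""
+    global _generator
+    if _generator is None:
+        with _lock:
+            if _generator is None:
+                _generator = pipeline(
+                    "text-generation", model=MODEL_ID, device=pick_device()
+                )
+    return _generator
+
+
+class GenerateRequest(BaseModel):
+    prompt: str = Field(min_length=1)
+    max_new_tokens: int = Field(default=64, ge=1, le=512)
+
+
+class GenerateResponse(BaseModel):
+    text: str
+    device: str
+
+
+app = FastAPI(title="botmodels")
+
+
+@app.exception_handler(Exception)
+async def unhandled_error(request: Request, exc: Exception) -> JSONResponse:
+    # Structured JSON error instead of an opaque 500 body, so botserver can parse it.
+    return JSONResponse(
+        status_code=500,
+        content={"error": type(exc).__name__, "detail": str(exc)},
+    )
+
+
+@app.post("/generate", response_model=GenerateResponse)
+def generate(req: GenerateRequest) -> GenerateResponse:
+    gen = get_generator()
+    out = gen(req.prompt, max_new_tokens=req.max_new_tokens, num_return_sequences=1)
+    return GenerateResponse(text=out[0]["generated_text"], device=pick_device())
+```
+
+The shared-secret handshake from the integration section would slot in as a dependency on the route; it is omitted here to keep the sketch focused on loading, typing, and error handling.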