docs: add PROMPT.md standards
parent 216e34ec13
commit 462a6dfa51
1 changed file with 80 additions and 0 deletions
PROMPT.md (new file)
# General Bots Models (Python) - Project Guidelines

**Version:** 1.0.0
**Role:** AI Inference Service for BotServer
**Primary Directive:** Provide access to the latest open-source AI models (Python ecosystem) that are impractical to implement in Rust.

---

## 🐍 PHILOSOPHY & SCOPE

### Why Python?
While `botserver` (Rust) handles the heavy lifting, networking, and business logic, `botmodels` exists solely to leverage the extensive **Python AI/ML ecosystem**.

- **Rust vs. Python Rule**:
  - If the logic is deterministic, systems-level, or performance-critical: **do it in Rust (botserver)**.
  - If the logic requires cutting-edge ML models, rapid experimentation with HuggingFace, or specific Python-only libraries: **do it here**.

### Architecture
- **Inference Only**: This service should NOT hold business state. It accepts inputs, runs inference, and returns predictions.
- **Stateless**: Treated as a sidecar to `botserver`.
- **API First**: Exposes strict HTTP/REST endpoints (or gRPC) consumed by `botserver`.

---

## 🛠 TECHNOLOGY STACK

- **Runtime**: Python 3.10+
- **Web Framework**: FastAPI (preferred over Flask for async support and performance) or Flask (legacy support).
- **ML Frameworks**: PyTorch, HuggingFace Transformers, raw ONNX (when raw speed is needed).
- **Quality**: `ruff` (linting), `black` (formatting), `mypy` (typing).

---

## ⚡️ IMPERATIVES

### 1. Modern Model Usage
- **Deprecate Legacy**: Move away from outdated libraries (e.g., old `allennlp` if superseded) in favor of **HuggingFace Transformers** and **Diffusers**.
- **Quantization**: Always consider quantized models (bitsandbytes, GGUF) to reduce VRAM usage, given the "consumer/prosumer" target of General Bots.

### 2. Performance & Loading
- **Lazy Loading**: Do NOT load 10GB models at module import time. Load them during the startup lifecycle, or on first request with locking.
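A minimal sketch of the lazy-loading pattern, assuming a placeholder loader (`_load_model` and the returned dict are illustrative, not actual `botmodels` code):

```python
import threading

_model = None
_model_lock = threading.Lock()

def _load_model() -> dict:
    """Stand-in for an expensive load, e.g. a HuggingFace from_pretrained call."""
    return {"name": "placeholder-model"}

def get_model() -> dict:
    """Return the shared model, loading it on first use only."""
    global _model
    if _model is None:          # fast path: already loaded
        with _model_lock:       # slow path: serialize the first load
            if _model is None:  # double-checked locking
                _model = _load_model()
    return _model
```

With this shape, importing the module stays cheap; the first request (or a startup hook calling `get_model()`) pays the load cost exactly once, even under concurrent requests.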
- **GPU Handling**: Robustly detect CUDA/MPS (Mac) and fall back to CPU gracefully.
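A hedged device-detection sketch; it even assumes `torch` may be absent and degrades to CPU in that case rather than crashing:

```python
def detect_device() -> str:
    """Best-effort device pick: CUDA first, then Apple MPS, then CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no torch at all: stay on CPU
    if torch.cuda.is_available():
        return "cuda"
    # MPS exists only on recent torch builds for Apple Silicon
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```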

### 3. Code Quality
- **Type Hints**: All functions MUST have type hints.
- **Error Handling**: No bare `except:` clauses. Catch precise exceptions and return structured JSON errors to `botserver`.
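One way to sketch this rule; the error codes, field names, and `ModelNotLoadedError` are hypothetical, not the actual `botserver` contract:

```python
import json

class ModelNotLoadedError(Exception):
    """Hypothetical domain error for this sketch."""

def run_inference(payload: dict) -> str:
    """Return a JSON string; failures become structured JSON, never bare tracebacks."""
    try:
        text = payload["text"]  # raises KeyError if the field is missing
        return json.dumps({"ok": True, "echo": text})
    except KeyError as exc:
        # precise exception, structured response botserver can parse
        return json.dumps({"ok": False, "error": "missing_field", "detail": str(exc)})
    except ModelNotLoadedError as exc:
        return json.dumps({"ok": False, "error": "model_not_loaded", "detail": str(exc)})
```

The point is that each failure mode maps to a named, machine-readable error, so `botserver` can branch on `error` instead of scraping traceback text.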

---

## 📝 DEVELOPMENT WORKFLOW

1. **Environment**: Always use a `venv`.

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

2. **Running**:

   ```bash
   python app.py
   # OR if migrated to FastAPI
   uvicorn src.main:app --port 8089 --reload
   ```

---

## 🔗 INTEGRATION WITH BOTSERVER

- **Port**: Defaults to `8089` (internal).
- **Security**: Must implement the shared-secret handshake (HMAC/API key) validated against `botserver`.
- **Keep-Alive**: `botserver` manages the lifecycle of this process.
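The shared-secret handshake above can be sketched with Python's stdlib `hmac`; the secret value and the idea of signing the raw request body are assumptions for illustration, not the actual `botserver` wire format:

```python
import hashlib
import hmac

SHARED_SECRET = b"change-me"  # placeholder: real value comes from shared config

def sign(body: bytes) -> str:
    """HMAC-SHA256 signature botserver would attach to a request."""
    return hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    """Constant-time comparison to reject forged signatures."""
    return hmac.compare_digest(sign(body), signature)
```

`hmac.compare_digest` matters here: a plain `==` on signatures leaks timing information an attacker can exploit.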

---

## ✅ CONTINUATION PROMPT

When working in `botmodels`:

1. **Prioritize Ecosystem**: If a new SOTA model drops (e.g., Llama 3, Mistral), enable it here immediately.
2. **Optimize**: Keep dependencies minimal. Don't install `tensorflow` if `torch` suffices.
3. **Strict Typing**: Ensure all inputs and outputs match the `botserver` expectations exactly.