Fix Bedrock config for OpenAI GPT-OSS models

This commit is contained in:
Rodrigo Rodriguez (Pragmatismo) 2026-03-10 12:37:35 -03:00
parent 60fe68a693
commit c312a30461


@ -68,32 +68,43 @@ llm-model,claude-sonnet-4.5
- Premium pricing
- Newer provider, smaller ecosystem
### Google (Gemini & Vertex AI)
Google's multimodal AI models with strong reasoning capabilities. General Bots natively supports both the public AI Studio API and Enterprise Vertex AI.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Gemini 1.5 Pro | 2M | Complex reasoning, benchmarks | Medium |
| Gemini 1.5 Flash | 1M | Fast multimodal tasks | Fast |
**Configuration for AI Studio (Public API):**
```csv
name,value
llm-provider,google
llm-model,gemini-1.5-pro
llm-url,https://generativelanguage.googleapis.com
llm-key,AIza...
```
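For reference, a request against the AI Studio endpoint above can be sketched as follows. The `build_gemini_request` helper and the prompt are illustrative, not part of General Bots; the URL path and body shape follow Google's public Gemini REST format.

```python
import json

# Illustrative helper: assemble a generateContent request for the AI Studio API.
# The ":generateContent" path and {"contents": [{"parts": ...}]} body shape
# follow Google's public Gemini REST format.
def build_gemini_request(base_url, model, api_key, prompt):
    url = f"{base_url}/v1beta/models/{model}:generateContent?key={api_key}"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body

url, body = build_gemini_request(
    "https://generativelanguage.googleapis.com",
    "gemini-1.5-pro", "AIza...", "Hello")
```

The same body shape is accepted by the Vertex AI endpoint; only the host, path prefix, and authentication differ.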
**Configuration for Vertex AI (Enterprise):**
```csv
name,value
llm-provider,vertex
llm-model,gemini-1.5-pro
llm-url,https://us-central1-aiplatform.googleapis.com
llm-key,~/.vertex.json
```
*Note: The bots handle Google OAuth2 JWT authentication internally if you provide either the path to, or the raw JSON of, a Service Account key.*
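As an illustration of that token exchange: the service-account flow signs a JWT assertion and trades it at Google's OAuth2 token endpoint. The sketch below assembles only the (unsigned) claim set; the helper name is illustrative, and real signing (RS256 with the key from the Service Account JSON) is omitted.

```python
import json
import time

# Illustrative only: the OAuth2 claim set a service-account JWT carries
# before being RS256-signed and exchanged at the token endpoint.
def vertex_jwt_claims(service_account_json, lifetime=3600):
    sa = json.loads(service_account_json)
    now = int(time.time())
    return {
        "iss": sa["client_email"],                     # service account identity
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "aud": "https://oauth2.googleapis.com/token",  # token exchange endpoint
        "iat": now,
        "exp": now + lifetime,
    }

claims = vertex_jwt_claims(
    '{"client_email": "bot@project.iam.gserviceaccount.com"}')
```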
**Strengths:**
- Largest context window (2M tokens)
- Native multimodal (text, image, video, audio)
- Vertex AI support enables enterprise VPC/IAM integration
- Good coding abilities
**Considerations:**
- Different endpoints for public vs enterprise deployments
- API changes more frequently
### xAI (Grok Series)
@ -203,6 +214,98 @@ llm-server-url,https://api.deepseek.com
- Data processed in China - Data processed in China
- Newer provider - Newer provider
### Amazon Bedrock
AWS managed service for foundation models, supporting Claude, Llama, Titan, and others.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Claude 3.5 Sonnet | 200K | High capability tasks | Fast |
| Llama 3.1 70B | 128K | Open-weight performance | Fast |
**Configuration (config.csv):**
```csv
name,value
llm-provider,bedrock
llm-model,anthropic.claude-3-5-sonnet-20240620-v1:0
llm-url,https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-5-sonnet-20240620-v1:0/invoke
llm-key,YOUR_BEDROCK_API_KEY
```
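The invoke endpoint above expects an Anthropic-style messages body; a minimal sketch of that body (the helper name and prompt are illustrative, and real calls additionally need AWS authentication headers):

```python
import json

# Illustrative: request body shape for Claude models on Bedrock's invoke API.
# Bedrock requires the "anthropic_version" field for Claude models.
def bedrock_claude_body(prompt, max_tokens=1024):
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = bedrock_claude_body("Summarize this ticket.")
```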
**Strengths:**
- Native AWS integration
- Enterprise-grade security
- Multiple model families in one API
### Azure OpenAI
Enterprise-grade deployment of OpenAI models hosted on Microsoft Azure.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| GPT-4o | 128K | Advanced multimodal | Fast |
**Configuration (config.csv):**
```csv
name,value
llm-provider,azure
llm-model,gpt-4o
llm-url,https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/chat/completions?api-version=2024-02-15-preview
llm-key,YOUR_AZURE_API_KEY
```
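Azure routes requests by resource and deployment name rather than by model, and authenticates with an `api-key` header instead of a Bearer token. A sketch of how the URL and headers are assembled (the resource, deployment, and helper names are placeholders):

```python
# Illustrative: Azure OpenAI addresses a deployment, not a model name,
# and uses an "api-key" header rather than "Authorization: Bearer ...".
def azure_openai_request(resource, deployment, api_key,
                         api_version="2024-02-15-preview"):
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/chat/completions?api-version={api_version}")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return url, headers

url, headers = azure_openai_request(
    "YOUR_RESOURCE", "gpt4o-prod", "YOUR_AZURE_API_KEY")
```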
**Strengths:**
- High enterprise compliance (HIPAA, SOC2)
- Azure VNet integration
- Guaranteed provisioned throughput available
### Cerebras
Ultra-fast inference powered by Wafer-Scale Engine hardware, specifically tuned for open-source models like Llama.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Llama 3.1 70B | 8K | High-speed general tasks | Extremely Fast |
**Configuration (config.csv):**
```csv
name,value
llm-provider,cerebras
llm-model,llama3.1-70b
llm-url,https://api.cerebras.ai/v1/chat/completions
llm-key,YOUR_CEREBRAS_API_KEY
```
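Because the endpoint is OpenAI-compatible, streamed responses arrive as Server-Sent Events, which is where the tokens-per-second advantage shows up in agent loops. A stdlib-only sketch of extracting text deltas from such a stream (the sample chunks are illustrative):

```python
import json

# Illustrative: parse OpenAI-style SSE lines ("data: {...}") into text deltas.
def stream_deltas(sse_lines):
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue  # skip keep-alives and the terminal sentinel
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(stream_deltas(sample))  # → "Hello"
```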
**Strengths:**
- Highest tokens-per-second available
- Excellent for real-time agent loops
### Zhipu AI (GLM)
High-capability bilingual models (English/Chinese) directly competing with state-of-the-art global models.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| GLM-4 | 128K | General purpose | Medium |
| GLM-4-Long | 1M | Long document analysis | Medium |
**Configuration (config.csv):**
```csv
name,value
llm-provider,glm
llm-model,glm-4
llm-url,https://open.bigmodel.cn/api/paas/v4/chat/completions
llm-key,YOUR_ZHIPU_API_KEY
```
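Since GLM-4 and GLM-4-Long differ mainly in context budget, a bot could choose between them from a rough size estimate. A hypothetical selection helper (the 4-characters-per-token heuristic and the routing rule are assumptions, not General Bots behavior):

```python
# Hypothetical: route to GLM-4-Long only when the prompt may exceed
# GLM-4's 128K-token window, using a rough 4-chars-per-token estimate.
def pick_glm_model(prompt, window_tokens=128_000, chars_per_token=4):
    estimated_tokens = len(prompt) // chars_per_token
    return "glm-4-long" if estimated_tokens > window_tokens else "glm-4"

model = pick_glm_model("short question")  # → "glm-4"
```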
**Strengths:**
- Excellent bilingual performance
- Large context windows (up to 1M)
## Local Models
Run models on your own hardware for privacy, cost control, and offline operation.
@ -456,8 +559,12 @@ llm-log-timing,true
|-------|---------|------|-----------|
| GPT-5 | OpenAI | Proprietary | Most advanced all-in-one |
| Claude Opus/Sonnet 4.5 | Anthropic | Proprietary | Extended thinking, complex reasoning |
| Gemini 1.5/3 Pro | Google | Proprietary | Benchmarks, reasoning, 2M context |
| Grok 4 | xAI | Proprietary | Real-time X data |
| Claude / Llama | Amazon Bedrock | Managed API | Enterprise AWS integration |
| GPT-4o / GPT-5 | Azure OpenAI | Managed API | Enterprise compliance, Azure VNet |
| Llama / Open Models | Cerebras | Hardware Cloud | Extreme inference speed |
| GLM-4 | Zhipu AI | Proprietary | English/Chinese bilingual, up to 1M context |
| DeepSeek-V3.1/R1 | DeepSeek | Open (MIT/Apache) | Cost-optimized, reasoning |
| Llama 4 | Meta | Open-weight | 10M context, multimodal |
| Qwen3 | Alibaba | Open (Apache) | Efficient MoE |