Fix Bedrock config for OpenAI GPT-OSS models
Rodrigo Rodriguez (Pragmatismo) 2026-03-10 12:37:35 -03:00
parent 60fe68a693
commit c312a30461


@@ -68,32 +68,43 @@ llm-model,claude-sonnet-4.5
- Premium pricing
- Newer provider, smaller ecosystem
### Google (Gemini & Vertex AI)
Google's multimodal AI models with strong reasoning capabilities. General Bots natively supports both the public AI Studio API and Enterprise Vertex AI.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Gemini 1.5 Pro | 2M | Complex reasoning, benchmarks | Medium |
| Gemini 1.5 Flash | 1M | Fast multimodal tasks | Fast |
**Configuration for AI Studio (Public API):**
```csv
name,value
llm-provider,google
llm-model,gemini-1.5-pro
llm-url,https://generativelanguage.googleapis.com
llm-key,AIza...
```
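A config.csv is just a two-column name/value table. A minimal sketch of reading one into a dictionary (a hypothetical helper for illustration, not General Bots' actual loader):

```python
import csv
import io

def load_llm_config(text: str) -> dict:
    """Parse a General Bots-style config.csv into a name -> value dict."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the "name,value" header row
    return {row[0]: row[1] for row in reader if len(row) >= 2}

sample = """name,value
llm-provider,google
llm-model,gemini-1.5-pro
llm-url,https://generativelanguage.googleapis.com
llm-key,AIza...
"""

config = load_llm_config(sample)
```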
**Configuration for Vertex AI (Enterprise):**
```csv
name,value
llm-provider,vertex
llm-model,gemini-1.5-pro
llm-url,https://us-central1-aiplatform.googleapis.com
llm-key,~/.vertex.json
```
*Note: General Bots handles the Google OAuth2 JWT authentication internally if you provide either the path to a Service Account JSON file or the raw Service Account JSON itself.*
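For reference, the service-account flow behind that note exchanges a JWT, signed with the account's private key, for an access token at Google's token endpoint. A sketch of the claim set involved (illustrative only: the RS256 signing and token exchange are omitted, and the client email is a placeholder):

```python
import time

def vertex_jwt_claims(service_account: dict, lifetime: int = 3600) -> dict:
    """Build the OAuth2 claim set that gets signed (RS256, with the service
    account's private key) and posted to Google's token endpoint."""
    now = int(time.time())
    return {
        "iss": service_account["client_email"],  # service account identity
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "aud": "https://oauth2.googleapis.com/token",
        "iat": now,
        "exp": now + lifetime,  # tokens are short-lived (max 1 hour)
    }

sa = {"client_email": "bot@example-project.iam.gserviceaccount.com"}  # placeholder
claims = vertex_jwt_claims(sa)
```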
**Strengths:**
- Largest context window (2M tokens)
- Native multimodal (text, image, video, audio)
- Strong at structured data
- Good coding abilities
- Vertex AI support enables enterprise VPC/IAM integration
**Considerations:**
- Some features region-limited
- API changes more frequently
- Different endpoints for public vs enterprise deployments
### xAI (Grok Series)
@@ -203,6 +214,98 @@ llm-server-url,https://api.deepseek.com
- Data processed in China
- Newer provider
### Amazon Bedrock
AWS managed service for foundation models, supporting Claude, Llama, Titan, and others.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Claude 3.5 Sonnet | 200K | High capability tasks | Fast |
| Llama 3.1 70B | 128K | Open-weight performance | Fast |
**Configuration (config.csv):**
```csv
name,value
llm-provider,bedrock
llm-model,anthropic.claude-3-5-sonnet-20240620-v1:0
llm-url,https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-5-sonnet-20240620-v1:0/invoke
llm-key,YOUR_BEDROCK_API_KEY
```
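The llm-url above follows the Bedrock Runtime InvokeModel URL pattern, with the AWS region and the model ID embedded in the path. A small sketch of composing it (illustrative helper; adjust the region to your deployment):

```python
def bedrock_invoke_url(region: str, model_id: str) -> str:
    """Compose the Bedrock Runtime InvokeModel endpoint for a model ID."""
    return (
        f"https://bedrock-runtime.{region}.amazonaws.com"
        f"/model/{model_id}/invoke"
    )

url = bedrock_invoke_url("us-east-1", "anthropic.claude-3-5-sonnet-20240620-v1:0")
```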
**Strengths:**
- Native AWS integration
- Enterprise-grade security
- Multiple model families in one API
### Azure OpenAI
Enterprise-grade deployment of OpenAI models hosted on Microsoft Azure.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| GPT-4o | 128K | Advanced multimodal | Fast |
**Configuration (config.csv):**
```csv
name,value
llm-provider,azureclaude
llm-model,gpt-4o
llm-url,https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/chat/completions?api-version=2024-02-15-preview
llm-key,YOUR_AZURE_API_KEY
```
**Strengths:**
- High enterprise compliance (HIPAA, SOC2)
- Azure VNet integration
- Guaranteed provisioned throughput available
### Cerebras
Ultra-fast inference powered by Wafer-Scale Engine hardware, specifically tuned for open-source models like Llama.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Llama 3.1 70B | 8K | High-speed general tasks | Extremely Fast |
**Configuration (config.csv):**
```csv
name,value
llm-provider,cerebras
llm-model,llama3.1-70b
llm-url,https://api.cerebras.ai/v1/chat/completions
llm-key,YOUR_CEREBRAS_API_KEY
```
**Strengths:**
- Highest tokens-per-second available
- Excellent for real-time agent loops
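Cerebras serves an OpenAI-compatible /v1/chat/completions endpoint, so the request body uses the familiar chat shape. A minimal sketch of building that payload (illustrative; the model ID is an example, and a real request would also carry an Authorization: Bearer header with llm-key):

```python
import json

def chat_request(model: str, user_message: str) -> str:
    """Serialize an OpenAI-style chat completions request body."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return json.dumps(body)

payload = chat_request("llama3.1-70b", "Hello!")
```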
### Zhipu AI (GLM)
High-capability bilingual models (English/Chinese) directly competing with state-of-the-art global models.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| GLM-4 | 128K | General purpose | Medium |
| GLM-4-Long | 1M | Long document analysis | Medium |
**Configuration (config.csv):**
```csv
name,value
llm-provider,glm
llm-model,glm-4
llm-url,https://open.bigmodel.cn/api/paas/v4/chat/completions
llm-key,YOUR_ZHIPU_API_KEY
```
**Strengths:**
- Excellent bilingual performance
- Large context windows (up to 1M)
## Local Models
Run models on your own hardware for privacy, cost control, and offline operation.
@@ -456,8 +559,12 @@ llm-log-timing,true
|-------|---------|------|-----------|
| GPT-5 | OpenAI | Proprietary | Most advanced all-in-one |
| Claude Opus/Sonnet 4.5 | Anthropic | Proprietary | Extended thinking, complex reasoning |
| Gemini 1.5/3 Pro | Google | Proprietary | Benchmarks, reasoning, 2M context |
| Grok 4 | xAI | Proprietary | Real-time X data |
| Claude / Llama | Amazon Bedrock | Managed API | Enterprise AWS integration |
| GPT-4o / GPT-5 | Azure OpenAI | Managed API | Enterprise compliance, Azure VNet |
| Llama / Open Models | Cerebras | Hardware Cloud | Extreme inference speed |
| GLM-4 | Zhipu AI | Proprietary | English/Chinese bilingual, up to 1M context |
| DeepSeek-V3.1/R1 | DeepSeek | Open (MIT/Apache) | Cost-optimized, reasoning |
| Llama 4 | Meta | Open-weight | 10M context, multimodal |
| Qwen3 | Alibaba | Open (Apache) | Efficient MoE |