Fix Bedrock config for OpenAI GPT-OSS models

This commit is contained in:
Rodrigo Rodriguez (Pragmatismo) 2026-03-10 12:37:35 -03:00
parent 60fe68a693
commit c312a30461


@ -68,32 +68,43 @@ llm-model,claude-sonnet-4.5
- Premium pricing
- Newer provider, smaller ecosystem
### Google (Gemini & Vertex AI)
Google's multimodal AI models with strong reasoning capabilities. General Bots natively supports both the public AI Studio API and Enterprise Vertex AI.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Gemini 1.5 Pro | 2M | Complex reasoning, benchmarks | Medium |
| Gemini 1.5 Flash | 1M | Fast multimodal tasks | Fast |
**Configuration for AI Studio (Public API):**
```csv
name,value
llm-provider,google
llm-model,gemini-1.5-pro
llm-url,https://generativelanguage.googleapis.com
llm-key,AIza...
```
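For reference, a request against the AI Studio endpoint above can be sketched as follows. The `build_gemini_request` helper and the prompt are illustrative, not part of General Bots; the URL path and body shape follow Google's public Gemini REST format.

```python
import json

# Illustrative helper: assemble a generateContent request for the AI Studio API.
# The ":generateContent" path and {"contents": [{"parts": ...}]} body shape
# follow Google's public Gemini REST format.
def build_gemini_request(base_url, model, api_key, prompt):
    url = f"{base_url}/v1beta/models/{model}:generateContent?key={api_key}"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body

url, body = build_gemini_request(
    "https://generativelanguage.googleapis.com",
    "gemini-1.5-pro", "AIza...", "Hello")
```

The same body shape is accepted by the Vertex AI endpoint; only the host, path prefix, and authentication differ.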
**Configuration for Vertex AI (Enterprise):**
```csv
name,value
llm-provider,vertex
llm-model,gemini-1.5-pro
llm-url,https://us-central1-aiplatform.googleapis.com
llm-key,~/.vertex.json
```
*Note: The bots handle Google OAuth2 JWT authentication internally if you provide either the path to, or the raw JSON of, a Service Account key.*
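As an illustration of that token exchange: the service-account flow signs a JWT assertion and trades it at Google's OAuth2 token endpoint. The sketch below assembles only the (unsigned) claim set; the helper name is illustrative, and real signing (RS256 with the key from the Service Account JSON) is omitted.

```python
import json
import time

# Illustrative only: the OAuth2 claim set a service-account JWT carries
# before being RS256-signed and exchanged at the token endpoint.
def vertex_jwt_claims(service_account_json, lifetime=3600):
    sa = json.loads(service_account_json)
    now = int(time.time())
    return {
        "iss": sa["client_email"],                     # service account identity
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "aud": "https://oauth2.googleapis.com/token",  # token exchange endpoint
        "iat": now,
        "exp": now + lifetime,
    }

claims = vertex_jwt_claims(
    '{"client_email": "bot@project.iam.gserviceaccount.com"}')
```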
**Strengths:**
- Largest context window (2M tokens)
- Native multimodal (text, image, video, audio)
- Vertex AI support enables enterprise VPC/IAM integration
- Good coding abilities
**Considerations:**
- Different endpoints for public vs enterprise deployments
- API changes more frequently
### xAI (Grok Series)
@ -203,6 +214,98 @@ llm-server-url,https://api.deepseek.com
- Data processed in China - Data processed in China
- Newer provider - Newer provider
### Amazon Bedrock
AWS managed service for foundation models, supporting Claude, Llama, Titan, and others.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Claude 3.5 Sonnet | 200K | High capability tasks | Fast |
| Llama 3.1 70B | 128K | Open-weight performance | Fast |
**Configuration (config.csv):**
```csv
name,value
llm-provider,bedrock
llm-model,anthropic.claude-3-5-sonnet-20240620-v1:0
llm-url,https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-5-sonnet-20240620-v1:0/invoke
llm-key,YOUR_BEDROCK_API_KEY
```
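The invoke endpoint above expects an Anthropic-style messages body; a minimal sketch of that body (the helper name and prompt are illustrative, and real calls additionally need AWS authentication headers):

```python
import json

# Illustrative: request body shape for Claude models on Bedrock's invoke API.
# Bedrock requires the "anthropic_version" field for Claude models.
def bedrock_claude_body(prompt, max_tokens=1024):
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = bedrock_claude_body("Summarize this ticket.")
```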
**Strengths:**
- Native AWS integration
- Enterprise-grade security
- Multiple model families in one API
### Azure OpenAI
Enterprise-grade deployment of OpenAI models hosted on Microsoft Azure.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| GPT-4o | 128K | Advanced multimodal | Fast |
**Configuration (config.csv):**
```csv
name,value
llm-provider,azure
llm-model,gpt-4o
llm-url,https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/chat/completions?api-version=2024-02-15-preview
llm-key,YOUR_AZURE_API_KEY
```
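Azure routes requests by resource and deployment name rather than by model, and authenticates with an `api-key` header instead of a Bearer token. A sketch of how the URL and headers are assembled (the resource, deployment, and helper names are placeholders):

```python
# Illustrative: Azure OpenAI addresses a deployment, not a model name,
# and uses an "api-key" header rather than "Authorization: Bearer ...".
def azure_openai_request(resource, deployment, api_key,
                         api_version="2024-02-15-preview"):
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/chat/completions?api-version={api_version}")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return url, headers

url, headers = azure_openai_request(
    "YOUR_RESOURCE", "gpt4o-prod", "YOUR_AZURE_API_KEY")
```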
**Strengths:**
- High enterprise compliance (HIPAA, SOC2)
- Azure VNet integration
- Guaranteed provisioned throughput available
### Cerebras
Ultra-fast inference powered by Wafer-Scale Engine hardware, specifically tuned for open-source models like Llama.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| Llama 3.1 70B | 8K | High-speed general tasks | Extremely Fast |
**Configuration (config.csv):**
```csv
name,value
llm-provider,cerebras
llm-model,llama3.1-70b
llm-url,https://api.cerebras.ai/v1/chat/completions
llm-key,YOUR_CEREBRAS_API_KEY
```
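Because the endpoint is OpenAI-compatible, streamed responses arrive as Server-Sent Events, which is where the tokens-per-second advantage shows up in agent loops. A stdlib-only sketch of extracting text deltas from such a stream (the sample chunks are illustrative):

```python
import json

# Illustrative: parse OpenAI-style SSE lines ("data: {...}") into text deltas.
def stream_deltas(sse_lines):
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue  # skip keep-alives and the terminal sentinel
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(stream_deltas(sample))  # → "Hello"
```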
**Strengths:**
- Highest tokens-per-second available
- Excellent for real-time agent loops
### Zhipu AI (GLM)
High-capability bilingual models (English/Chinese) directly competing with state-of-the-art global models.
| Model | Context | Best For | Speed |
|-------|---------|----------|-------|
| GLM-4 | 128K | General purpose | Medium |
| GLM-4-Long | 1M | Long document analysis | Medium |
**Configuration (config.csv):**
```csv
name,value
llm-provider,glm
llm-model,glm-4
llm-url,https://open.bigmodel.cn/api/paas/v4/chat/completions
llm-key,YOUR_ZHIPU_API_KEY
```
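Since GLM-4 and GLM-4-Long differ mainly in context budget, a bot could choose between them from a rough size estimate. A hypothetical selection helper (the 4-characters-per-token heuristic and the routing rule are assumptions, not General Bots behavior):

```python
# Hypothetical: route to GLM-4-Long only when the prompt may exceed
# GLM-4's 128K-token window, using a rough 4-chars-per-token estimate.
def pick_glm_model(prompt, window_tokens=128_000, chars_per_token=4):
    estimated_tokens = len(prompt) // chars_per_token
    return "glm-4-long" if estimated_tokens > window_tokens else "glm-4"

model = pick_glm_model("short question")  # → "glm-4"
```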
**Strengths:**
- Excellent bilingual performance
- Large context windows (up to 1M)
## Local Models
Run models on your own hardware for privacy, cost control, and offline operation.
@ -456,8 +559,12 @@ llm-log-timing,true
|-------|---------|------|-----------|
| GPT-5 | OpenAI | Proprietary | Most advanced all-in-one |
| Claude Opus/Sonnet 4.5 | Anthropic | Proprietary | Extended thinking, complex reasoning |
| Gemini 1.5/3 Pro | Google | Proprietary | Benchmarks, reasoning, 2M context |
| Grok 4 | xAI | Proprietary | Real-time X data |
| Claude / Llama | Amazon Bedrock | Managed API | Enterprise AWS integration |
| GPT-4o / GPT-5 | Azure OpenAI | Managed API | Enterprise compliance, Azure VNet |
| Llama / Open Models | Cerebras | Hardware Cloud | Extreme inference speed |
| GLM-4 | Zhipu AI | Proprietary | English/Chinese bilingual, up to 1M context |
| DeepSeek-V3.1/R1 | DeepSeek | Open (MIT/Apache) | Cost-optimized, reasoning |
| Llama 4 | Meta | Open-weight | 10M context, multimodal |
| Qwen3 | Alibaba | Open (Apache) | Efficient MoE |