botserver/docs/src/chapter-08/get-integration.md

238 lines
5.6 KiB
Markdown
Raw Normal View History

# GET Keyword Integration
2025-11-23 09:19:06 -03:00
The `GET` keyword in BotServer provides file retrieval capabilities from both local filesystem and MinIO/S3 storage, enabling tools to access documents, data files, and other resources.
## Overview
The `GET` keyword is a fundamental BASIC command that retrieves file contents as strings. It supports:
- Local file system access (with safety checks)
- MinIO/S3 bucket retrieval
- URL fetching (HTTP/HTTPS)
- Integration with knowledge base documents
## Basic Usage
```basic
# Get a file from the bot's bucket
let content = GET "documents/policy.pdf"
# Get a file with full path
let data = GET "announcements.gbkb/news/news.pdf"
# Get from URL
let webpage = GET "https://example.com/data.json"
```
## Implementation Details
### File Path Resolution
The GET keyword determines the source based on the path format:
1. **URL Detection**: Paths starting with `http://` or `https://`
2. **MinIO/S3 Storage**: All other paths (retrieved from bot's bucket)
3. **Safety Validation**: Paths are checked for directory traversal attempts
### MinIO/S3 Integration
When retrieving from MinIO:
```basic
# Retrieves from: {bot-name}.gbai bucket
let doc = GET "knowledge/document.txt"
# Full path within bucket
let report = GET "reports/2024/quarterly.pdf"
```
The implementation:
1. Connects to MinIO using configured credentials
2. Retrieves from the bot's dedicated bucket
3. Returns file contents as string
4. Handles binary files by converting to text
### URL Fetching
For external resources:
```basic
let api_data = GET "https://api.example.com/data"
let webpage = GET "http://example.com/page.html"
```
URL fetching includes:
- HTTP/HTTPS support
- Automatic redirect following
- Timeout protection (30 seconds)
- Error handling for failed requests
## Safety Features
### Path Validation
The `is_safe_path` function prevents directory traversal:
- Blocks paths containing `..`
- Prevents absolute paths
- Validates character sets
- Ensures sandbox isolation
### Access Control
- Files limited to bot's own bucket
- Cannot access other bots' data
- System directories protected
- Credentials never exposed
## Error Handling
GET operations handle various error conditions:
```basic
let content = GET "missing-file.txt"
# Returns empty string if file not found
if (content == "") {
TALK "File not found or empty"
}
```
Common errors:
- File not found: Returns empty string
- Access denied: Returns error message
- Network timeout: Returns timeout error
- Invalid path: Returns security error
## Use Cases
### Loading Knowledge Base Documents
```basic
# In update-summary.bas
let text = GET "announcements.gbkb/news/news.pdf"
let summary = LLM "Summarize this: " + text
SET_BOT_MEMORY "news_summary", summary
```
### Reading Configuration Files
```basic
let config = GET "settings.json"
# Parse and use configuration
```
### Fetching External Data
```basic
let weather_data = GET "https://api.weather.com/current"
# Process weather information
```
### Loading Templates
```basic
let template = GET "templates/email-template.html"
let filled = REPLACE(template, "{{name}}", customer_name)
```
## Performance Considerations
### Caching
- GET results are not cached by default
- Frequent reads should use BOT_MEMORY for caching
- Large files impact memory usage
### Timeouts
- URL fetches: 30-second timeout
- MinIO operations: Network-dependent
- Local files: Immediate (if accessible)
### File Size Limits
- No hard limit enforced
- Large files consume memory
- Binary files converted to text (may be large)
## Integration with Tools
### Tool Parameters from Files
```basic
PARAM config_file AS string LIKE "config.json" DESCRIPTION "Configuration file path"
let config = GET config_file
# Use configuration in tool logic
```
### Dynamic Resource Loading
```basic
DESCRIPTION "Process documents from a folder"
let file_list = GET "documents/index.txt"
let files = SPLIT(file_list, "\n")
FOR EACH file IN files {
let content = GET "documents/" + file
# Process each document
}
```
## Best Practices
1. **Check for Empty Results**: Always verify GET returned content
2. **Use Relative Paths**: Avoid hardcoding absolute paths
3. **Handle Binary Files Carefully**: Text conversion may be lossy
4. **Cache Frequently Used Files**: Store in BOT_MEMORY
5. **Validate External URLs**: Ensure HTTPS for sensitive data
6. **Log Access Failures**: Track missing or inaccessible files
## Limitations
- Cannot write files (read-only operation)
- Binary files converted to text (may corrupt data)
- No streaming support (entire file loaded to memory)
- Path traversal blocked for security
- Cannot access system directories
## Examples
### Document Summarization Tool
```basic
PARAM doc_path AS string LIKE "reports/annual.pdf" DESCRIPTION "Document to summarize"
DESCRIPTION "Summarizes a document"
let content = GET doc_path
if (content == "") {
TALK "Document not found: " + doc_path
} else {
let summary = LLM "Create a brief summary: " + content
TALK summary
}
```
### Data Processing Tool
```basic
PARAM data_file AS string LIKE "data/sales.csv" DESCRIPTION "Data file to process"
DESCRIPTION "Analyzes sales data"
let csv_data = GET data_file
let analysis = LLM "Analyze this sales data: " + csv_data
TALK analysis
```
## Security Considerations
- Never GET files with user-controlled paths directly
- Validate and sanitize path inputs
- Use allowlists for acceptable file paths
- Log all file access attempts
- Monitor for unusual access patterns
## Summary
The GET keyword provides essential file retrieval capabilities for BASIC tools, enabling access to documents, configuration, and external resources while maintaining security through path validation and sandboxing.