# LLM Gateway

The LLM Gateway routes completion requests to configured backends. Agents call a single API; Athyr handles provider selection, failover, and retries.
## Why a Gateway?
Without Athyr, each agent needs:
- Provider-specific API code (Ollama vs OpenAI vs Anthropic)
- Retry logic for transient failures
- Fallback handling when a provider is down
- Connection management and health checks
The gateway centralizes this complexity. Agents just call `Complete()` with a model name.
## Providers
Athyr uses Lua scripts to define LLM provider integrations. Each provider is a small Lua file that describes how to talk to an OpenAI-compatible API. Two providers ship built-in:
| Provider | Type | Description |
|---|---|---|
| Ollama | ollama | Local LLM inference |
| OpenRouter | openrouter | Access to 100+ models via unified API |
Both built-in providers support streaming, tool calling, and all standard completion options.
You can add any OpenAI-compatible provider without recompiling Athyr — just drop a Lua script in your data directory. See Custom Providers for the full guide.
## Configuration

Configure backends in `athyr.yaml`:
```yaml
llm:
  backends:
    - name: local
      type: ollama
      url: http://localhost:11434
    - name: cloud
      type: openrouter
      url: https://openrouter.ai/api/v1
      api_key: ${OPENROUTER_API_KEY}
  retry:
    max_attempts: 3
    backoff: exponential
```
### Backend Options

| Field | Description |
|---|---|
| `name` | Unique identifier for this backend |
| `type` | Provider type (matches the Lua script filename, e.g. `ollama`, `openrouter`, or a custom provider) |
| `url` | Base URL for the provider API |
| `api_key` | API key (supports `${ENV_VAR}` substitution) |
| `priority` | Routing priority (lower = preferred) |
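For example, a setup that prefers a local backend and falls back to a cloud one could combine the fields above like this (backend names are illustrative):

```yaml
llm:
  backends:
    - name: local
      type: ollama
      url: http://localhost:11434
      priority: 1   # preferred (lower = preferred)
    - name: cloud
      type: openrouter
      url: https://openrouter.ai/api/v1
      api_key: ${OPENROUTER_API_KEY}
      priority: 2   # used when "local" is unhealthy
```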
### Retry Configuration

| Field | Description |
|---|---|
| `max_attempts` | Maximum retry attempts per request (default: 3) |
| `backoff` | Backoff strategy: `exponential`, `linear`, or `fixed` |
## Usage

### Basic Completion

```go
resp, err := agent.Complete(ctx, athyr.CompletionRequest{
	Model: "llama3",
	Messages: []athyr.Message{
		{Role: "user", Content: "Explain quantum computing"},
	},
})
if err != nil {
	log.Fatal(err)
}
fmt.Println(resp.Content)
```
### Streaming

```go
err := agent.CompleteStream(ctx, athyr.CompletionRequest{
	Model:    "llama3",
	Messages: messages,
}, func(chunk athyr.StreamChunk) error {
	fmt.Print(chunk.Content)
	if chunk.Done {
		fmt.Printf("\nTokens: %d\n", chunk.Usage.TotalTokens)
	}
	return nil
})
if err != nil {
	log.Fatal(err)
}
```
### Completion Options

```go
resp, _ := agent.Complete(ctx, athyr.CompletionRequest{
	Model:    "llama3",
	Messages: messages,
	Config: athyr.CompletionConfig{
		Temperature: 0.7,
		MaxTokens:   1000,
		TopP:        0.9,
		Stop:        []string{"\n\n"},
	},
})
```
| Option | Description |
|---|---|
| `Temperature` | Randomness (0.0-1.0; higher = more creative) |
| `MaxTokens` | Maximum tokens to generate |
| `TopP` | Nucleus sampling threshold |
| `Stop` | Sequences that stop generation |
### Tool Calling

Define tools the LLM can invoke:

```go
resp, _ := agent.Complete(ctx, athyr.CompletionRequest{
	Model:    "llama3",
	Messages: messages,
	Tools: []athyr.Tool{
		{
			Name:        "get_weather",
			Description: "Get current weather for a location",
			Parameters:  `{"type":"object","properties":{"location":{"type":"string"}}}`,
		},
	},
})

// Check whether the LLM wants to call a tool
if len(resp.ToolCalls) > 0 {
	for _, call := range resp.ToolCalls {
		fmt.Printf("Tool: %s, Args: %s\n", call.Name, call.Arguments)
	}
}
```
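Executing the requested tool is up to your agent. One common pattern is a handler registry keyed by tool name; the sketch below is illustrative (the registry and handlers are not part of Athyr's API):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolHandlers maps a tool name to a function that takes the
// JSON-encoded arguments and returns a result string.
var toolHandlers = map[string]func(args string) (string, error){
	"get_weather": func(args string) (string, error) {
		var p struct {
			Location string `json:"location"`
		}
		if err := json.Unmarshal([]byte(args), &p); err != nil {
			return "", err
		}
		// A real handler would call a weather service here.
		return fmt.Sprintf("sunny in %s", p.Location), nil
	},
}

// dispatch routes a tool call to its registered handler.
func dispatch(name, args string) (string, error) {
	h, ok := toolHandlers[name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", name)
	}
	return h(args)
}

func main() {
	out, err := dispatch("get_weather", `{"location":"Berlin"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // sunny in Berlin
}
```

The tool's result would typically be appended to the conversation and sent back to the model in a follow-up completion.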
### List Available Models

```go
models, _ := agent.Models(ctx)
for _, m := range models {
	fmt.Printf("%s (%s)\n", m.Name, m.Backend)
}
```
## Fault Tolerance

### Circuit Breaker

Each backend has a circuit breaker that trips after repeated failures:

- **Closed** - Normal operation, requests pass through
- **Open** - Backend marked unhealthy, requests fail fast
- **Half-Open** - Testing whether the backend has recovered
When a circuit opens, requests automatically route to healthy backends.
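The state machine above can be sketched in a few lines of Go. This is a simplified model for illustration, not Athyr's internal implementation (which, for example, moves from Open to Half-Open on a timer):

```go
package main

import "fmt"

// State models the three circuit-breaker states described above.
type State int

const (
	Closed   State = iota // normal operation, requests pass through
	Open                  // backend marked unhealthy, requests fail fast
	HalfOpen              // probing whether the backend recovered
)

// Breaker trips to Open after threshold consecutive failures.
type Breaker struct {
	state     State
	failures  int
	threshold int
}

// Allow reports whether a request may be sent to this backend.
func (b *Breaker) Allow() bool { return b.state != Open }

// Record updates the breaker state after a request completes.
func (b *Breaker) Record(success bool) {
	if success {
		b.state, b.failures = Closed, 0
		return
	}
	b.failures++
	if b.state == HalfOpen || b.failures >= b.threshold {
		b.state = Open
	}
}

func main() {
	b := &Breaker{threshold: 3}
	b.Record(false)
	b.Record(false)
	b.Record(false)
	fmt.Println(b.Allow()) // false: circuit is open, fail fast
}
```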
### Automatic Failover

If a backend fails:

1. The request is retried with exponential backoff.
2. After max attempts, the circuit breaker records the failure.
3. The request routes to an alternate backend.
4. The original backend rejoins the rotation once its circuit closes.
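Putting priority and circuit state together, backend selection amounts to "lowest priority value among healthy backends." The sketch below uses hypothetical types to show the idea; it is not Athyr's internal routing code:

```go
package main

import (
	"fmt"
	"sort"
)

// Backend is a simplified view of a configured backend.
type Backend struct {
	Name     string
	Priority int  // lower = preferred
	Healthy  bool // circuit closed
}

// pick returns the healthy backend with the lowest priority value.
func pick(backends []Backend) (string, error) {
	sort.Slice(backends, func(i, j int) bool {
		return backends[i].Priority < backends[j].Priority
	})
	for _, b := range backends {
		if b.Healthy {
			return b.Name, nil
		}
	}
	return "", fmt.Errorf("no healthy backend")
}

func main() {
	backends := []Backend{
		{Name: "cloud", Priority: 2, Healthy: true},
		{Name: "local", Priority: 1, Healthy: false}, // circuit open
	}
	name, _ := pick(backends)
	fmt.Println(name) // cloud: local is skipped while its circuit is open
}
```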
### Response Metadata

Completion responses include routing information:

```go
resp, _ := agent.Complete(ctx, req)

fmt.Println(resp.Content)      // Generated text
fmt.Println(resp.Model)        // Model that served the request
fmt.Println(resp.Backend)      // Backend name
fmt.Println(resp.Latency)      // Request duration
fmt.Println(resp.FinishReason) // "stop", "length", or "tool_calls"
fmt.Println(resp.Usage.TotalTokens)
```
## Next Steps
- Custom Providers - Add your own LLM providers via Lua scripts
- Agents - Using LLM completions in agents
- State Management - Memory sessions for conversations
- Configuration - Full configuration reference