LLM Gateway

The LLM Gateway routes completion requests to configured backends. Agents call a single API; Athyr handles provider selection, failover, and retries.

Why a Gateway?

Without Athyr, each agent needs its own:

  - provider selection logic
  - failover handling
  - retry and backoff logic
  - per-provider credentials and endpoints

The gateway centralizes this complexity. Agents just call Complete() with a model name.

Providers

Athyr uses Lua scripts to define LLM provider integrations. Each provider is a small Lua file that describes how to talk to an OpenAI-compatible API. Two providers ship built-in:

Provider     Type         Description
Ollama       ollama       Local LLM inference
OpenRouter   openrouter   Access to 100+ models via unified API

Both built-in providers support streaming, tool calling, and all standard completion options.

You can add any OpenAI-compatible provider without recompiling Athyr — just drop a Lua script in your data directory. See Custom Providers for the full guide.

Configuration

Configure backends in athyr.yaml:

llm:
  backends:
    - name: local
      type: ollama
      url: http://localhost:11434

    - name: cloud
      type: openrouter
      url: https://openrouter.ai/api/v1
      api_key: ${OPENROUTER_API_KEY}

  retry:
    max_attempts: 3
    backoff: exponential

Backend Options

Field      Description
name       Unique identifier for this backend
type       Provider type (matches Lua script filename, e.g. ollama, openrouter, or a custom provider)
url        Base URL for the provider API
api_key    API key (supports ${ENV_VAR} substitution)
priority   Routing priority (lower = preferred)
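
With multiple backends configured, priority drives routing: the healthy backend with the lowest value is preferred. The sketch below illustrates that selection rule; the Backend struct and its Healthy flag are illustrative assumptions, not Athyr's internal types:

```go
package main

import "fmt"

// Backend mirrors the config fields above; Healthy is a hypothetical
// flag that a circuit breaker would maintain per backend.
type Backend struct {
	Name     string
	Priority int
	Healthy  bool
}

// pickBackend returns the healthy backend with the lowest Priority
// value (lower = preferred), or nil if none are available.
func pickBackend(backends []Backend) *Backend {
	var best *Backend
	for i := range backends {
		b := &backends[i]
		if !b.Healthy {
			continue
		}
		if best == nil || b.Priority < best.Priority {
			best = b
		}
	}
	return best
}

func main() {
	backends := []Backend{
		{Name: "local", Priority: 1, Healthy: false},
		{Name: "cloud", Priority: 2, Healthy: true},
	}
	// local is preferred by priority but unhealthy, so cloud wins.
	fmt.Println(pickBackend(backends).Name) // prints cloud
}
```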

Retry Configuration

Field          Description
max_attempts   Maximum retry attempts per request (default: 3)
backoff        Backoff strategy: exponential, linear, or fixed

Usage

Basic Completion

resp, err := agent.Complete(ctx, athyr.CompletionRequest{
    Model: "llama3",
    Messages: []athyr.Message{
        {Role: "user", Content: "Explain quantum computing"},
    },
})
if err != nil {
    log.Fatal(err)
}
fmt.Println(resp.Content)

Streaming

err := agent.CompleteStream(ctx, athyr.CompletionRequest{
    Model:    "llama3",
    Messages: messages,
}, func(chunk athyr.StreamChunk) error {
    fmt.Print(chunk.Content)
    if chunk.Done {
        fmt.Printf("\nTokens: %d\n", chunk.Usage.TotalTokens)
    }
    return nil
})
if err != nil {
    log.Fatal(err)
}

Completion Options

resp, _ := agent.Complete(ctx, athyr.CompletionRequest{
    Model:    "llama3",
    Messages: messages,
    Config: athyr.CompletionConfig{
        Temperature: 0.7,
        MaxTokens:   1000,
        TopP:        0.9,
        Stop:        []string{"\n\n"},
    },
})

Option        Description
Temperature   Randomness (0.0-1.0, higher = more creative)
MaxTokens     Maximum tokens to generate
TopP          Nucleus sampling threshold
Stop          Sequences that stop generation

Tool Calling

Define tools the LLM can invoke:

resp, _ := agent.Complete(ctx, athyr.CompletionRequest{
    Model:    "llama3",
    Messages: messages,
    Tools: []athyr.Tool{
        {
            Name:        "get_weather",
            Description: "Get current weather for a location",
            Parameters:  `{"type":"object","properties":{"location":{"type":"string"}}}`,
        },
    },
})

// Check if LLM wants to call a tool
if len(resp.ToolCalls) > 0 {
    for _, call := range resp.ToolCalls {
        fmt.Printf("Tool: %s, Args: %s\n", call.Name, call.Arguments)
    }
}
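
After detecting tool calls, an agent typically executes each tool locally and appends the result to the conversation before requesting a follow-up completion. The sketch below uses minimal local stand-in types rather than the real athyr types (whose tool-result message shape is not shown on this page) to illustrate that loop:

```go
package main

import "fmt"

// Message and ToolCall are local stand-ins for illustration only;
// they are not the athyr package's actual types.
type Message struct {
	Role    string // "user", "assistant", or "tool"
	Content string
}

type ToolCall struct {
	Name      string
	Arguments string
}

// runTool dispatches a tool call to a local implementation. Only the
// hypothetical get_weather tool from the example above is wired up.
func runTool(call ToolCall) string {
	switch call.Name {
	case "get_weather":
		return `{"temp_c": 18, "conditions": "cloudy"}` // canned result
	default:
		return `{"error": "unknown tool"}`
	}
}

func main() {
	messages := []Message{{Role: "user", Content: "Weather in Oslo?"}}

	// Pretend the model responded with one tool call.
	calls := []ToolCall{{Name: "get_weather", Arguments: `{"location":"Oslo"}`}}

	// Execute each call and append its result so a follow-up
	// completion can see the tool output.
	for _, call := range calls {
		messages = append(messages, Message{Role: "tool", Content: runTool(call)})
	}
	fmt.Println(messages[len(messages)-1].Content)
}
```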

List Available Models

models, _ := agent.Models(ctx)
for _, m := range models {
    fmt.Printf("%s (%s)\n", m.Name, m.Backend)
}

Fault Tolerance

Circuit Breaker

Each backend has a circuit breaker that trips after repeated failures. When a circuit opens, requests automatically route to healthy backends.

Automatic Failover

If a backend fails:

  1. Request retries with exponential backoff
  2. After max attempts, circuit breaker records failure
  3. Request routes to alternate backend
  4. Original backend recovers when circuit closes
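
The failover steps above can be sketched as a minimal breaker. The failure threshold and reset-on-success policy here are illustrative assumptions (Athyr's actual values are not documented on this page), and half-open probing is omitted for brevity:

```go
package main

import "fmt"

// breaker is a minimal circuit breaker sketch: count consecutive
// failures, open the circuit at a threshold, close it on success.
type breaker struct {
	failures  int
	threshold int
	open      bool
}

// Record updates breaker state after a request outcome. Success
// resets the failure count and closes the circuit; repeated
// failures open it.
func (b *breaker) Record(success bool) {
	if success {
		b.failures = 0
		b.open = false
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.open = true // stop sending traffic to this backend
	}
}

// Allow reports whether the backend should receive traffic.
func (b *breaker) Allow() bool { return !b.open }

func main() {
	b := &breaker{threshold: 3}
	for i := 0; i < 3; i++ {
		b.Record(false)
	}
	fmt.Println(b.Allow()) // false: circuit opened after 3 failures
	b.Record(true)         // a probe succeeds (half-open step omitted)
	fmt.Println(b.Allow()) // true: circuit closed again
}
```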

Response Metadata

Completion responses include routing information:

resp, _ := agent.Complete(ctx, req)

fmt.Println(resp.Content)           // Generated text
fmt.Println(resp.Model)             // Model that served the request
fmt.Println(resp.Backend)           // Backend name
fmt.Println(resp.Latency)           // Request duration
fmt.Println(resp.FinishReason)      // "stop", "length", or "tool_calls"
fmt.Println(resp.Usage.TotalTokens) // Token usage

Next Steps