# API Configuration Guide

The LLM Proxy uses a configuration-driven approach to define which API providers and endpoints are allowed. This document explains how to configure API providers and customize the proxy for different AI services.
## Configuration File

The API configuration is defined in a YAML file, typically located at `./config/api_providers.yaml`. You can specify a different location using the `API_CONFIG_PATH` environment variable.

### Basic Structure
The configuration file has the following structure:
```yaml
# Default API provider to use if not specified
default_api: openai

# Configuration for each API provider
apis:
  provider1:
    base_url: https://api.provider1.com
    allowed_endpoints:
      - /v1/endpoint1
      - /v1/endpoint2
    allowed_methods:
      - GET
      - POST
    timeouts:
      request: 60s
      response_header: 30s
      idle_connection: 90s
      flush_interval: 100ms
    connection:
      max_idle_conns: 100
      max_idle_conns_per_host: 20
  provider2:
    # ... similar configuration
```
## Configuration Fields

### Top-Level Fields

- `default_api`: The default API provider to use if not specified in requests
- `apis`: A map of API provider configurations
### API Provider Configuration
Each API provider has the following configuration options:
- `base_url`: The base URL of the API provider (required)
- `allowed_endpoints`: A list of endpoint paths that are allowed to be accessed (required)
- `allowed_methods`: A list of HTTP methods that are allowed (required)
- `timeouts`: Timeout settings for various operations
  - `request`: Overall request timeout
  - `response_header`: Timeout for receiving response headers
  - `idle_connection`: How long to keep idle connections alive
  - `flush_interval`: How often to flush streaming responses
- `connection`: Connection pool settings
  - `max_idle_conns`: Maximum number of idle connections
  - `max_idle_conns_per_host`: Maximum number of idle connections per host
- `param_whitelist`: (optional) Restrict allowed values for specific request parameters (e.g., `model`). Supports glob patterns (e.g., `gpt-4.1-*`).
- `allowed_origins`: (optional) Restrict allowed CORS origins for API requests. Only requests from these origins will be accepted.
- `required_headers`: (optional) Require specific headers (e.g., `Origin`) for requests to be accepted.
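The timeout and connection settings correspond to standard HTTP transport options. As a rough illustration only (assuming a Go-based reverse proxy, which the field names suggest, and not necessarily how the proxy wires these values internally), they might map onto `net/http` like this:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	upstream, err := url.Parse("https://api.provider1.com")
	if err != nil {
		panic(err)
	}

	// Connection pool and per-phase timeouts, roughly matching the
	// `connection` and `timeouts` sections above.
	transport := &http.Transport{
		MaxIdleConns:          100,              // max_idle_conns
		MaxIdleConnsPerHost:   20,               // max_idle_conns_per_host
		IdleConnTimeout:       90 * time.Second, // idle_connection
		ResponseHeaderTimeout: 30 * time.Second, // response_header
	}

	proxy := httputil.NewSingleHostReverseProxy(upstream)
	proxy.Transport = transport
	proxy.FlushInterval = 100 * time.Millisecond // flush_interval (streaming responses)

	// The overall `request` timeout (60s here) would typically be enforced
	// with a context deadline or the server's read/write timeouts.
	server := &http.Server{Addr: ":8080", Handler: proxy}
	_ = server.ListenAndServe()
}
```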
### Example with Advanced Options
```yaml
apis:
  openai:
    base_url: https://api.openai.com
    allowed_endpoints:
      - /v1/chat/completions
      - /v1/completions
    allowed_methods:
      - POST
    param_whitelist:
      model:
        - gpt-4o
        - gpt-4.1-*
    allowed_origins:
      - https://www.sofatutor.com
      - http://localhost:4000
    required_headers:
      - origin
    timeouts:
      request: 60s
      response_header: 30s
      idle_connection: 90s
      flush_interval: 100ms
    connection:
      max_idle_conns: 100
      max_idle_conns_per_host: 20
```
- `param_whitelist`: Use this to restrict which models or other parameters can be used in requests. If a request specifies a value not in the whitelist, it will be rejected with a 400 error.
- `allowed_origins`: Use this to enforce CORS policies. Only requests from these origins will be accepted. If not set, all origins are allowed by default.
- `required_headers`: Use this to require headers like `Origin` for all requests. If a required header is missing, the request will be rejected with a 400 error.
- If `origin` is listed in `required_headers`, the proxy will also check `allowed_origins` and block requests with an `Origin` header that is not in the allowed list.
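As an illustration of the glob matching, here is a minimal sketch using Go's standard `path.Match` (not necessarily the matcher the proxy actually uses) showing how a `model` value could be checked against the whitelist:

```go
package main

import (
	"fmt"
	"path"
)

// matchesWhitelist reports whether value matches any whitelist entry.
// Entries may be literal values or glob patterns such as "gpt-4.1-*".
// This only illustrates the behaviour described above.
func matchesWhitelist(whitelist []string, value string) bool {
	for _, pattern := range whitelist {
		if ok, err := path.Match(pattern, value); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	models := []string{"gpt-4o", "gpt-4.1-*"}
	fmt.Println(matchesWhitelist(models, "gpt-4.1-mini"))  // true  -> request accepted
	fmt.Println(matchesWhitelist(models, "gpt-3.5-turbo")) // false -> rejected with 400
}
```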
## Security Considerations
The allowlist-based configuration provides several security benefits:
- Restricted Access: Only explicitly allowed endpoints and methods can be accessed, reducing the attack surface.
- API Isolation: Each API provider has its own separate configuration.
- Transparent Validation: All requests are validated against the allowlist before being proxied.
## Adding a New API Provider
To add a new API provider, follow these steps:
- Add a new entry to the `apis` map in the configuration file
- Define the `base_url` for the provider
- List all `allowed_endpoints` that should be accessible
- Define the `allowed_methods` for those endpoints
- Configure appropriate timeouts and connection settings
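Before deploying a new entry, it can help to sanity-check that the required fields are present. The sketch below is a hypothetical standalone check (the struct layout and the `gopkg.in/yaml.v3` dependency are assumptions, not part of the proxy) that loads the file and verifies the required fields for each provider:

```go
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Minimal view of the configuration, limited to the required fields
// documented above. Field names follow the YAML keys in this guide.
type config struct {
	DefaultAPI string              `yaml:"default_api"`
	APIs       map[string]provider `yaml:"apis"`
}

type provider struct {
	BaseURL          string   `yaml:"base_url"`
	AllowedEndpoints []string `yaml:"allowed_endpoints"`
	AllowedMethods   []string `yaml:"allowed_methods"`
}

func main() {
	data, err := os.ReadFile("./config/api_providers.yaml")
	if err != nil {
		panic(err)
	}

	var cfg config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		panic(err)
	}

	for name, p := range cfg.APIs {
		if p.BaseURL == "" {
			fmt.Printf("%s: missing base_url\n", name)
		}
		if len(p.AllowedEndpoints) == 0 {
			fmt.Printf("%s: missing allowed_endpoints\n", name)
		}
		if len(p.AllowedMethods) == 0 {
			fmt.Printf("%s: missing allowed_methods\n", name)
		}
	}
}
```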
## Environment Variables
The proxy uses the following environment variables related to API configuration:
- `API_CONFIG_PATH`: Path to the API providers configuration file (default: `./config/api_providers.yaml`)
- `DEFAULT_API_PROVIDER`: Default API provider to use (overrides the `default_api` in the config file)
- `OPENAI_API_URL`: Base URL for the OpenAI API (legacy support, default: `https://api.openai.com`)
## HTTP Caching Configuration
The proxy supports HTTP response caching with the following environment variables:
- `HTTP_CACHE_ENABLED`: Enable or disable HTTP response caching (default: `true`)
- `HTTP_CACHE_BACKEND`: Cache backend to use, either `redis` or `in-memory` (default: `in-memory`)
- `REDIS_ADDR`: Redis server address, shared with the event bus (default: `localhost:6379`)
- `REDIS_DB`: Redis database number (default: `0`)
- `REDIS_CACHE_URL`: Optional override for the Redis cache URL (constructed from `REDIS_ADDR` + `REDIS_DB` if not set)
- `REDIS_CACHE_KEY_PREFIX`: Prefix for Redis cache keys (default: `llmproxy:cache:`)
- `HTTP_CACHE_MAX_OBJECT_BYTES`: Maximum size in bytes for cached objects (default: `1048576`, i.e. 1 MB)
- `HTTP_CACHE_DEFAULT_TTL`: Default TTL in seconds when the upstream response doesn't specify caching directives (default: `300`, i.e. 5 minutes)
### Cache Behavior
The caching system follows HTTP standards:
- GET/HEAD requests: Cached by default when upstream permits
- POST requests: Only cached when the client explicitly opts in via the request `Cache-Control` header
- Authentication: Cached responses for authenticated requests are only served if marked as publicly cacheable (`Cache-Control: public` or `s-maxage` present)
- Streaming responses: Captured during streaming and stored after completion
- TTL precedence: `s-maxage` (shared cache) takes precedence over `max-age`
- Headers: Responses include `X-PROXY-CACHE`, `X-PROXY-CACHE-KEY`, and `Cache-Status` for observability
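To make the TTL precedence concrete, here is a minimal Go sketch (not the proxy's actual parser; it ignores most other `Cache-Control` directives) of deriving a TTL from an upstream header, with the `HTTP_CACHE_DEFAULT_TTL` value as the fallback:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// ttlFromCacheControl picks a TTL from a Cache-Control header value.
// s-maxage (shared caches) wins over max-age; if neither is present,
// the supplied default TTL is used. This is a simplified sketch, not
// a full directive parser.
func ttlFromCacheControl(header string, defaultTTL time.Duration) time.Duration {
	var maxAge, sMaxAge time.Duration
	var haveMaxAge, haveSMaxAge bool

	for _, part := range strings.Split(header, ",") {
		directive := strings.TrimSpace(strings.ToLower(part))
		switch {
		case strings.HasPrefix(directive, "s-maxage="):
			if secs, err := strconv.Atoi(strings.TrimPrefix(directive, "s-maxage=")); err == nil {
				sMaxAge, haveSMaxAge = time.Duration(secs)*time.Second, true
			}
		case strings.HasPrefix(directive, "max-age="):
			if secs, err := strconv.Atoi(strings.TrimPrefix(directive, "max-age=")); err == nil {
				maxAge, haveMaxAge = time.Duration(secs)*time.Second, true
			}
		}
	}

	if haveSMaxAge {
		return sMaxAge // shared-cache directive takes precedence
	}
	if haveMaxAge {
		return maxAge
	}
	return defaultTTL
}

func main() {
	fmt.Println(ttlFromCacheControl("public, max-age=600, s-maxage=120", 300*time.Second)) // 2m0s
	fmt.Println(ttlFromCacheControl("no-transform", 300*time.Second))                      // 5m0s (default)
}
```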
### Cache Stats Aggregation
The proxy supports per-token cache hit tracking for visibility in the Admin UI. This is implemented as an asynchronous, lossy-tolerant aggregation system that doesn’t impact request latency.
- `CACHE_STATS_BUFFER_SIZE`: Size of the buffered channel for cache hit events (default: `1000`). If the buffer is full, events are dropped without blocking the request.
The aggregator:
- Flushes stats to the database every 5 seconds or when 100 events accumulate (whichever comes first)
- Uses non-blocking enqueue to avoid impacting hot path latency
- Gracefully flushes pending stats on server shutdown
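As a sketch of the pattern described above (the names and types are illustrative, not the proxy's actual code), the non-blocking enqueue and batched flush could look like this in Go:

```go
package main

import (
	"log"
	"time"
)

// cacheHit is the event recorded for every cache-served response.
type cacheHit struct {
	TokenID string
}

// statsAggregator mirrors the described behaviour: non-blocking enqueue
// into a buffered channel, periodic flush, drop when the buffer is full.
type statsAggregator struct {
	events chan cacheHit
}

func newStatsAggregator(bufferSize int) *statsAggregator {
	a := &statsAggregator{events: make(chan cacheHit, bufferSize)}
	go a.run(5*time.Second, 100)
	return a
}

// Record is called on the hot path and never blocks the request.
func (a *statsAggregator) Record(hit cacheHit) {
	select {
	case a.events <- hit:
	default:
		// Buffer full: drop the event rather than add latency.
	}
}

func (a *statsAggregator) run(flushEvery time.Duration, batchSize int) {
	ticker := time.NewTicker(flushEvery)
	defer ticker.Stop()

	counts := make(map[string]int)
	pending := 0
	flush := func() {
		if pending == 0 {
			return
		}
		// A real implementation would persist per-token counts to the database here.
		log.Printf("flushing cache stats: %v", counts)
		counts, pending = make(map[string]int), 0
	}

	for {
		select {
		case hit := <-a.events:
			counts[hit.TokenID]++
			pending++
			if pending >= batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

func main() {
	agg := newStatsAggregator(1000) // CACHE_STATS_BUFFER_SIZE
	agg.Record(cacheHit{TokenID: "token-123"})
	time.Sleep(6 * time.Second) // let the ticker flush once in this demo
}
```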
In the Admin UI tokens list, you’ll see:
- CACHED: Number of cache-served responses for the token
- UPSTREAM: Number of upstream-served responses (`request_count - cache_hit_count`)
- LIMIT: Remaining requests until the rate limit (or ∞ for unlimited tokens)
## Example Configuration
See `api_providers_example.yaml` for a comprehensive example configuration with multiple API providers.
## Fallback Behavior
If the configuration file cannot be loaded or contains errors, the proxy will fall back to a default OpenAI configuration with common endpoints. This ensures that the proxy can still function even without a valid configuration file.
## Endpoint Matching

Endpoints are matched by prefix. For example, if `/v1/chat/completions` is in the allowed endpoints, then both `/v1/chat/completions` and `/v1/chat/completions?temperature=0.7` will match.
## Method Validation

HTTP methods (GET, POST, DELETE, etc.) are validated against the `allowed_methods` list. If a request uses a method that is not in the list, it will be rejected with a 405 Method Not Allowed error.
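Purely as an illustration of these two checks (assuming a Go-based proxy; the struct and function names below are hypothetical), prefix matching and method validation could look roughly like this:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// providerConfig is a hypothetical subset of the per-provider settings
// used for request validation.
type providerConfig struct {
	AllowedEndpoints []string
	AllowedMethods   []string
}

// validate mirrors the documented behaviour: endpoints not on the
// allowlist yield 404, disallowed methods yield 405.
func validate(cfg providerConfig, r *http.Request) (status int, ok bool) {
	allowedPath := false
	for _, prefix := range cfg.AllowedEndpoints {
		if strings.HasPrefix(r.URL.Path, prefix) {
			allowedPath = true
			break
		}
	}
	if !allowedPath {
		return http.StatusNotFound, false
	}

	for _, m := range cfg.AllowedMethods {
		if strings.EqualFold(m, r.Method) {
			return http.StatusOK, true
		}
	}
	return http.StatusMethodNotAllowed, false
}

func main() {
	cfg := providerConfig{
		AllowedEndpoints: []string{"/v1/chat/completions"},
		AllowedMethods:   []string{"POST"},
	}
	req, _ := http.NewRequest(http.MethodGet, "http://localhost:8080/v1/chat/completions?temperature=0.7", nil)
	fmt.Println(validate(cfg, req)) // 405 false: endpoint allowed, GET is not
}
```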
## Testing Your Configuration
You can test your configuration by sending requests to the proxy endpoints and verifying that only allowed endpoints and methods are accepted. Disallowed endpoints will return a 404 Not Found error, and disallowed methods will return a 405 Method Not Allowed error.
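For example, assuming the proxy listens on `http://localhost:8080` and uses the example OpenAI configuration above (the base URL and any authentication headers your deployment requires are assumptions here), a small smoke test could check both cases:

```go
package main

import (
	"fmt"
	"net/http"
)

// Sends a request with a disallowed method and one to a disallowed
// endpoint and prints the status codes returned by the proxy.
func main() {
	checks := []struct {
		method, path string
		wantStatus   int
	}{
		{"GET", "/v1/chat/completions", http.StatusMethodNotAllowed}, // method not allowed
		{"POST", "/v1/not/allowed", http.StatusNotFound},             // endpoint not allowed
	}

	for _, c := range checks {
		req, err := http.NewRequest(c.method, "http://localhost:8080"+c.path, nil)
		if err != nil {
			panic(err)
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("%s %s -> %d (want %d)\n", c.method, c.path, resp.StatusCode, c.wantStatus)
	}
}
```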