Troubleshooting & FAQ

This guide covers common issues and their solutions when using LLM Proxy.

Quick Diagnostics

Before diving into specific issues, run these quick checks:

# Check if proxy is running
curl http://localhost:8080/health

# Check logs for errors
docker logs --tail 50 llm-proxy

# Verify configuration
docker inspect llm-proxy | grep -A 20 "Env"
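If all three checks pass but requests still fail, the sections below cover specific symptoms. For repeated use, a small wrapper script can run the checks in one shot; this is a sketch assuming the defaults used above (port 8080, container name llm-proxy):

#!/usr/bin/env bash
# One-shot diagnostics; adjust port and container name to your deployment
set -u
if curl -fsS http://localhost:8080/health > /dev/null; then
  echo "health: OK"
else
  echo "health: FAILED - showing recent logs"
  docker logs --tail 50 llm-proxy
fi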

Installation Issues

Docker Container Won’t Start

Symptom: Container exits immediately after starting.

Check logs:

docker logs llm-proxy

Common causes:

  1. Missing MANAGEMENT_TOKEN
    # Error: MANAGEMENT_TOKEN environment variable is required
       
    # Solution: Set the environment variable
    docker run -e MANAGEMENT_TOKEN=your-token ...
    
  2. Port already in use
    # Error: bind: address already in use
       
    # Check what's using port 8080
    lsof -i :8080  # macOS/Linux
    netstat -an | findstr :8080  # Windows
       
    # Solution: Use a different port
    docker run -p 9000:8080 ...
    
  3. Volume permission issues
    # Error: permission denied
       
    # Solution: Fix permissions
    sudo chown -R $(id -u):$(id -g) ./data
    

Build from Source Fails

Go version mismatch:

# Error: requires go >= 1.23

# Check version
go version

# Solution: Update Go from https://go.dev/dl/

Missing dependencies:

# Solution: Download all dependencies
go mod download
go mod tidy

Lint failures:

# Run linter to see issues
make lint

# Fix formatting
make fmt

Authentication Errors

401 Unauthorized - Invalid Token

Symptom: {"error": "unauthorized"}

Causes & Solutions:

  1. Token is expired
    # Check token details
    curl -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
      "http://localhost:8080/manage/tokens/<token-id>"
       
    # Look for expires_at - if it's in the past, generate a new token (see the scripted check below)
    
  2. Token value is incorrect
    • Verify you’re using the full token value
    • Check for extra whitespace or newlines
    • Ensure correct Authorization header format: Bearer <token>
  3. Token is revoked
    # Check is_active field
    curl -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
      "http://localhost:8080/manage/tokens/<token-id>"
       
    # If is_active: false, generate a new token or reactivate it
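
To script the expiry and revocation checks above, something like the following works, assuming jq is installed and that expires_at is an ISO 8601 UTC timestamp (so plain string order matches chronological order):

TOKEN_JSON=$(curl -s -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
  "http://localhost:8080/manage/tokens/<token-id>")
echo "$TOKEN_JSON" | jq '{expires_at, is_active}'

# Flag expiry: ISO 8601 UTC strings compare correctly as plain strings
if [[ "$(echo "$TOKEN_JSON" | jq -r '.expires_at')" < "$(date -u +%Y-%m-%dT%H:%M:%SZ)" ]]; then
  echo "token is expired"
fi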
    

401 Unauthorized - Invalid Management Token

Symptom: Management API returns 401.

Solutions:

  1. Verify the management token is set
    # Check environment
    echo $MANAGEMENT_TOKEN
       
    # In Docker
    docker inspect llm-proxy | grep MANAGEMENT_TOKEN
    
  2. Ensure correct header format
    curl -H "Authorization: Bearer $MANAGEMENT_TOKEN" ...
    
  3. Token value mismatch - The token in your request must exactly match the value the server was started with.

403 Forbidden - Project Inactive

Symptom: {"error": "project is inactive"}

Solution: Activate the project:

curl -X PATCH http://localhost:8080/manage/projects/<project-id> \
  -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"is_active": true}'

429 Too Many Requests

Symptom: Rate limit exceeded.

Causes:

  1. Token request limit reached
    # Check token's request_count vs max_requests
    curl -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
      "http://localhost:8080/manage/tokens/<token-id>"
    
  2. Global or IP rate limiting
    • Wait for the rate limit window to reset
    • Reduce request frequency
    • Consider increasing rate limits if appropriate

Solutions:

  • Generate a new token with higher limits
  • Implement request batching
  • Use caching to reduce upstream requests
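
On the client side, a retry loop with exponential backoff keeps scripts working through short rate-limit windows. A minimal sketch, using the same endpoint and token conventions as the rest of this guide:

# Retry up to 5 times, backing off on each 429
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $TOKEN" \
    http://localhost:8080/v1/models)
  [ "$status" != "429" ] && break
  sleep $((2 ** attempt))  # 2s, 4s, 8s, 16s, 32s
done
echo "last status: $status"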

Database Issues

SQLite Permission Denied

Symptom: unable to open database file

Solutions:

  1. Check directory permissions
    # Ensure data directory exists and is writable
    mkdir -p ./data
    chmod 755 ./data
    
  2. Check file permissions
    chmod 644 ./data/llm-proxy.db
    
  3. Docker volume issues
    # Use named volume or fix host path permissions
    docker run -v llm-proxy-data:/app/data ...
    

PostgreSQL Connection Issues

See PostgreSQL Troubleshooting Guide for detailed PostgreSQL issues.

Quick checks:

# Test connection
psql "$DATABASE_URL" -c "SELECT 1"

# Common issues:
# - Database not running
# - Wrong host/port
# - Password authentication failed
# - Database doesn't exist
# - SSL configuration mismatch
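
pg_isready (shipped with the PostgreSQL client tools) separates "server unreachable" from "authentication failed", which narrows down the list above quickly:

# Is the server accepting connections at all?
pg_isready -d "$DATABASE_URL"

# Do the credentials and database name resolve?
psql "$DATABASE_URL" -c "SELECT current_user, current_database();"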

Migration Errors

Symptom: Migrations fail to run.

# Check migration status
llm-proxy migrate status

# Run pending migrations
llm-proxy migrate up

# If stuck, check for lock
# PostgreSQL:
psql "$DATABASE_URL" -c "SELECT * FROM pg_locks WHERE locktype = 'advisory';"

Cache Issues

Redis Connection Failed

Symptom: connection refused to Redis.

Solutions:

  1. Verify Redis is running
    docker ps | grep redis
    redis-cli ping  # Should return PONG
    
  2. Check connection settings
    # Verify REDIS_ADDR format
    # Correct: hostname:6379 or localhost:6379
    # Optional: Set REDIS_DB for database selection (default: 0)
    
  3. Network issues in Docker
    # Containers must be on same network
    docker network inspect bridge
       
    # Use container name as hostname
    REDIS_ADDR=redis:6379
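
Since the proxy image may not ship redis-cli, a throwaway container on the same network is a quick reachability test. The network name llm-net is an assumption; substitute whatever network your containers share:

# Should print PONG if the 'redis' host resolves and answers
docker run --rm --network llm-net redis:alpine redis-cli -h redis ping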
    

High Cache Miss Rate

Symptom: Low cache hit ratio in metrics.

Causes & Solutions:

  1. TTL too short
    # Increase default TTL
    HTTP_CACHE_DEFAULT_TTL=600  # 10 minutes
    
  2. Cache disabled
    # Verify caching is enabled
    HTTP_CACHE_ENABLED=true
    
  3. Unique requests - Each unique prompt creates a separate cache entry. This is expected behavior.

  4. POST requests not opted in - POST requests are cached only when they carry an explicit Cache-Control header (see the example below).
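
For item 4, opting a POST request into caching looks like the following sketch; the Cache-Control: public opt-in is described in the FAQ below, the path and payload follow the OpenAI API the proxy forwards, and the model name is illustrative:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Cache-Control: public" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'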

Cache Not Clearing

Symptom: Old responses being served.

Solution: Purge cache manually:

llm-proxy manage cache purge \
  --method GET \
  --url "/v1/models" \
  --management-token $MANAGEMENT_TOKEN

Proxy Errors

502 Bad Gateway

Symptom: Upstream API unreachable.

Causes & Solutions:

  1. OpenAI API down - Check the OpenAI status page (https://status.openai.com)

  2. Network connectivity
    # Test from container
    docker exec llm-proxy wget -q -O- https://api.openai.com/v1/models
    
  3. Invalid API key - Verify the project’s OpenAI API key is valid

504 Gateway Timeout

Symptom: Upstream request timed out.

Solutions:

  1. Increase timeout
    REQUEST_TIMEOUT=120s
    
  2. Reduce request complexity - Use simpler prompts or smaller models

  3. Check upstream latency - OpenAI may be experiencing high load

413 Request Too Large

Symptom: Request body exceeds limit.

Solution: Increase the maximum request size:

MAX_REQUEST_SIZE=50MB

Admin UI Issues

Cannot Access Admin UI

Symptom: 404 or connection refused at /admin/.

Solutions:

  1. Verify UI is enabled
    ADMIN_UI_ENABLED=true
    
  2. Check the path
    • Default: http://localhost:8080/admin/
    • Custom: Check ADMIN_UI_PATH setting
  3. Separate Admin service - If the Admin UI runs as a separate container, check that container's logs

Login Fails

Symptom: Cannot log into Admin UI.

Solutions:

  1. Use the management token - The Admin UI authenticates with the same MANAGEMENT_TOKEN as the Management API

  2. Clear browser cache - Try incognito/private mode

  3. Check CORS - If accessing from different domain, configure CORS_ALLOWED_ORIGINS
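
For item 3, a typical comma-separated value looks like this; the exact format accepted by CORS_ALLOWED_ORIGINS is an assumption, so check it against your configuration reference:

CORS_ALLOWED_ORIGINS=https://admin.example.com,https://ops.example.com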

Stale Data in Admin UI

Symptom: Changes not reflected immediately.

Solutions:

  1. Refresh the page - Data is fetched on page load

  2. Clear browser cache

    Ctrl+Shift+R (Windows/Linux)
    Cmd+Shift+R (macOS)
    

Event Bus Issues

Events Not Being Delivered

Symptom: Dispatcher not receiving events.

Solutions:

  1. In-memory vs. Redis - The in-memory bus only delivers events within a single process; for multi-process deployments, use Redis:
    LLM_PROXY_EVENT_BUS=redis
    REDIS_ADDR=redis:6379
    
  2. Check dispatcher is running
    docker ps | grep dispatcher
    docker logs llm-proxy-logger
    
  3. Buffer overflow - Increase buffer size:
    OBSERVABILITY_BUFFER_SIZE=2000
    

Event Loss

Symptom: Gaps in event log.

Causes:

  • Redis TTL/trimming before dispatcher reads
  • Dispatcher falling behind

Solutions:

  • Increase Redis retention settings
  • Increase the dispatcher batch size
  • Monitor dispatcher lag

See Instrumentation Guide - Production Reliability.

Performance Issues

See Performance Tuning Guide for detailed optimization.

High Latency

Quick checks:

# Enable debug logging
LOG_LEVEL=debug

# Check cache headers
curl -v -H "Authorization: Bearer $TOKEN" \
  http://localhost:8080/v1/models
# Look for X-PROXY-CACHE: hit vs miss
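
curl's timing variables help separate proxy overhead from upstream latency; comparing a cache hit against a miss on the same request makes the difference obvious:

curl -s -o /dev/null \
  -w 'total: %{time_total}s  first byte: %{time_starttransfer}s\n' \
  -H "Authorization: Bearer $TOKEN" \
  http://localhost:8080/v1/models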

Memory Usage High

Solutions:

  1. Reduce connection pool size
  2. Reduce event buffer size
  3. Enable caching to reduce upstream requests

FAQ

How do I rotate the management token?

  1. Update the MANAGEMENT_TOKEN environment variable
  2. Restart the proxy
  3. Update any scripts/automation using the old token
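
A sketch of the rotation with Docker; the image name and run flags are assumptions, so mirror whatever your deployment actually uses:

# Generate a fresh token and restart the proxy with it
NEW_TOKEN=$(openssl rand -hex 32)
docker stop llm-proxy && docker rm llm-proxy
docker run -d --name llm-proxy -p 8080:8080 \
  -e MANAGEMENT_TOKEN="$NEW_TOKEN" \
  llm-proxy:latest  # image name is an assumption
echo "New management token: $NEW_TOKEN"  # store it securely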

Can I use multiple API keys?

Yes, create multiple projects, each with its own API key. Tokens are project-specific.

How do I backup the database?

SQLite:

cp ./data/llm-proxy.db ./backups/llm-proxy-$(date +%Y%m%d).db
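
Copying the file while the proxy is writing can produce an inconsistent snapshot. If you cannot stop the proxy first, SQLite's online backup takes a consistent copy (requires the sqlite3 CLI):

mkdir -p ./backups
sqlite3 ./data/llm-proxy.db ".backup ./backups/llm-proxy-$(date +%Y%m%d).db"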

PostgreSQL:

pg_dump -U llmproxy llmproxy > backup.sql

How do I check token usage?

# Via API
curl -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
  "http://localhost:8080/manage/tokens/<token-id>"

# Check request_count and cache_hit_count fields

Can I extend a token’s expiration?

No, tokens cannot be extended. Generate a new token before the old one expires.

How do I completely remove a project?

Projects cannot be deleted (405 Method Not Allowed) for data safety. Instead:

  1. Deactivate the project
  2. Revoke all its tokens
  3. The project remains in the database for audit purposes

Why are my requests not being cached?

  1. POST requests require explicit opt-in via the Cache-Control: public header
  2. Responses with Cache-Control: no-store are not cached
  3. Responses larger than HTTP_CACHE_MAX_OBJECT_BYTES are not cached
  4. Each unique request creates a separate cache entry

How do I enable Prometheus metrics?

ENABLE_METRICS=true
METRICS_PATH=/metrics

Access at http://localhost:8080/metrics
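
To confirm the endpoint is up and emitting Prometheus text format before wiring up a scraper:

# Any output here means the exporter is working
curl -s http://localhost:8080/metrics | grep '^# HELP' | head -5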

Can I run multiple proxy instances?

Yes, for horizontal scaling:

  1. Use PostgreSQL (not SQLite) for shared database
  2. Use Redis for distributed caching and rate limiting
  3. Use Redis for distributed event bus
  4. Use a load balancer in front

See Performance Tuning - Horizontal Scaling.
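
A sketch of two replicas sharing PostgreSQL and Redis; the image name, network, and host ports are assumptions, while the environment variables are the ones used throughout this guide:

# Start two replicas on the same network, backed by shared Postgres/Redis
for port in 8080 8081; do
  docker run -d --name "llm-proxy-$port" --network llm-net \
    -p "$port:8080" \
    -e MANAGEMENT_TOKEN="$MANAGEMENT_TOKEN" \
    -e DATABASE_URL="$DATABASE_URL" \
    -e REDIS_ADDR=redis:6379 \
    -e LLM_PROXY_EVENT_BUS=redis \
    llm-proxy:latest
done
# Then point a load balancer (nginx, HAProxy, etc.) at both host ports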

Getting Help

If you’re still experiencing issues:

  1. Check logs with LOG_LEVEL=debug
  2. Search GitHub Issues
  3. Review documentation for your specific use case
  4. Open a new issue with:
    • Error message and stack trace
    • LLM Proxy version
    • Configuration (redact sensitive values)
    • Steps to reproduce