Kubernetes Deployment with Helm
Status: Available
Chart Location: deploy/helm/llm-proxy
Overview
Deploy LLM Proxy to Kubernetes using the official Helm chart. The chart supports:
- SQLite for single-instance deployments (development/testing)
- PostgreSQL for production (external or in-cluster)
- Redis for event bus and caching (external)
- Ingress for external access with TLS
- Horizontal Pod Autoscaler (HPA) for automatic scaling
- Dispatcher for async event forwarding to observability platforms
When to Use Helm
Choose Helm deployment when:
- You already have Kubernetes infrastructure
- You need fine-grained control over deployment configuration
- You want to integrate with existing K8s tooling (Ingress, HPA, service mesh)
- You need multi-region or multi-cluster deployments
For AWS-native deployments without existing K8s infrastructure, consider AWS ECS instead.
Prerequisites
- Kubernetes 1.19+ cluster
- Helm 3.0+ installed
- kubectl configured to access your cluster
- Container registry with the LLM Proxy image
Quick Start Scenarios
Note on chart path: These examples use the local chart path
`deploy/helm/llm-proxy`, which requires the repository to be checked out. If you prefer to install from the published OCI registry, replace `deploy/helm/llm-proxy` with `oci://ghcr.io/sofatutor/llm-proxy --version <version>` in the `helm install` commands below.
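For example, a registry-based install might look like this (note that OCI registry support requires Helm 3.8+; `<version>` is the published chart version):
# Install directly from the OCI registry instead of the local checkout
helm install llm-proxy oci://ghcr.io/sofatutor/llm-proxy \
  --version <version> \
  --set image.repository=your-registry/llm-proxy \
  --set image.tag=v1.0.0 \
  --set secrets.managementToken.existingSecret.name=llm-proxy-secrets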
1. SQLite (Single Instance, Development)
Minimal deployment for development or testing:
# Create management token secret
kubectl create secret generic llm-proxy-secrets \
--from-literal=MANAGEMENT_TOKEN="$(openssl rand -base64 32)"
# Deploy with SQLite
helm install llm-proxy deploy/helm/llm-proxy \
--set image.repository=your-registry/llm-proxy \
--set image.tag=v1.0.0 \
--set secrets.managementToken.existingSecret.name=llm-proxy-secrets
Note: SQLite is the default database; it is not suitable for multi-replica deployments.
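A quick way to sanity-check the release and read back the generated management token (assuming the secret name used above):
# Confirm the release and pod are running
helm status llm-proxy
kubectl get pods -l app.kubernetes.io/name=llm-proxy
# Decode the management token for use against the management API
kubectl get secret llm-proxy-secrets \
  -o jsonpath='{.data.MANAGEMENT_TOKEN}' | base64 -d; echo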
2. PostgreSQL (External, Production)
Production deployment with external PostgreSQL:
# Create secrets
kubectl create secret generic llm-proxy-secrets \
--from-literal=MANAGEMENT_TOKEN="$(openssl rand -base64 32)"
# NOTE: Replace USER and PASSWORD with your actual DB credentials; never commit real secrets
kubectl create secret generic llm-proxy-db \
--from-literal=DATABASE_URL="postgres://USER:PASSWORD@postgres.example.com:5432/llmproxy?sslmode=verify-full"
# Deploy with external PostgreSQL
helm install llm-proxy deploy/helm/llm-proxy \
--set image.repository=your-registry/llm-proxy \
--set image.tag=v1.0.0 \
--set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
--set secrets.databaseUrl.existingSecret.name=llm-proxy-db \
--set env.DB_DRIVER=postgres
Important: If building images yourself, ensure PostgreSQL support is enabled:
docker build --build-arg POSTGRES_SUPPORT=true -t your-registry/llm-proxy:v1.0.0 .
Pre-built images from ghcr.io/sofatutor/llm-proxy include PostgreSQL support by default.
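Before deploying, it can be worth confirming the connection string works from inside the cluster. A throwaway client pod is one way to do this (a sketch only; the image tag is illustrative, and sslmode=require is used here because verify-full needs the CA certificate available inside the pod):
# One-off psql pod to test connectivity; removed automatically on exit
kubectl run pg-check --rm -it --restart=Never --image=postgres:16 -- \
  psql "postgres://USER:PASSWORD@postgres.example.com:5432/llmproxy?sslmode=require" -c 'SELECT 1;'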
3. External Redis (Multi-Instance)
Deploy with Redis for event bus and caching:
# Create secrets
kubectl create secret generic llm-proxy-secrets \
--from-literal=MANAGEMENT_TOKEN="$(openssl rand -base64 32)"
# NOTE: Replace USER and PASSWORD with your actual DB credentials; never commit real secrets
kubectl create secret generic llm-proxy-db \
--from-literal=DATABASE_URL="postgres://USER:PASSWORD@postgres.example.com:5432/llmproxy?sslmode=verify-full"
# Create Redis password secret (if authentication is enabled)
openssl rand -base64 32 > /tmp/redis-password.txt
kubectl create secret generic redis-password \
--from-file=REDIS_PASSWORD=/tmp/redis-password.txt
rm /tmp/redis-password.txt
# Deploy with Redis
helm install llm-proxy deploy/helm/llm-proxy \
--set image.repository=your-registry/llm-proxy \
--set image.tag=v1.0.0 \
--set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
--set secrets.databaseUrl.existingSecret.name=llm-proxy-db \
--set env.DB_DRIVER=postgres \
--set redis.external.addr="redis.example.com:6379" \
--set redis.external.password.existingSecret.name=redis-password \
--set env.LLM_PROXY_EVENT_BUS="redis-streams" \
--set replicaCount=3
Note: Redis is required for multi-instance deployments. The in-memory event bus only works with a single replica.
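To verify the proxy can reach Redis before scaling out, a throwaway redis-cli pod works (image tag illustrative; add -a with the password if authentication is enabled):
# One-off redis-cli pod; prints PONG on success
kubectl run redis-check --rm -it --restart=Never --image=redis:7 -- \
  redis-cli -h redis.example.com -p 6379 ping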
4. Ingress + TLS (External Access)
Expose the service via Ingress with automatic TLS:
helm install llm-proxy deploy/helm/llm-proxy \
--set image.repository=your-registry/llm-proxy \
--set image.tag=v1.0.0 \
--set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
--set secrets.databaseUrl.existingSecret.name=llm-proxy-db \
--set env.DB_DRIVER=postgres \
--set ingress.enabled=true \
--set ingress.className=nginx \
--set 'ingress.annotations.cert-manager\.io/cluster-issuer=letsencrypt-prod' \
--set ingress.hosts[0].host=api.example.com \
--set ingress.hosts[0].paths[0].path=/ \
--set ingress.hosts[0].paths[0].pathType=Prefix \
--set ingress.tls[0].secretName=llm-proxy-tls \
--set ingress.tls[0].hosts[0]=api.example.com
Prerequisites:
- NGINX Ingress Controller (or another Ingress controller) installed
- cert-manager for automatic TLS certificate management (optional)
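The letsencrypt-prod issuer referenced in the annotation must already exist in the cluster. A minimal ClusterIssuer sketch, assuming cert-manager is installed and NGINX handles HTTP-01 challenges (the email address is a placeholder):
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com  # placeholder; use a monitored address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx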
5. Autoscaling (HPA)
Enable Horizontal Pod Autoscaler for automatic scaling:
helm install llm-proxy deploy/helm/llm-proxy \
--set image.repository=your-registry/llm-proxy \
--set image.tag=v1.0.0 \
--set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
--set secrets.databaseUrl.existingSecret.name=llm-proxy-db \
--set env.DB_DRIVER=postgres \
--set autoscaling.enabled=true \
--set autoscaling.minReplicas=2 \
--set autoscaling.maxReplicas=20 \
--set autoscaling.targetCPUUtilizationPercentage=75
Prerequisites:
- metrics-server installed in your cluster
- Resource requests properly configured (CPU/memory)
- PostgreSQL database (SQLite does not support multi-replica)
Note: When HPA is enabled, the replicaCount value is ignored.
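Once traffic arrives, scaling can be observed directly (the HPA name normally follows the release name; kubectl top requires metrics-server):
# Watch the HPA adjust the replica count under load
kubectl get hpa llm-proxy --watch
# Check current pod resource usage against the configured requests
kubectl top pods -l app.kubernetes.io/name=llm-proxy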
Production Values File
For production deployments, use a values.yaml file:
# production-values.yaml
image:
  repository: your-registry/llm-proxy
  tag: v1.0.0
secrets:
  managementToken:
    existingSecret:
      name: llm-proxy-secrets
  databaseUrl:
    existingSecret:
      name: llm-proxy-db
env:
  DB_DRIVER: postgres
  LOG_LEVEL: info
  LOG_FORMAT: json
  ENABLE_METRICS: "true"
  LLM_PROXY_EVENT_BUS: redis-streams
redis:
  external:
    addr: "redis.example.com:6379"
    db: 0
    password:
      existingSecret:
        name: redis-password
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: llm-proxy-tls
      hosts:
        - api.example.com
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 75
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 2000m
    memory: 1Gi
Deploy with the values file:
helm install llm-proxy deploy/helm/llm-proxy -f production-values.yaml
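For repeatable rollouts (e.g. from CI), the idempotent form below installs on the first run and upgrades afterwards; the namespace flags are optional:
# Install or upgrade in one step, creating the namespace if needed
helm upgrade --install llm-proxy deploy/helm/llm-proxy \
  -f production-values.yaml \
  --namespace llm-proxy --create-namespace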
Dispatcher (Event Forwarding)
The optional dispatcher component forwards events to observability platforms:
# Create dispatcher API key secret
kubectl create secret generic dispatcher-secrets \
--from-literal=DISPATCHER_API_KEY="your-lunary-api-key"
# Deploy with dispatcher for Lunary integration
helm install llm-proxy deploy/helm/llm-proxy \
--set image.repository=your-registry/llm-proxy \
--set image.tag=v1.0.0 \
--set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
--set redis.external.addr="redis.example.com:6379" \
--set env.LLM_PROXY_EVENT_BUS="redis-streams" \
--set dispatcher.enabled=true \
--set dispatcher.service="lunary" \
--set dispatcher.apiKey.existingSecret.name="dispatcher-secrets" \
--set dispatcher.apiKey.existingSecret.key="DISPATCHER_API_KEY"
Supported backends:
- `file` - Write events to a JSONL file (with PersistentVolumeClaim)
- `lunary` - Forward to Lunary.ai for LLM observability
- `helicone` - Forward to Helicone for LLM analytics
Important: Dispatcher requires Redis. It cannot be used with the in-memory event bus.
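To confirm events are being forwarded, tail the dispatcher's logs. The label selector below is an assumption about how the chart labels dispatcher pods; adjust it to match the labels the chart actually applies:
# Dispatcher logs (label selector is illustrative)
kubectl logs -l app.kubernetes.io/component=dispatcher --tail=50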
Verification
After deployment, verify the installation:
# Check pod status
kubectl get pods -l app.kubernetes.io/name=llm-proxy
# Check service
kubectl get svc -l app.kubernetes.io/name=llm-proxy
# View logs
kubectl logs -l app.kubernetes.io/name=llm-proxy
# Test health endpoints (run the port-forward in a separate terminal, or background it)
kubectl port-forward svc/llm-proxy 8080:8080
curl http://localhost:8080/live
curl http://localhost:8080/ready
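In scripts, it is often useful to block until the pods report ready before probing the endpoints:
# Wait up to two minutes for all proxy pods to become ready
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/name=llm-proxy --timeout=120s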
For Ingress deployments:
# Check Ingress status
kubectl get ingress
# Test external access (after DNS is configured)
curl https://api.example.com/live
Upgrading
Upgrade an existing deployment:
# Upgrade with new image version
helm upgrade llm-proxy deploy/helm/llm-proxy \
--reuse-values \
--set image.tag=v1.1.0
# Upgrade with new values file
helm upgrade llm-proxy deploy/helm/llm-proxy -f production-values.yaml
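If an upgrade misbehaves, Helm keeps per-release history, so rolling back is straightforward:
# Inspect release history, then roll back to a known-good revision
helm history llm-proxy
helm rollback llm-proxy 1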
Uninstalling
# Uninstall the release
helm uninstall llm-proxy
# Optionally, delete secrets
kubectl delete secret llm-proxy-secrets llm-proxy-db redis-password
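Note that helm uninstall does not remove PersistentVolumeClaims created by the chart. Assuming the chart applies the standard Helm instance label, leftovers can be listed with:
# List resources left behind by the release (label is the Helm convention)
kubectl get pvc -l app.kubernetes.io/instance=llm-proxy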
Complete Documentation
For comprehensive documentation, see:
- Helm Chart README - Full configuration reference
- Helm Chart Examples - Additional deployment examples
- values.yaml - All configurable values
The chart-local documentation includes:
- Detailed configuration for all components
- Security best practices
- Secret management strategies
- Health check configuration
- Resource limits and requests
- PostgreSQL subchart configuration (in-cluster development)
- Advanced dispatcher scenarios
- Troubleshooting guides
Comparison: Helm vs AWS ECS
| Factor | Helm / Kubernetes | AWS ECS |
|---|---|---|
| Infrastructure | Requires existing K8s cluster | AWS-native, no K8s needed |
| Cost | Depends on cluster setup | ~$130/mo for low traffic |
| Complexity | Higher (K8s knowledge required) | Lower (managed service) |
| Portability | Multi-cloud, on-premise | AWS only |
| Tooling | Rich K8s ecosystem | AWS-native tools |
| Scaling | HPA, cluster autoscaler | ECS auto-scaling |
| Best For | Existing K8s infrastructure | AWS-first deployments |
Recommendation:
- Choose Helm if you already have Kubernetes infrastructure or need multi-cloud portability
- Choose AWS ECS for AWS-native deployments without existing K8s infrastructure
See the AWS ECS Architecture Guide for AWS-specific deployment.
See Also
- Performance Tuning - Optimization and resource planning
- Security Best Practices - Production security guidelines
- Configuration Reference - Environment variables and settings
- Token Management - API token configuration