Kubernetes Deployment with Helm

Status: Available
Chart Location: deploy/helm/llm-proxy


Overview

Deploy LLM Proxy to Kubernetes using the official Helm chart. The chart supports:

  • SQLite for single-instance deployments (development/testing)
  • PostgreSQL for production (external or in-cluster)
  • Redis for event bus and caching (external)
  • Ingress for external access with TLS
  • Horizontal Pod Autoscaler (HPA) for automatic scaling
  • Dispatcher for async event forwarding to observability platforms

When to Use Helm

Choose Helm deployment when:

  • You already have Kubernetes infrastructure
  • You need fine-grained control over deployment configuration
  • You want to integrate with existing K8s tooling (Ingress, HPA, service mesh)
  • You need multi-region or multi-cluster deployments

For AWS-native deployments without existing K8s infrastructure, consider AWS ECS instead.


Prerequisites

  • Kubernetes 1.19+ cluster
  • Helm 3.0+ installed
  • kubectl configured to access your cluster
  • Container registry with the LLM Proxy image
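
You can confirm the tooling prerequisites from your shell before proceeding:

# Verify cluster access and tool versions
kubectl version
helm version --short
kubectl cluster-info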

Quick Start Scenarios

Note on chart path: These examples use the local chart path deploy/helm/llm-proxy, which requires the repository to be checked out. If you prefer to install from the published OCI registry, replace deploy/helm/llm-proxy with oci://ghcr.io/sofatutor/llm-proxy --version <version> in the helm install commands below.

1. SQLite (Single Instance, Development)

Minimal deployment for development or testing:

# Create management token secret
kubectl create secret generic llm-proxy-secrets \
  --from-literal=MANAGEMENT_TOKEN="$(openssl rand -base64 32)"

# Deploy with SQLite
helm install llm-proxy deploy/helm/llm-proxy \
  --set image.repository=your-registry/llm-proxy \
  --set image.tag=v1.0.0 \
  --set secrets.managementToken.existingSecret.name=llm-proxy-secrets

Note: SQLite is the default database. It is not suitable for multi-replica deployments.
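
Because the management token above is generated inline and never echoed, you can read it back from the secret whenever you need it:

# Retrieve the generated management token
kubectl get secret llm-proxy-secrets \
  -o jsonpath='{.data.MANAGEMENT_TOKEN}' | base64 --decode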

2. PostgreSQL (External, Production)

Production deployment with external PostgreSQL:

# Create secrets
kubectl create secret generic llm-proxy-secrets \
  --from-literal=MANAGEMENT_TOKEN="$(openssl rand -base64 32)"

# NOTE: Replace USER and PASSWORD with your actual DB credentials; never commit real secrets
kubectl create secret generic llm-proxy-db \
  --from-literal=DATABASE_URL="postgres://USER:PASSWORD@postgres.example.com:5432/llmproxy?sslmode=verify-full"

# Deploy with external PostgreSQL
helm install llm-proxy deploy/helm/llm-proxy \
  --set image.repository=your-registry/llm-proxy \
  --set image.tag=v1.0.0 \
  --set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
  --set secrets.databaseUrl.existingSecret.name=llm-proxy-db \
  --set env.DB_DRIVER=postgres

Important: If building images yourself, ensure PostgreSQL support is enabled:

docker build --build-arg POSTGRES_SUPPORT=true -t your-registry/llm-proxy:v1.0.0 .

Pre-built images from ghcr.io/sofatutor/llm-proxy include PostgreSQL support by default.
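
Before installing, it can help to sanity-check connectivity to the external database from inside the cluster. A throwaway pod is one way to do this (the postgres:16 image tag is illustrative; match your server version):

# One-off psql from inside the cluster; substitute real credentials
kubectl run pg-check --rm -it --restart=Never --image=postgres:16 -- \
  psql "postgres://USER:PASSWORD@postgres.example.com:5432/llmproxy?sslmode=verify-full" \
  -c "SELECT 1;"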

3. External Redis (Multi-Instance)

Deploy with Redis for event bus and caching:

# Create secrets
kubectl create secret generic llm-proxy-secrets \
  --from-literal=MANAGEMENT_TOKEN="$(openssl rand -base64 32)"

# NOTE: Replace USER and PASSWORD with your actual DB credentials; never commit real secrets
kubectl create secret generic llm-proxy-db \
  --from-literal=DATABASE_URL="postgres://USER:PASSWORD@postgres.example.com:5432/llmproxy?sslmode=verify-full"

# Create Redis password secret if your Redis requires authentication.
# The value must match the password configured on your external Redis server.
printf '%s' 'YOUR_REDIS_PASSWORD' > /tmp/redis-password.txt
kubectl create secret generic redis-password \
  --from-file=REDIS_PASSWORD=/tmp/redis-password.txt
rm /tmp/redis-password.txt

# Deploy with Redis
helm install llm-proxy deploy/helm/llm-proxy \
  --set image.repository=your-registry/llm-proxy \
  --set image.tag=v1.0.0 \
  --set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
  --set secrets.databaseUrl.existingSecret.name=llm-proxy-db \
  --set env.DB_DRIVER=postgres \
  --set redis.external.addr="redis.example.com:6379" \
  --set redis.external.password.existingSecret.name=redis-password \
  --set env.LLM_PROXY_EVENT_BUS="redis-streams" \
  --set replicaCount=3

Note: Redis is required for multi-instance deployments. The in-memory event bus only works with a single replica.
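
A quick connectivity check against the external Redis from inside the cluster (the redis:7 image tag is illustrative; append -a <password> if authentication is enabled):

# Ping the external Redis; expect PONG
kubectl run redis-check --rm -it --restart=Never --image=redis:7 -- \
  redis-cli -h redis.example.com -p 6379 ping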

4. Ingress + TLS (External Access)

Expose the service via Ingress with automatic TLS:

helm install llm-proxy deploy/helm/llm-proxy \
  --set image.repository=your-registry/llm-proxy \
  --set image.tag=v1.0.0 \
  --set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
  --set secrets.databaseUrl.existingSecret.name=llm-proxy-db \
  --set env.DB_DRIVER=postgres \
  --set ingress.enabled=true \
  --set ingress.className=nginx \
  --set 'ingress.annotations.cert-manager\.io/cluster-issuer=letsencrypt-prod' \
  --set ingress.hosts[0].host=api.example.com \
  --set ingress.hosts[0].paths[0].path=/ \
  --set ingress.hosts[0].paths[0].pathType=Prefix \
  --set ingress.tls[0].secretName=llm-proxy-tls \
  --set ingress.tls[0].hosts[0]=api.example.com

Prerequisites:

  • NGINX Ingress Controller (or another Ingress controller) installed
  • cert-manager for automatic TLS certificate management (optional)
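
If you use cert-manager, you can watch the certificate being issued after the install; with the ingress-shim annotation above, the Certificate resource is typically named after the TLS secret:

# Watch certificate issuance (name assumes the TLS secret above)
kubectl get certificate llm-proxy-tls -w

# Once issued, verify external access
curl -v https://api.example.com/live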

5. Autoscaling (HPA)

Enable Horizontal Pod Autoscaler for automatic scaling:

helm install llm-proxy deploy/helm/llm-proxy \
  --set image.repository=your-registry/llm-proxy \
  --set image.tag=v1.0.0 \
  --set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
  --set secrets.databaseUrl.existingSecret.name=llm-proxy-db \
  --set env.DB_DRIVER=postgres \
  --set autoscaling.enabled=true \
  --set autoscaling.minReplicas=2 \
  --set autoscaling.maxReplicas=20 \
  --set autoscaling.targetCPUUtilizationPercentage=75

Prerequisites:

  • metrics-server installed in your cluster
  • Resource requests properly configured (CPU/memory)
  • PostgreSQL database (SQLite does not support multi-replica)

Note: When HPA is enabled, the replicaCount value is ignored.
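
Once traffic arrives, you can watch the HPA react (the resource name assumes the default release name):

# Watch current vs. target utilization and replica count
kubectl get hpa llm-proxy -w

# Confirm metrics-server is reporting pod metrics
kubectl top pods -l app.kubernetes.io/name=llm-proxy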


Production Values File

For production deployments, use a values.yaml file:

# production-values.yaml
image:
  repository: your-registry/llm-proxy
  tag: v1.0.0

secrets:
  managementToken:
    existingSecret:
      name: llm-proxy-secrets
  databaseUrl:
    existingSecret:
      name: llm-proxy-db

env:
  DB_DRIVER: postgres
  LOG_LEVEL: info
  LOG_FORMAT: json
  ENABLE_METRICS: "true"
  LLM_PROXY_EVENT_BUS: redis-streams

redis:
  external:
    addr: "redis.example.com:6379"
    db: 0
    password:
      existingSecret:
        name: redis-password

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: llm-proxy-tls
      hosts:
        - api.example.com

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 75

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 2000m
    memory: 1Gi

Deploy with the values file:

helm install llm-proxy deploy/helm/llm-proxy -f production-values.yaml
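
Before the real install, it is worth validating the chart and rendered manifests; helm lint and a server-side dry run catch most configuration mistakes:

# Lint the chart with your values
helm lint deploy/helm/llm-proxy -f production-values.yaml

# Render and validate against the API server without applying
helm template llm-proxy deploy/helm/llm-proxy -f production-values.yaml \
  | kubectl apply --dry-run=server -f -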

Dispatcher (Event Forwarding)

The optional dispatcher component forwards events to observability platforms:

# Create dispatcher API key secret
kubectl create secret generic dispatcher-secrets \
  --from-literal=DISPATCHER_API_KEY="your-lunary-api-key"

# Deploy with dispatcher for Lunary integration
helm install llm-proxy deploy/helm/llm-proxy \
  --set image.repository=your-registry/llm-proxy \
  --set image.tag=v1.0.0 \
  --set secrets.managementToken.existingSecret.name=llm-proxy-secrets \
  --set redis.external.addr="redis.example.com:6379" \
  --set env.LLM_PROXY_EVENT_BUS="redis-streams" \
  --set dispatcher.enabled=true \
  --set dispatcher.service="lunary" \
  --set dispatcher.apiKey.existingSecret.name="dispatcher-secrets" \
  --set dispatcher.apiKey.existingSecret.key="DISPATCHER_API_KEY"

Supported backends:

  • file - Write events to a JSONL file (persisted via a PersistentVolumeClaim)
  • lunary - Forward to Lunary.ai for LLM observability
  • helicone - Forward to Helicone for LLM analytics

Important: Dispatcher requires Redis. It cannot be used with the in-memory event bus.
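
To confirm the dispatcher is consuming and forwarding events, check its pod logs. The label selector below is an assumption; adjust it to match the chart's actual component labels:

# Inspect dispatcher logs (label selector is assumed)
kubectl logs -l app.kubernetes.io/component=dispatcher --tail=50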


Verification

After deployment, verify the installation:

# Check pod status
kubectl get pods -l app.kubernetes.io/name=llm-proxy

# Check service
kubectl get svc -l app.kubernetes.io/name=llm-proxy

# View logs
kubectl logs -l app.kubernetes.io/name=llm-proxy

# Test health endpoints (port-forward blocks; run the curl commands in a separate terminal)
kubectl port-forward svc/llm-proxy 8080:8080
curl http://localhost:8080/live
curl http://localhost:8080/ready

For Ingress deployments:

# Check Ingress status
kubectl get ingress

# Test external access (after DNS is configured)
curl https://api.example.com/live

Upgrading

Upgrade an existing deployment:

# Upgrade with new image version
helm upgrade llm-proxy deploy/helm/llm-proxy \
  --reuse-values \
  --set image.tag=v1.1.0

# Upgrade with new values file
helm upgrade llm-proxy deploy/helm/llm-proxy -f production-values.yaml
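
If an upgrade misbehaves, Helm keeps a revision history you can roll back to:

# List release revisions
helm history llm-proxy

# Roll back to the previous revision (or pass a specific revision number)
helm rollback llm-proxy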

Uninstalling

# Uninstall the release
helm uninstall llm-proxy

# Optionally, delete secrets
kubectl delete secret llm-proxy-secrets llm-proxy-db redis-password
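
To confirm all chart-managed resources are gone, filter by the standard Helm instance label (assuming the chart applies the usual app.kubernetes.io labels):

# Should return no resources after uninstall
kubectl get all -l app.kubernetes.io/instance=llm-proxy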

Complete Documentation

For comprehensive documentation, see the chart-local documentation in deploy/helm/llm-proxy, which includes:

  • Detailed configuration for all components
  • Security best practices
  • Secret management strategies
  • Health check configuration
  • Resource limits and requests
  • PostgreSQL subchart configuration (in-cluster development)
  • Advanced dispatcher scenarios
  • Troubleshooting guides

Comparison: Helm vs AWS ECS

| Factor         | Helm / Kubernetes                | AWS ECS                   |
|----------------|----------------------------------|---------------------------|
| Infrastructure | Requires existing K8s cluster    | AWS-native, no K8s needed |
| Cost           | Depends on cluster setup         | ~$130/mo for low traffic  |
| Complexity     | Higher (K8s knowledge required)  | Lower (managed service)   |
| Portability    | Multi-cloud, on-premise          | AWS only                  |
| Tooling        | Rich K8s ecosystem               | AWS-native tools          |
| Scaling        | HPA, cluster autoscaler          | ECS auto-scaling          |
| Best For       | Existing K8s infrastructure      | AWS-first deployments     |

Recommendation:

  • Choose Helm if you already have Kubernetes infrastructure or need multi-cloud portability
  • Choose AWS ECS for AWS-native deployments without existing K8s infrastructure

See the AWS ECS Architecture Guide for AWS-specific deployment.

