FastAPI Production Deploy

Docker images, uvicorn workers, Kubernetes manifests, observability.

DevZone Tools · 870 copies · Updated Apr 12, 2026 · FastAPI · Python

# CLAUDE.md — FastAPI Production Deploy

## Image

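- Two-stage Dockerfile. The builder stage installs dependencies into a venv; the runtime stage copies only `/app` and the venv.
- Base: `python:3.12-slim-bookworm`. Pin minor version.
- Non-root user: `RUN adduser --disabled-password --gecos "" app`, followed by `USER app` on its own line. `USER` is a Dockerfile instruction, not a shell command, so it cannot be chained with `&&`.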
- One process per container. The image runs `uvicorn`; orchestration handles replicas.

## ASGI server

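- **uvicorn** as the ASGI server, behind a reverse proxy or ingress that terminates TLS. Install `uvicorn[standard]` for uvloop, httptools, and WebSocket support; uvicorn itself does not speak HTTP/2, so leave that to the proxy: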
  ```sh
  uvicorn myapp.main:app --host 0.0.0.0 --port 8000 --workers 1
  ```
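- Use **gunicorn with uvicorn workers** when you need process-level concurrency inside a single container or host: `gunicorn -k uvicorn.workers.UvicornWorker -w 4 myapp.main:app`. Recent uvicorn releases deprecate `uvicorn.workers` in favor of the separate `uvicorn-worker` package, so check which one your pinned version expects.
- Pure CPU-bound work does not benefit from async. Move it to a worker queue.
- `--workers` count: `2 × CPU cores` is a starting point. Profile under realistic load and adjust.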
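
The `2 × CPU` starting point can be computed when the container starts instead of being hard-coded. A minimal sketch, assuming a hypothetical helper named `default_workers`; the `WEB_CONCURRENCY` override mirrors the environment variable that uvicorn and gunicorn consult for their own worker defaults:

```python
import os
from typing import Optional


def default_workers(cpu_count: Optional[int] = None, multiplier: int = 2) -> int:
    """Return a starting worker count: WEB_CONCURRENCY if set, else multiplier x CPUs."""
    override = os.environ.get("WEB_CONCURRENCY")
    if override:
        return max(1, int(override))
    # os.cpu_count() reports host cores and ignores container CPU limits,
    # so pass cpu_count explicitly when the limit is known.
    cpus = cpu_count or os.cpu_count() or 1
    return max(1, cpus * multiplier)
```

Treat the result as a first guess only. Inside Kubernetes with one process per container, the replica count plays this role instead.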

## Lifespan & startup

- Use the modern `lifespan` context manager, not `@app.on_event("startup")`:
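  ```python
  from contextlib import asynccontextmanager
  from fastapi import FastAPI

  @asynccontextmanager
  async def lifespan(app: FastAPI):
      await connect_db()  # your own coroutine: open pools and clients here
      try:
          yield  # the app serves requests while suspended here
      finally:
          await disconnect_db()  # your own coroutine: runs once on shutdown

  app = FastAPI(lifespan=lifespan)
  ```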
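- Open connections in `lifespan`, not at import time. Import-time connections have no shutdown hook, and with a preloading process manager they are created before the fork and shared across workers. Each worker still gets its own pool, so size pools per worker.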

## Health checks

- `/health/live` — process is alive. Returns 200 unconditionally.
- `/health/ready` — process can serve traffic (DB reachable, cache reachable, migrations applied). Returns 503 on failure.
- Don't gate liveness on the DB. A DB outage should take pods out of rotation through readiness, not restart every pod, and a restart does nothing to bring the DB back.
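
The split between the two endpoints can be sketched without any framework. This is an illustrative outline, not a prescribed implementation: `run_readiness`, `liveness`, `check_db`, and `check_cache` are hypothetical names, and the two checks are stand-ins for real dependency pings. A route handler would return the status code and body these functions produce.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Check = Callable[[], Awaitable[None]]  # a check signals failure by raising


async def run_readiness(
    checks: Dict[str, Check], timeout: float = 2.0
) -> Tuple[int, Dict[str, str]]:
    """Run all dependency checks concurrently; return (status_code, results)."""

    async def run_one(check: Check) -> str:
        try:
            await asyncio.wait_for(check(), timeout=timeout)
            return "ok"
        except Exception as exc:  # timeouts land here too
            return f"failed: {type(exc).__name__}"

    names = list(checks)
    outcomes = await asyncio.gather(*(run_one(checks[n]) for n in names))
    results = dict(zip(names, outcomes))
    status = 200 if all(v == "ok" for v in results.values()) else 503
    return status, results


def liveness() -> Tuple[int, Dict[str, str]]:
    """Liveness never touches dependencies: if this runs, the process is alive."""
    return 200, {"status": "alive"}


# Stand-in checks for illustration; replace with real pings.
async def check_cache() -> None:
    return None


async def check_db() -> None:
    raise ConnectionError("database unreachable")
```

The per-check timeout matters: a readiness probe that hangs on a slow dependency is reported as a probe failure anyway, so it is better to fail fast with a named cause.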

## Kubernetes

- Resource requests and limits set on every deployment. CPU `request` ≈ steady-state; `limit` 2× request or unset.
- HPA on CPU + custom metrics (request rate). CPU alone is a poor signal for async workloads, which can saturate on I/O waits while CPU stays low.
- Probes:
  - `livenessProbe`: `/health/live`, period 10s
  - `readinessProbe`: `/health/ready`, period 5s
  - `startupProbe` for slow boots, period 5s, `failureThreshold` 30
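- `terminationGracePeriodSeconds` ≥ uvicorn's graceful shutdown timeout (`--timeout-graceful-shutdown`) plus any `preStop` delay. Otherwise the kubelet kills the process before it finishes draining.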
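
The probe settings above imply some arithmetic that is easy to get wrong, so here is a small sketch that writes them down as data and checks them. The port `8000` and the choice of `/health/live` for the startup probe are assumptions, and `startup_budget_seconds` and `grace_period_ok` are hypothetical helpers. The field names follow the Kubernetes probe schema.

```python
import json

PORT = 8000  # assumed container port

probes = {
    "livenessProbe": {
        "httpGet": {"path": "/health/live", "port": PORT},
        "periodSeconds": 10,
    },
    "readinessProbe": {
        "httpGet": {"path": "/health/ready", "port": PORT},
        "periodSeconds": 5,
    },
    "startupProbe": {
        "httpGet": {"path": "/health/live", "port": PORT},  # assumed path
        "periodSeconds": 5,
        "failureThreshold": 30,
    },
}


def startup_budget_seconds(probe: dict) -> int:
    """Longest boot time tolerated before the container is restarted."""
    return probe["periodSeconds"] * probe["failureThreshold"]


def grace_period_ok(termination_grace: int, graceful_timeout: int, pre_stop: int = 0) -> bool:
    """True when the pod's grace period covers draining plus any preStop delay."""
    return termination_grace >= graceful_timeout + pre_stop


if __name__ == "__main__":
    print(json.dumps(probes, indent=2))
    print(startup_budget_seconds(probes["startupProbe"]), "seconds to boot")
```

With a 5s period and a failure threshold of 30, the app gets 150 seconds to start before the kubelet gives up.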

## Observability

- **OpenTelemetry** instrumentation: `opentelemetry-instrumentation-fastapi`. Auto-traces every request.
- Logs: JSON to stdout, fields for `trace_id`, `span_id`, `user_id` where available.
- Metrics: Prometheus exposed at `/metrics` (use `prometheus-fastapi-instrumentator`).
- Sentry for exceptions, gated on `SENTRY_DSN` env var.
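
JSON logs to stdout need nothing beyond the standard library. A minimal sketch, assuming hypothetical names `JsonFormatter` and `configure_logging`, and assuming the `trace_id`, `span_id`, and `user_id` values are attached per call through `extra=`; how they are obtained (for example from the active OpenTelemetry span) is left to the application:

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    CONTEXT_FIELDS = ("trace_id", "span_id", "user_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Include context fields only when the caller supplied them.
        for field in self.CONTEXT_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                payload[field] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: int = logging.INFO) -> None:
    """Send all logs through the JSON formatter to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=level, handlers=[handler], force=True)
```

Usage would look like `logging.getLogger("myapp").info("order created", extra={"trace_id": trace_id, "user_id": user_id})` after calling `configure_logging()` once at startup.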

## Configuration

- `pydantic-settings` reads env into a typed `Settings` object. Single instance, imported once.
- 12-factor: every config from env, no config files in the image.
- Secrets injected by the orchestrator (Kubernetes Secrets, AWS Secrets Manager, etc.). Never baked.

## Don't

- Don't run uvicorn in `--reload` mode in production.
- Don't put migrations in `lifespan`. Run them as a one-shot Job before rolling out the new ReplicaSet.
- Don't ship without resource limits, memory above all. A runaway worker can exhaust the node.
- Don't rely on `sys.exit` for shutdown. Cleanly close connections and let uvicorn drain.
