FastAPI Production Deploy
Docker images, uvicorn workers, Kubernetes manifests, observability.
# CLAUDE.md — FastAPI Production Deploy
## Image
- Two-stage Dockerfile. The builder stage installs dependencies into a venv; the runtime stage copies only `/app` and the venv.
- Base: `python:3.12-slim-bookworm`. Pin minor version.
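- Non-root user: `RUN adduser --disabled-password --gecos "" app`, then `USER app` as its own instruction. `USER` is a Dockerfile instruction, not a shell command, so it can't be chained with `&&`.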
- One process per container. The image runs `uvicorn`; orchestration handles replicas.
## ASGI server
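- **uvicorn** behind a reverse proxy or ingress that terminates TLS. Install `uvicorn[standard]` for uvloop, httptools, and WebSocket support. uvicorn speaks HTTP/1.1 only, so HTTP/2 has to be terminated at the proxy: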
```sh
uvicorn myapp.main:app --host 0.0.0.0 --port 8000 --workers 1
```
- Use **gunicorn with uvicorn workers** when you want process-level concurrency: `gunicorn -k uvicorn.workers.UvicornWorker -w 4 myapp.main:app`.
- CPU-bound work blocks the event loop, so async doesn't help. Move it to a worker queue.
- `--workers` count: `2 × CPU` is a starting point. In a container, count the CPU limit, not the host's cores. Load-test and adjust.
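The `2 × CPU` starting point can be sketched as a small helper. This is an illustration only: `worker_count`, `per_cpu`, and the cap of 16 are assumptions made for this sketch, not values from uvicorn or gunicorn.

```python
import os
from typing import Optional


def worker_count(cpus: Optional[int] = None, per_cpu: int = 2, cap: int = 16) -> int:
    """Return per_cpu x CPUs as a starting worker count, clamped to [1, cap]."""
    if cpus is None:
        # os.cpu_count() reports the host's CPUs and may return None.
        # Inside a container with a CPU limit, pass the limit explicitly.
        cpus = os.cpu_count() or 1
    return max(1, min(cap, cpus * per_cpu))


if __name__ == "__main__":
    # Example: a pod limited to 2 CPUs starts with 4 workers.
    print(worker_count(cpus=2))
```

Treat the result as a first guess to feed into `--workers` or `-w`, then adjust it from load-test data.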
## Lifespan & startup
- Use the modern `lifespan` context manager, not `@app.on_event("startup")`:
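```python
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    await connect_db()  # app-specific: open pools and clients here
    try:
        yield  # the app serves requests while suspended here
    finally:
        await disconnect_db()  # app-specific: always release on shutdown

app = FastAPI(lifespan=lifespan)
```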
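- Open connections in `lifespan`, not at import time. `lifespan` runs once per worker process, inside that worker's event loop. Connections opened at import time can be created before the fork (for example with gunicorn `--preload`) and shared across workers, which is unsafe, and they also fire whenever a script or test merely imports the module.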
## Health checks
- `/health/live` — process is alive. Returns 200 unconditionally.
- `/health/ready` — process can serve traffic (DB reachable, cache reachable, migrations applied). Returns 503 on failure.
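- Don't gate liveness on the DB. A failed liveness probe restarts the container, which can't fix a database outage and only adds restart churn across every pod. Let readiness take the pod out of rotation instead.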
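A minimal sketch of the split between the two endpoints, kept framework-free so it runs on its own. The names `readiness`, `liveness`, and `Check`, and the one-second timeout, are assumptions for illustration. Each check is any coroutine function that raises on failure.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Check = Callable[[], Awaitable[None]]  # a check signals failure by raising


async def readiness(checks: Dict[str, Check], timeout: float = 1.0) -> Tuple[int, dict]:
    """Run every dependency check and return (status_code, body)."""
    results = {}
    for name, check in checks.items():
        try:
            # Bound each check so a hung dependency cannot stall the probe.
            await asyncio.wait_for(check(), timeout=timeout)
            results[name] = "ok"
        except Exception as exc:  # timeouts land here as well
            results[name] = f"failed: {type(exc).__name__}"
    healthy = all(value == "ok" for value in results.values())
    return (200 if healthy else 503), results


def liveness() -> Tuple[int, dict]:
    """Liveness never touches dependencies; it only proves the process responds."""
    return 200, {"status": "alive"}
```

In a FastAPI route, the returned pair would typically be passed to `JSONResponse(status_code=code, content=body)`.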
## Kubernetes
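- Resource requests and limits set on every deployment. CPU `request` ≈ steady-state usage; CPU `limit` 2× request or unset. Always set a memory limit.
- HPA on CPU + custom metrics (request rate). CPU alone isn't enough for async workloads: a worker waiting on I/O can be saturated while its CPU usage stays low.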
- Probes:
  - `livenessProbe`: `/health/live`, period 10s
  - `readinessProbe`: `/health/ready`, period 5s
  - `startupProbe` for slow boots, period 5s, `failureThreshold` 30
- `terminationGracePeriodSeconds` ≥ uvicorn's graceful shutdown timeout (`--timeout-graceful-shutdown`), plus any `preStop` delay.
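The probe settings above, sketched as a pod spec fragment built from plain Python dicts (Kubernetes accepts JSON manifests as well as YAML). Several values are assumptions for this sketch: the container name `api`, port 8000, the 30-second graceful shutdown timeout, the 10-second margin, and pointing `startupProbe` at `/health/live`. The fragment omits required fields such as `image`.

```python
import json


def http_probe(path: str, period: int, failure_threshold: int = 3, port: int = 8000) -> dict:
    """Build a Kubernetes HTTP probe as a plain dict."""
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period,
        "failureThreshold": failure_threshold,
    }


# Assumed value passed to uvicorn's --timeout-graceful-shutdown.
GRACEFUL_SHUTDOWN_SECONDS = 30

container = {
    "name": "api",  # hypothetical container name
    "livenessProbe": http_probe("/health/live", period=10),
    "readinessProbe": http_probe("/health/ready", period=5),
    # Assumption: the startup probe reuses the liveness path.
    "startupProbe": http_probe("/health/live", period=5, failure_threshold=30),
}

pod_spec = {
    # Leave a margin above the server's own shutdown timeout.
    "terminationGracePeriodSeconds": GRACEFUL_SHUTDOWN_SECONDS + 10,
    "containers": [container],
}

# Longest boot the startup probe tolerates: period x failureThreshold.
startup = container["startupProbe"]
startup_budget = startup["periodSeconds"] * startup["failureThreshold"]

if __name__ == "__main__":
    print(json.dumps(pod_spec, indent=2))
    print(f"startup budget: {startup_budget}s")
```

With a 5s period and `failureThreshold` 30, the app gets up to 150 seconds to boot before the container is restarted.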
## Observability
- **OpenTelemetry** instrumentation: `opentelemetry-instrumentation-fastapi`. Auto-traces every request.
- Logs: JSON to stdout, fields for `trace_id`, `span_id`, `user_id` where available.
- Metrics: Prometheus exposed at `/metrics` (use `prometheus-fastapi-instrumentator`).
- Sentry for exceptions, gated on `SENTRY_DSN` env var.
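One way to get JSON logs with trace fields, using only the standard library. The context variables `trace_id_var` and `span_id_var` are hypothetical stand-ins: with OpenTelemetry the ids would come from the current span context instead. `user_id` is passed per call through `extra`.

```python
import contextvars
import json
import logging
import sys

# Hypothetical holders for the current request's ids.
trace_id_var = contextvars.ContextVar("trace_id", default=None)
span_id_var = contextvars.ContextVar("span_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
            "span_id": span_id_var.get(),
        }
        # user_id appears only when the caller supplied it via extra={...}.
        user_id = getattr(record, "user_id", None)
        if user_id is not None:
            payload["user_id"] = user_id
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(stream=sys.stdout, level: int = logging.INFO) -> None:
    """Replace the root logger's handlers with a single JSON handler."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```

Usage: call `configure_logging()` once at startup, then log with `logging.getLogger("api").info("order created", extra={"user_id": 7})`.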
## Configuration
- `pydantic-settings` reads env into a typed `Settings` object. Single instance, imported once.
- 12-factor: every config from env, no config files in the image.
- Secrets injected by the orchestrator (Kubernetes Secrets, AWS Secrets Manager, etc.). Never baked.
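The "typed object, single instance" pattern, sketched with a standard-library dataclass as a stand-in so the example has no dependencies. This is not the `pydantic-settings` API: in a real project `Settings` would subclass its `BaseSettings`, which adds validation and type coercion. The field names `DATABASE_URL` and `LOG_LEVEL` are hypothetical.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str = "INFO"
    sentry_dsn: Optional[str] = None  # Sentry stays off when this is unset

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # A missing DATABASE_URL raises KeyError, so the process fails fast.
        return cls(
            database_url=env["DATABASE_URL"],
            log_level=env.get("LOG_LEVEL", "INFO"),
            sentry_dsn=env.get("SENTRY_DSN") or None,  # empty string counts as unset
        )


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Build the settings once per process and reuse the same instance."""
    return Settings.from_env()
```

Passing the environment as a mapping keeps `from_env` easy to test without touching the real process environment.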
## Don't
- Don't run uvicorn in `--reload` mode in production.
- Don't put migrations in `lifespan`. Run them as a one-shot Job before rolling out the new ReplicaSet.
- Don't ship without resource requests and a memory limit. A runaway worker can exhaust the node.
- Don't rely on `sys.exit` for shutdown. Cleanly close connections and let uvicorn drain.