Vector database pricing is the most confusing line item in any AI infrastructure budget. Pinecone bills per read unit, Weaviate per million dimensions, Qdrant per cluster hour, MongoDB per RAM tier, and pgvector "for free" if you don't count the EC2 bill or the engineer babysitting it. The same 10M-vector RAG workload can cost anywhere from $36 to $1,200 a month depending on which model you map it onto. Here's how the ten serious options actually compare in 2026.
The 10 Contenders
There's no single market — there are three:
Managed serverless / usage-based
- Pinecone Serverless — pay per GB stored, plus read units (RU) and write units (WU)
- Chroma Cloud — pay per GB stored, plus per-million queries and writes
- Weaviate Cloud — pay per million stored dimensions per month
Managed cluster-based (you pick a tier, you pay the hourly rate)
- Pinecone Pod-based — s1 (storage-optimized) and p2 (QPS-optimized) tiers
- Qdrant Cloud — vCPU + RAM tiers
- Zilliz Cloud (Milvus) — Compute Units (CUs); 1 CU ≈ 8 GB RAM ≈ 1.5M 768-dim vectors
- MongoDB Atlas Vector Search — included free in any M10+ Atlas cluster
Self-hosted on AWS (raw infra + your time)
- pgvector on r6g EC2 + gp3 EBS
- Qdrant open-source on r6g EC2
- Milvus open-source on r6g EC2 (distributed, higher ops overhead)
The pricing models don't compare directly. A serverless RU is not a Qdrant vCPU is not a Milvus CU. The only honest way to compare is to fix a workload and run the math for each model — which is what the Vector Database Cost Calculator does.
Side-by-Side at a Representative RAG Workload
Workload: 10M vectors, 1536 dimensions (OpenAI text-embedding-3-small), 5M queries/month, 500K writes/month, float32, replication factor 1, 1 KB metadata per vector. May 2026 pricing.
| Provider | Pricing model | Estimated monthly cost | Notes |
|---|---|---|---|
| pgvector self-hosted (r6g.xlarge) | EC2 + EBS | ~$148 infra only / ~$648 with DevOps | 4 vCPU, 32 GB RAM; you operate it |
| Qdrant self-hosted (r6g.large) | EC2 + EBS | ~$74 infra only / ~$574 with DevOps | Best raw price-perf |
| Qdrant Cloud (1 vCPU / 4 GB) | Cluster hourly | ~$36 | Cheapest managed at this scale |
| Zilliz Cloud (Milvus) | Compute Units | ~$880 | ~8 CUs × $109/mo + storage |
| Weaviate Cloud Standard | Per-dim | ~$25 (minimum) – ~$1,460 | $0.095/M dim/mo × 15.36B dims |
| Pinecone Serverless | Storage + RU/WU | ~$120–$180 | $0.33/GB + read/write units |
| Pinecone Pod-based s1.x2 | Pod hourly | ~$140 | 730 hr × $0.192 |
| Chroma Cloud | Storage + queries | ~$72 | $0.25/GB + $5/M queries + $2.50/M writes |
| MongoDB Atlas M30 | Cluster tier | $285 | Vector search included; need M30 for 10M × 1536 dims |
| Milvus self-hosted (r6g.xlarge) | EC2 + EBS | ~$148 infra / ~$898 with DevOps | Higher DevOps because of distributed components |
A few things jump out immediately:
- Qdrant Cloud is brutal on price at small-to-medium scale. It includes both compute and storage in the hourly rate; no separate query fees.
- Weaviate's per-dimension model swings wildly with dimensions. Same vector count at 384 dims (BGE-small) instead of 1536 is 4× cheaper.
- Pinecone Serverless wins or loses on QPS, not storage. The numbers change dramatically with queries × RU/query; filtered queries cost 5–10 RU instead of 1–2.
- Self-hosted is "cheap" only if you ignore the engineer. $74/month of EC2 is irrelevant against the $500–750/month of ops time it takes to actually run it.
The Pricing Models, Decoded
Understanding which knob blows up your bill is half the battle.
Pinecone Serverless: storage + read units + write units
cost = storage_gb × $0.33
+ (queries × ru_per_query / 1M) × $8.25
+ (writes × wu_per_write / 1M) × $2.00
+ max(0, $50 - subtotal) ← $50/mo Standard plan minimum
ru_per_query is 2 for unfiltered and 5–10 for filtered queries. If your app does metadata filtering on every query (e.g., "search docs for tenant_id = X"), assume the high end. Filtered RAG can be 3–5× more expensive than unfiltered on the same vector count.
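The formula above is easy to turn into a small Python function. A sketch, using the article's May 2026 rate snapshot; `ru_per_query` is the knob that filtered workloads turn:

```python
def pinecone_serverless_monthly(storage_gb, queries, writes,
                                ru_per_query=2.0, wu_per_write=1.0):
    """Sketch of the Serverless formula above; rates are a May 2026
    snapshot, not live pricing. Raise ru_per_query to 5-10 for
    metadata-filtered queries."""
    subtotal = (storage_gb * 0.33                      # storage
                + queries * ru_per_query / 1e6 * 8.25  # read units
                + writes * wu_per_write / 1e6 * 2.00)  # write units
    return max(subtotal, 50.0)  # $50/mo Standard plan minimum

# Filtered vs. unfiltered on the headline workload (10M x 1536 ~ 61 GB):
unfiltered = pinecone_serverless_monthly(61, 5_000_000, 500_000, ru_per_query=2)
filtered = pinecone_serverless_monthly(61, 5_000_000, 500_000, ru_per_query=8)
```

At 8 RU per filtered query the same traffic comes out roughly 3.4× more expensive, which is exactly the 3–5× effect described above.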
Pinecone Pod-based: just hourly × 730
cost = pod_hourly_rate × 730
s1.x1 is $0.096/hr (~$70/mo) and holds 5M vectors. Predictable, but capped at p2.x8 (~160M vectors). Beyond that, you migrate to Serverless or contact sales.
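In code, the pod model is a lookup plus a multiply. The tier table below extrapolates the s1.x1 figures ($0.096/hr, ~5M vectors) linearly to larger pods, which is an assumption for illustration, not published Pinecone data:

```python
# (hourly rate, approx. vector capacity); s1.x1 from the text,
# larger sizes extrapolated linearly (an assumption).
S1_PODS = {"s1.x1": (0.096, 5_000_000),
           "s1.x2": (0.192, 10_000_000),
           "s1.x4": (0.384, 20_000_000)}

def pod_monthly(tier, hours=730):
    """Pod-based billing: hourly rate times hours in a month."""
    hourly, _capacity = S1_PODS[tier]
    return hourly * hours
```

Predictability is the whole appeal: `pod_monthly("s1.x2")` lands at ~$140 regardless of query volume.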
Weaviate Cloud: per million dimensions
cost = max($25, vectors × dimensions / 1M × $0.095 × replicas)
The $0.095 per million dimensions looks tiny, but you multiply by vectors × dimensions. 10M × 1536 = 15.36B dimensions = ~$1,460/month at replication 1. The same workload at 768 dims is ~$730. Weaviate punishes high-dimension embeddings: picking a 384-dim model (BGE-small) over 1536 (text-embedding-3-small) is a 4× cost cut on Weaviate alone.
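As a function, using $0.095 per million stored dimensions — the rate implied by the ~$1,460 figure for 15.36B dimensions — the dimension sensitivity is obvious:

```python
def weaviate_monthly(vectors, dims, replicas=1,
                     rate_per_m_dims=0.095, minimum=25.0):
    """Per-million-stored-dimensions billing. The rate is the one
    implied by this article's worked figures, not a live price list."""
    m_dims = vectors * dims / 1e6
    return max(minimum, m_dims * rate_per_m_dims * replicas)

weaviate_monthly(10_000_000, 1536)  # ~1459: high-dim embeddings hurt
weaviate_monthly(10_000_000, 384)   # ~365: the 4x cut from a 384-dim model
```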
Qdrant Cloud: hourly cluster, all-in
cost = tier_hourly × 730
Storage and queries are included; no separate fees. Capacity per tier is RAM-bound — the HNSW index needs to fit in memory. The 4 vCPU / 16 GB tier ($143/mo) handles ~25M vectors at 768 dims. Quantization extends capacity dramatically — int8 cuts RAM ~4×, binary ~32×.
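Because the tier is all-in, cost estimation reduces to picking the smallest tier whose RAM fits your index. A sketch using the tier prices and rough 768-dim capacities quoted in this article (illustrative figures, not a live Qdrant Cloud price list):

```python
# (tier, $/month, approx. 768-dim float32 capacity) -- figures from
# this article, treat as illustrative only.
QDRANT_TIERS = [
    ("1 vCPU / 4 GB", 36, 5_000_000),
    ("4 vCPU / 16 GB", 143, 25_000_000),
    ("16 vCPU / 64 GB", 571, 100_000_000),
]

def qdrant_tier(vectors, ram_multiplier=1.0):
    """ram_multiplier models quantization: ~4x capacity for int8,
    ~32x for binary, per the text. Returns (tier, monthly cost)."""
    for tier, monthly, capacity in QDRANT_TIERS:
        if vectors <= capacity * ram_multiplier:
            return tier, monthly
    raise ValueError("beyond single-node tiers; needs a multi-node cluster")
```

Note how int8 quantization (`ram_multiplier=4`) drops a 10M-vector workload from the $143 tier to the $36 tier.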
Zilliz Cloud (Milvus): Compute Units
cost = ceil(vectors / 1.5M) × $0.15/hr × 730 + storage_gb × $0.04
1 CU ≈ 8 GB RAM ≈ 1.5M 768-dim float32 vectors. At 1536 dims you double the CU count. Storage is S3-cheap ($0.04/GB), so Zilliz's bill is dominated by compute. Best for large workloads where you want managed Milvus without operating it yourself.
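The CU formula as written assumes 768-dim float32 vectors. A sketch that scales per-CU capacity inversely with dimension count — my reading of "at 1536 dims you double the CU count," not a published Zilliz rule:

```python
import math

def zilliz_monthly(vectors, storage_gb, dims=768,
                   cu_hourly=0.15, hours=730):
    """CU formula from above: 1 CU ~ 1.5M 768-dim vectors, with per-CU
    capacity shrinking proportionally at higher dimensions (assumed)."""
    vectors_per_cu = 1_500_000 * 768 // dims
    cus = math.ceil(vectors / vectors_per_cu)
    return cus * cu_hourly * hours + storage_gb * 0.04  # S3-cheap storage
```

The storage term is nearly noise; compute units dominate, which is why Zilliz makes the most sense at scales where managed Milvus beats running it yourself.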
Chroma Cloud: storage + queries + writes
cost = storage_gb × $0.25 + (queries / 1M) × $5 + (writes / 1M) × $2.50
No monthly minimum, pure usage. Cleanest model for prototypes and small deployments. Pricing is still in flux as of May 2026; the calculator pulls from last_verified so check the source link before committing.
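The Chroma formula is the simplest of the lot — a sketch with the May 2026 snapshot rates, which (per the caveat above) may already be stale:

```python
def chroma_monthly(storage_gb, queries, writes):
    """Pure usage-based billing, no minimum. Rates are the May 2026
    snapshot quoted above and are still in flux."""
    return storage_gb * 0.25 + queries / 1e6 * 5.0 + writes / 1e6 * 2.5
```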
MongoDB Atlas: cluster tier ladder
Vector search is free if you're already on M10 or above. The cost is the cluster, not the search. Tiers double in price (M10 $60 → M20 $130 → M30 $285 → M40 $605...). Pick by RAM, since the HNSW index lives in memory. Best when you already use Atlas for your application data.
Self-hosted pgvector / Qdrant / Milvus
infra = ec2_hourly × 730 + ebs_gb × $0.10
total = infra + (devops_overhead ?? 0)
The infrastructure number is small. The number that matters is the second one. The calculator defaults to $500/month for pgvector and Qdrant, $750 for Milvus (distributed). Toggle DevOps overhead off to see raw infra; on for true total cost of ownership.
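The same toggle in code — the r6g.xlarge hourly rate below is an approximate on-demand figure consistent with the ~$148 in the table, not a quote:

```python
def self_hosted_monthly(ec2_hourly, ebs_gb, devops=500.0):
    """Infra formula from above plus the DevOps toggle; pass devops=0
    to see raw infrastructure only."""
    return ec2_hourly * 730 + ebs_gb * 0.10 + devops

self_hosted_monthly(0.2016, 100, devops=0)    # raw infra, ~$157
self_hosted_monthly(0.2016, 100, devops=750)  # Milvus-style TCO, ~$907
```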
How Cost Scales
Same workload at different vector counts (1536 dims, 1M queries/month, replication 1):
| Vectors | Pinecone Serverless | Qdrant Cloud | Weaviate Cloud | pgvector self-hosted (with ops) |
|---|---|---|---|---|
| 100K | $50 (minimum) | $36 | $25 (minimum) | ~$575 |
| 1M | $50–60 | $36 | $146 | ~$575 |
| 10M | $120–180 | $143 (4 vCPU tier) | $1,460 | ~$648 |
| 100M | $400–800 | $571 (16 vCPU) | $14,600 | ~$1,150 + sharding |
| 1B | $3K–6K | $2,300+ | $146,000 (impractical) | needs sharding |
Three patterns:
- Below ~1M vectors, the minimums dominate. Don't agonize over pricing — pick the one with the easiest DX (Chroma, Qdrant Cloud, or Pinecone Serverless free tier).
- From 1M to 100M, Qdrant and Pinecone Serverless trade blows. Qdrant wins on read-heavy. Serverless wins on write-heavy with cold storage.
- Above 100M, your shortlist shrinks fast. Weaviate gets prohibitive. Pinecone Pod-based caps at 160M. You're choosing between Serverless, Zilliz, or self-hosted Milvus.
The Hidden Costs Nobody Mentions
Replication. Multiply storage and (often) compute by replicas. Replication factor 3 on Weaviate = 3× the dimension cost. On managed cluster providers, HA usually means a multi-node cluster at 2–3× the single-node price.
Filtered queries. Pinecone charges 5–10 RU for filtered vs 1–2 for unfiltered. The same query volume can be 3–5× more expensive if every search includes a WHERE tenant_id = ? clause — which production RAG basically always does.
Re-embeddings. Switching embedding models means re-writing every vector. 10M re-embeds = 10M write units. On Pinecone Serverless that's ~$60. On Chroma it's $25. On Qdrant Cloud it's free. Plan for at least one model migration per year.
Metadata storage. Most calculators assume 1 KB per vector. RAG with full chunk text plus URLs plus permissions can hit 5–10 KB. At 100M vectors that's 500 GB–1 TB extra.
Egress. None of these calculators include egress out of AWS / GCP. If your application server is in a different region than your vector DB, expect $0.02/GB egress. Usually negligible, occasionally not.
Connection overhead. Serverless vector DBs cold-start. The first query after idle can be 200–500 ms before the index loads. For interactive RAG, you'll either ping it constantly to keep it warm or eat the latency.
Index rebuilds. HNSW index parameter changes (M, ef_construction) require a full rebuild. On managed providers this might be a few hours of degraded performance. On self-hosted, it's a few hours of your weekend.
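Of the costs above, re-embedding is the easiest to quantify ahead of time. A back-of-envelope sketch using the per-write rates quoted in this section; Pinecone's ~3 WU per write is an assumption consistent with the ~$60 figure, not a published number:

```python
# Per-vector write cost for a full re-ingest after a model swap.
REEMBED_RATE_PER_VECTOR = {
    "pinecone_serverless": 3 * 2.00 / 1e6,  # ~3 WU/write (assumed) x $2/M WU
    "chroma_cloud": 2.50 / 1e6,             # $2.50 per million writes
    "qdrant_cloud": 0.0,                    # writes included in the tier price
}

def reembed_write_cost(vectors, provider):
    """Write-side cost of re-ingesting every vector. The embedding-API
    bill for regenerating the vectors, usually the bigger line item,
    is not included."""
    return vectors * REEMBED_RATE_PER_VECTOR[provider]
```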
Quantization Changes Everything
Float32 is 4 bytes per dimension. int8 is 1 byte. Binary is 0.125 bytes. The same 10M × 1536 vector store is:
- float32: ~61 GB
- int8: ~15 GB (75% reduction)
- binary: ~1.9 GB (97% reduction)
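The arithmetic behind those three numbers, as a one-liner you can reuse for other workloads:

```python
BYTES_PER_DIM = {"float32": 4.0, "int8": 1.0, "binary": 0.125}

def raw_store_gb(vectors, dims, dtype="float32"):
    """Raw vector bytes only: index overhead and metadata excluded."""
    return vectors * dims * BYTES_PER_DIM[dtype] / 1e9

raw_store_gb(10_000_000, 1536, "float32")  # ~61.4 GB
raw_store_gb(10_000_000, 1536, "binary")   # ~1.9 GB
```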
Recall loss is small if the embedding model supports it well: under 2% for int8, 5–10% for binary on most retrieval benchmarks.
This matters most for:
- Weaviate (per-dim cost) — but Weaviate currently bills on stored dimensions, not bytes. Quantization helps RAM but not the bill.
- Self-hosted everything — you can drop one tier and keep the same recall.
- Qdrant — first-class quantization support; the calculator's default formulas don't include it but Qdrant in production typically does.
Pinecone Serverless and Chroma Cloud handle quantization internally and bill you on logical vector count — quantization changes their internal economics, not yours.
Decision Rubric
- Prototype, < 1M vectors, want zero ops: Chroma Cloud, Pinecone Serverless free tier, or Qdrant Cloud's 1 vCPU tier
- Production RAG, 1M–50M vectors, need filtering: Qdrant Cloud (cheap, fast filters) or Pinecone Pod-based (predictable cost)
- Already on MongoDB Atlas: Atlas Vector Search — included free, one less system to operate
- Production RAG, 50M–1B vectors: Pinecone Serverless (if read patterns are favorable) or self-hosted Milvus (if you have the team)
- Cost-sensitive, willing to operate infra: self-hosted Qdrant on r6g (best $/vector at any scale) — but only if you actually have the operational capacity
- High-dimension embeddings (1536+) and budget-sensitive: avoid Weaviate; consider int8/binary quantization on anything else
- Air-gapped or regulated: self-hosted pgvector or Qdrant; pgvector wins if the rest of your stack is Postgres
Common Mistakes
"Self-hosted is free." Engineering time isn't free. A junior who spends 20% of their week patching pgvector, debugging EBS throughput, and recovering from a failed snapshot costs more than $500/month — closer to $2,000–$4,000 fully loaded. Either count it or commit to the platform fee.
"I'll start on Pinecone and migrate if it gets expensive." Migrating 100M vectors with re-embedding takes days of work and a cut-over you'd rather not do at scale. Pick for the scale you're going to be at in 12 months, not the one you're at today.
"I picked the cheapest provider per the calculator." Cost is one of three axes — the others are recall (does the search return the right results?) and latency (does it return them in time?). A 30% cheaper option that adds 200 ms to every query might be a worse business outcome than the more expensive one.
"My RAG cost is dominated by the vector DB." Almost never true. For most RAG apps, the embedding API and the LLM API together are 5–20× the vector DB bill. Optimize there first. The companion LLM Cost Calculator will show you the math.
FAQ
What's the absolute cheapest way to run vector search?
For under 1M vectors, SQLite with the sqlite-vec extension on a $5 VPS. For larger scale where you want managed, Qdrant Cloud's 1 vCPU / 4 GB tier at ~$36/month is hard to beat below 5M vectors.
Why is Pinecone so popular if it's not the cheapest? Two reasons: it was first to market with serverless vector search, and the developer experience is genuinely excellent (single API endpoint, zero config, predictable behavior). Many teams pay the premium for the time saved. At scale, the premium gets large and people migrate.
Is MongoDB Atlas Vector Search actually competitive? Yes, if you're already on Atlas. Vector search is included free in any M10+ cluster, so the marginal cost of adding it is zero. The main constraint is RAM — at 1536 dims, 10M vectors needs M30 ($285/mo). It's not the right pick if you don't already use Atlas, since you'd be paying for the document database you don't need.
Should I worry about being locked in to a provider? Less than you'd think. Vectors are portable — every provider supports bulk export, and re-embedding is a one-time cost. The lock-in is in the filter syntax and the operational tooling you build around it (monitoring, alerting, backup). Plan for one migration in your tool's lifetime; that's normal.
How often do these prices change?
Slowly but materially. Pinecone has cut Serverless prices twice in the last 18 months. Weaviate restructured to per-dim in 2024. Chroma Cloud's pricing has changed three times since launch. Always check the source URLs the calculator links to before you commit — the data is verified at last_updated but vendors do update.
What about Vespa, OpenSearch k-NN, Elasticsearch dense_vector, Redis VSS, Cassandra SAI? All real options not in this comparison. Vespa is excellent at large scale but operationally complex. OpenSearch / Elasticsearch make sense if you're already running them for full-text search — adding vectors is incremental. Redis VSS is great for ephemeral / caching use. Cassandra SAI is for teams already on Cassandra. Calculator coverage will expand; for now, the 10 providers above cover ~95% of new RAG deployments.