
Vector Database Cost Comparison 2026: Pinecone vs Weaviate vs Qdrant vs Milvus vs pgvector

The same 10M-vector RAG workload can cost $36 or $1,460 a month depending on which provider you pick. Here's how Pinecone, Weaviate, Qdrant, Zilliz, Chroma, MongoDB Atlas, and self-hosted pgvector / Milvus actually compare in 2026.

Vector database pricing is the most confusing line item in any AI infrastructure budget. Pinecone bills per read unit, Weaviate per million dimensions, Qdrant per cluster hour, MongoDB per RAM tier, and pgvector "for free" if you don't count the EC2 bill or the engineer babysitting it. The same 10M-vector RAG workload can cost anywhere from $36 to $1,460 a month depending on which model you map it onto. Here's how the ten serious options actually compare in 2026.

The 10 Contenders

There's no single market — there are three:

Managed serverless / usage-based

  • Pinecone Serverless — pay per GB stored, plus read units (RU) and write units (WU)
  • Chroma Cloud — pay per GB stored, plus per-million queries and writes
  • Weaviate Cloud — pay per million stored dimensions per month

Managed cluster-based (you pick a tier, you pay the hourly rate)

  • Pinecone Pod-based — s1 (storage-optimized) and p2 (QPS-optimized) tiers
  • Qdrant Cloud — vCPU + RAM tiers
  • Zilliz Cloud (Milvus) — Compute Units (CUs); 1 CU ≈ 8 GB RAM ≈ 1.5M 768-dim vectors
  • MongoDB Atlas Vector Search — included free in any M10+ Atlas cluster

Self-hosted on AWS (raw infra + your time)

  • pgvector on r6g EC2 + gp3 EBS
  • Qdrant open-source on r6g EC2
  • Milvus open-source on r6g EC2 (distributed, higher ops overhead)

The pricing models don't compare directly. A serverless RU is not a Qdrant vCPU is not a Milvus CU. The only honest way to compare is to fix a workload and run the math for each model — which is what the Vector Database Cost Calculator does.
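Fixing a workload first can be sketched as a small spec object. The field names here are this sketch's own, not any provider's API:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    vectors: int                 # number of stored vectors
    dims: int                    # embedding dimensions
    queries_per_month: int
    writes_per_month: int
    bytes_per_dim: float = 4.0   # float32
    metadata_kb: float = 1.0     # metadata per vector
    replicas: int = 1

    @property
    def vector_storage_gb(self) -> float:
        """Raw vector bytes, before metadata or index overhead."""
        return self.vectors * self.dims * self.bytes_per_dim / 1e9

# The representative RAG workload used throughout this article
rag = Workload(vectors=10_000_000, dims=1536,
               queries_per_month=5_000_000, writes_per_month=500_000)
print(round(rag.vector_storage_gb, 1))  # ~61.4 GB of raw float32 vectors
```

Every per-provider formula below is just a function of this one object, which is what makes the comparison honest.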

Side-by-Side at a Representative RAG Workload

Workload: 10M vectors, 1536 dimensions (OpenAI text-embedding-3-small), 5M queries/month, 500K writes/month, float32, replication factor 1, 1 KB metadata per vector. May 2026 pricing.

| Provider | Pricing model | Estimated monthly cost | Notes |
|---|---|---|---|
| pgvector self-hosted (r6g.xlarge) | EC2 + EBS | ~$148 infra only / ~$648 with DevOps | 4 vCPU, 32 GB RAM; you operate it |
| Qdrant self-hosted (r6g.large) | EC2 + EBS | ~$74 infra only / ~$574 with DevOps | Best raw price-perf |
| Qdrant Cloud (1 vCPU / 4 GB) | Cluster hourly | ~$36 | Cheapest managed at this scale |
| Zilliz Cloud (Milvus) | Compute Units | ~$880 | ~8 CUs × $109/mo + storage |
| Weaviate Cloud Standard | Per-dim | ~$1,460 | $0.095/M dims/mo × 15.36B dims |
| Pinecone Serverless | Storage + RU/WU | ~$120–$180 | $0.33/GB + read/write units |
| Pinecone Pod-based s1.x2 | Pod hourly | ~$140 | 730 hr × $0.192 |
| Chroma Cloud | Storage + queries | ~$72 | $0.25/GB + $5/M queries + $2.50/M writes |
| MongoDB Atlas M30 | Cluster tier | $285 | Vector search included; need M30 for 10M × 1536 dims |
| Milvus self-hosted (r6g.xlarge) | EC2 + EBS | ~$148 infra / ~$898 with DevOps | Higher DevOps because of distributed components |

A few things jump out immediately:

  1. Qdrant Cloud is aggressively cheap at small-to-medium scale. It includes both compute and storage in the hourly rate; no separate query fees.
  2. Weaviate's per-dimension model swings wildly with dimensions. Same vector count at 384 dims (BGE-small) instead of 1536 is 4× cheaper.
  3. Pinecone Serverless wins or loses on QPS, not storage. The numbers change dramatically with queries × RU/query. Filtered queries cost 5–10 RU instead of 1–2.
  4. Self-hosted is "cheap" only if you ignore the engineer. $74/month of EC2 is irrelevant against the $500–750/month of ops time it takes to actually run.

The Pricing Models, Decoded

Understanding which knob blows up your bill is half the battle.

Pinecone Serverless: storage + read units + write units

cost = storage_gb × $0.33
     + (queries × ru_per_query / 1M) × $8.25
     + (writes × wu_per_write / 1M) × $2.00
     + max(0, $50 - subtotal)   ← $50/mo Standard plan minimum

ru_per_query is 2 for unfiltered and 5–10 for filtered queries. If your app does metadata filtering on every query (e.g., "search docs for tenant_id = X"), assume the high end. Filtered RAG can be 3–5× more expensive than unfiltered on the same vector count.
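A minimal sketch of the formula above, using the May 2026 list prices quoted in this article. The `wu_per_write` of 3 for 1536-dim vectors is an assumption consistent with the re-embedding math later in this piece; rates may have changed since:

```python
def pinecone_serverless_monthly(storage_gb: float, queries: int, writes: int,
                                ru_per_query: float = 2.0,
                                wu_per_write: float = 3.0) -> float:
    subtotal = (storage_gb * 0.33                      # storage
                + queries * ru_per_query / 1e6 * 8.25  # read units
                + writes * wu_per_write / 1e6 * 2.00)  # write units
    return max(subtotal, 50.0)  # $50/mo Standard plan minimum

# Same workload, unfiltered vs heavily filtered queries
unfiltered = pinecone_serverless_monthly(61.4, 5_000_000, 500_000, ru_per_query=2)
filtered   = pinecone_serverless_monthly(61.4, 5_000_000, 500_000, ru_per_query=7)
print(round(unfiltered), round(filtered))  # roughly $106 vs $312
```

The only thing that changed between the two calls is `ru_per_query`, and the bill nearly tripled, which is why the filtered-query knob matters more than storage.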

Pinecone Pod-based: just hourly × 730

cost = pod_hourly_rate × 730

s1.x1 is $0.096/hr (~$70/mo) and holds 5M vectors. Predictable, but capped at p2.x8 (~160M vectors). Beyond that, you migrate to Serverless or contact sales.

Weaviate Cloud: per million dimensions

cost = max($25, vectors × dimensions / 1M × $0.095 × replicas)

The $0.095 per million dimensions looks tiny, but you multiply it by vectors × dimensions. 10M × 1536 = 15.36B dimensions = ~$1,460/month at replication 1. The same workload at 768 dims is ~$730. Weaviate punishes high-dimension embeddings: picking a 384-dim model (BGE-small) over 1536 (text-embedding-3-small) is a 4× cost cut on Weaviate alone.
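As a sketch, using the $0.095 per million stored dimensions rate implied by the article's own worked numbers ($1,460 for 15.36B dimensions):

```python
def weaviate_monthly(vectors: int, dims: int, replicas: int = 1,
                     rate_per_m_dims: float = 0.095) -> float:
    """Per-dimension pricing with the $25/mo plan minimum."""
    return max(25.0, vectors * dims / 1e6 * rate_per_m_dims * replicas)

print(weaviate_monthly(10_000_000, 1536))  # ~1459.2
print(weaviate_monthly(10_000_000, 384))   # ~364.8, a 4x cut from 1536 dims
print(weaviate_monthly(100_000, 1536))     # 25.0 -- the minimum kicks in
```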

Qdrant Cloud: hourly cluster, all-in

cost = tier_hourly × 730

Storage and queries are included; no separate fees. Capacity per tier is RAM-bound — the HNSW index needs to fit in memory. The 4 vCPU / 16 GB tier ($143/mo) handles ~25M vectors at 768 dims. Quantization extends capacity dramatically — int8 cuts RAM ~4×, binary ~32×.

Zilliz Cloud (Milvus): Compute Units

cost = ceil(vectors × dims / 768 / 1.5M) × $0.15/hr × 730 + storage_gb × $0.04

1 CU ≈ 8 GB RAM ≈ 1.5M 768-dim float32 vectors. At 1536 dims you double the CU count. Storage is S3-cheap ($0.04/GB), so Zilliz's bill is dominated by compute. Best for large workloads where you want managed Milvus without operating it yourself.
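A sketch of the CU sizing math. Scaling CUs linearly with dimensions is this sketch's assumption, based on the "double the CU count at 1536 dims" rule above:

```python
import math

def zilliz_monthly(vectors: int, dims: int, storage_gb: float,
                   cu_hourly: float = 0.15) -> float:
    # 1 CU ~ 1.5M 768-dim float32 vectors; assume linear scaling with dims
    cus = math.ceil(vectors / 1_500_000 * dims / 768)
    return cus * cu_hourly * 730 + storage_gb * 0.04

print(zilliz_monthly(10_000_000, 768, 30.7))  # 7 CUs -> ~$768/mo
```

Note how little the `storage_gb * 0.04` term contributes: at S3-like rates, the bill is essentially the CU count.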

Chroma Cloud: storage + queries + writes

cost = storage_gb × $0.25 + (queries / 1M) × $5 + (writes / 1M) × $2.50

No monthly minimum, pure usage. Cleanest model for prototypes and small deployments. Pricing is still in flux as of May 2026; the calculator pulls from last_verified so check the source link before committing.

MongoDB Atlas: cluster tier ladder

Vector search is free if you're already on M10 or above. The cost is the cluster, not the search. Tiers double in price (M10 $60 → M20 $130 → M30 $285 → M40 $605...). Pick by RAM, since the HNSW index lives in memory. Best when you already use Atlas for your application data.

Self-hosted pgvector / Qdrant / Milvus

infra = ec2_hourly × 730 + ebs_gb × $0.10
total = infra + (devops_overhead ?? 0)

The infrastructure number is small. The number that matters is the second one. The calculator defaults to $500/month for pgvector and Qdrant, $750 for Milvus (distributed). Toggle DevOps overhead off to see raw infra; on for true total cost of ownership.

How Cost Scales

Same Workload, different vector counts. 1536 dims, 1M queries/month, replication 1:

| Vectors | Pinecone Serverless | Qdrant Cloud | Weaviate Cloud | pgvector self-hosted (with ops) |
|---|---|---|---|---|
| 100K | $50 (minimum) | $36 | $25 (minimum) | ~$575 |
| 1M | $50–60 | $36 | $146 | ~$575 |
| 10M | $120–180 | $143 (4 vCPU tier) | $1,460 | ~$648 |
| 100M | $400–800 | $571 (16 vCPU) | $14,600 | ~$1,150 + sharding |
| 1B | $3K–6K | $2,300+ | $146,000 (impractical) | needs sharding |

Three patterns:

  1. Below ~1M vectors, the minimums dominate. Don't agonize over pricing — pick the one with the easiest DX (Chroma, Qdrant Cloud, or Pinecone Serverless free tier).
  2. From 1M to 100M, Qdrant and Pinecone Serverless trade blows. Qdrant wins on read-heavy. Serverless wins on write-heavy with cold storage.
  3. Above 100M, your shortlist shrinks fast. Weaviate gets prohibitive. Pinecone Pod-based caps at 160M. You're choosing between Serverless, Zilliz, or self-hosted Milvus.

The Hidden Costs Nobody Mentions

Replication. Multiply storage and (often) compute by replicas. Replication factor 3 on Weaviate = 3× the dimension cost. On managed cluster providers, HA usually means a multi-node cluster at 2–3× the single-node price.

Filtered queries. Pinecone charges 5–10 RU for filtered vs 1–2 for unfiltered. The same query volume can be 3–5× more expensive if every search includes a WHERE tenant_id = ? clause — which production RAG basically always does.

Re-embeddings. Switching embedding models means re-writing every vector. 10M re-embeds = 10M write units. On Pinecone Serverless that's ~$60. On Chroma it's $25. On Qdrant Cloud it's free. Plan for at least one model migration per year.
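The migration math above is quick to check, assuming 3 write units per 1536-dim vector on Pinecone Serverless (an assumption consistent with the ~$60 figure):

```python
def reembed_cost_pinecone(vectors: int, wu_per_write: float = 3.0) -> float:
    # Pinecone Serverless write units: $2.00 per million WU
    return vectors * wu_per_write / 1e6 * 2.00

def reembed_cost_chroma(vectors: int) -> float:
    # Chroma Cloud writes: $2.50 per million
    return vectors / 1e6 * 2.50

print(reembed_cost_pinecone(10_000_000))  # $60
print(reembed_cost_chroma(10_000_000))    # $25
```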

Metadata storage. Most calculators assume 1 KB per vector. RAG with full chunk text plus URLs plus permissions can hit 5–10 KB. At 100M vectors that's 500 GB–1 TB extra.

Egress. None of the prices above include network egress out of AWS / GCP. If your application server is in a different region than your vector DB, expect ~$0.02/GB egress. Usually negligible, occasionally not.

Connection overhead. Serverless vector DBs cold-start. The first query after idle can be 200–500 ms before the index loads. For interactive RAG, you'll either ping it constantly to keep it warm or eat the latency.

Index rebuilds. HNSW index parameter changes (M, ef_construction) require a full rebuild. On managed providers this might be a few hours of degraded performance. On self-hosted, it's a few hours of your weekend.

Quantization Changes Everything

Float32 is 4 bytes per dimension. int8 is 1 byte. Binary is 0.125 bytes. The same 10M × 1536 vector store is:

  • float32: ~61 GB
  • int8: ~15 GB (75% reduction)
  • binary: ~1.9 GB (97% reduction)
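The storage math above in a few lines:

```python
BYTES_PER_DIM = {"float32": 4.0, "int8": 1.0, "binary": 1 / 8}

def store_gb(vectors: int, dims: int, dtype: str) -> float:
    """Raw vector storage in GB for a given quantization level."""
    return vectors * dims * BYTES_PER_DIM[dtype] / 1e9

for dtype in BYTES_PER_DIM:
    print(dtype, round(store_gb(10_000_000, 1536, dtype), 1))
# float32 61.4 / int8 15.4 / binary 1.9
```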

Recall loss is small if the embedding model supports it well: under 2% for int8, 5–10% for binary on most retrieval benchmarks.

This matters most for:

  • Weaviate (per-dim cost) — but Weaviate currently bills on stored dimensions, not bytes. Quantization helps RAM but not bill.
  • Self-hosted everything — you can drop one tier and keep the same recall.
  • Qdrant — first-class quantization support; the calculator's default formulas don't include it but Qdrant in production typically does.

Pinecone Serverless and Chroma Cloud handle quantization internally and bill you on logical vector count — quantization changes their internal economics, not yours.

Decision Rubric

  • Prototype, < 1M vectors, want zero ops: Chroma Cloud, Pinecone Serverless free tier, or Qdrant Cloud's 1 vCPU tier
  • Production RAG, 1M–50M vectors, need filtering: Qdrant Cloud (cheap, fast filters) or Pinecone Pod-based (predictable cost)
  • Already on MongoDB Atlas: Atlas Vector Search — included free, one less system to operate
  • Production RAG, 50M–1B vectors: Pinecone Serverless (if read patterns are favorable) or self-hosted Milvus (if you have the team)
  • Cost-sensitive, willing to operate infra: self-hosted Qdrant on r6g (best $/vector at any scale) — but only if you actually have the operational capacity
  • High-dimension embeddings (1536+) and budget-sensitive: avoid Weaviate; consider int8/binary quantization on anything else
  • Air-gapped or regulated: self-hosted pgvector or Qdrant; pgvector wins if the rest of your stack is Postgres

Common Mistakes

"Self-hosted is free." Engineering time isn't free. A junior who spends 20% of their week patching pgvector, debugging EBS throughput, and recovering from a failed snapshot costs more than $500/month — closer to $2,000–$4,000 fully loaded. Either count it or commit to the platform fee.

"I'll start on Pinecone and migrate if it gets expensive." Migrating 100M vectors with re-embedding takes days and requires a cut-over you'd rather not do at scale. Pick for the scale you're going to be at in 12 months, not the one you're at today.

"I picked the cheapest provider per the calculator." Cost is one of three axes — the others are recall (does the search return the right results?) and latency (does it return them in time?). A 30% cheaper option that adds 200 ms to every query might be a worse business outcome than the more expensive one.

"My RAG cost is dominated by the vector DB." Almost never true. For most RAG apps, the embedding API and the LLM API together are 5–20× the vector DB bill. Optimize there first. The companion LLM Cost Calculator will show you the math.

FAQ

What's the absolute cheapest way to run vector search? For under 1M vectors, SQLite with the sqlite-vec extension on a $5 VPS. For larger scale where you want managed, Qdrant Cloud's 1 vCPU / 4 GB tier at ~$36/month is hard to beat below 5M vectors.

Why is Pinecone so popular if it's not the cheapest? Two reasons: it was first to market with serverless vector search, and the developer experience is genuinely excellent (single API endpoint, zero config, predictable behavior). Many teams pay the premium for the time saved. At scale, the premium gets large and people migrate.

Is MongoDB Atlas Vector Search actually competitive? Yes, if you're already on Atlas. Vector search is included free in any M10+ cluster, so the marginal cost of adding it is zero. The main constraint is RAM — at 1536 dims, 10M vectors needs M30 ($285/mo). It's not the right pick if you don't already use Atlas, since you'd be paying for the document database you don't need.

Should I worry about being locked in to a provider? Less than you'd think. Vectors are portable — every provider supports bulk export, and re-embedding is a one-time cost. The lock-in is in the filter syntax and the operational tooling you build around it (monitoring, alerting, backup). Plan for one migration in your tool's lifetime; that's normal.

How often do these prices change? Slowly but materially. Pinecone has cut Serverless prices twice in the last 18 months. Weaviate restructured to per-dim in 2024. Chroma Cloud's pricing has changed three times since launch. Always check the source URLs the calculator links to before you commit — the data is verified at last_updated but vendors do update.

What about Vespa, OpenSearch k-NN, Elasticsearch dense_vector, Redis VSS, Cassandra SAI? All real options not in this comparison. Vespa is excellent at large scale but operationally complex. OpenSearch / Elasticsearch make sense if you're already running them for full-text search — adding vectors is incremental. Redis VSS is great for ephemeral / caching use. Cassandra SAI is for teams already on Cassandra. Calculator coverage will expand; for now, the 10 providers above cover ~95% of new RAG deployments.

Try the tools