
When to Use pgvector vs a Dedicated Vector Database

pgvector handles more than you think — until it suddenly doesn't. A 2026 decision framework with a TL;DR matrix, scale break-points, cost crossover analysis, and migration paths between Postgres and Pinecone, Weaviate, Qdrant, or Milvus.

"Do I really need a vector database, or can pgvector handle this?" is the most common question on every AI engineering Slack in 2026 — and the honest answer is "pgvector handles more than you think, until it suddenly doesn't." The break point is not where most blog posts claim it is. Here is when pgvector is the right call, when it absolutely isn't, and how to tell the difference before you commit to either path.

TL;DR Decision Matrix

| Your situation | Use pgvector | Use a dedicated vector DB |
|---|---|---|
| Already running Postgres for app data | ✅ default | ❌ unless you hit a constraint below |
| Under 10M vectors | ✅ comfortable fit | overkill |
| 10M–50M vectors, low-medium QPS | ✅ with HNSW + a beefy instance | ✅ either works |
| 50M–100M vectors | ⚠️ workable but operationally hairy | ✅ better fit |
| 100M+ vectors | ❌ sharding required, painful | ✅ built for this scale |
| Need filtered search across many tenants | ⚠️ row-level security gets slow | ✅ specialty engines optimize this |
| Need hybrid sparse + dense retrieval | ❌ pgvector lacks first-class sparse | ✅ Pinecone, Weaviate, Qdrant |
| Need real-time index updates with no downtime | ⚠️ HNSW maintenance locks | ✅ streaming ingest is first-class |
| Want serverless billing tied to usage | ❌ Postgres is always on | ✅ Pinecone Serverless, Chroma Cloud |
| Need separate scaling for vector search | ❌ shares CPU/RAM with OLTP | ✅ scales independently |
| Already paying $2K+/mo for managed VDB | ⚠️ self-hosted pgvector saves cash if you have ops | ⚠️ depends on team |

The rest of this post explains why each row says what it says.

What pgvector Actually Is in 2026

pgvector is a Postgres extension that adds three things to vanilla Postgres:

  1. A vector column type — fixed-dimension float arrays stored efficiently
  2. Distance operators — <-> (L2), <#> (negative inner product), <=> (cosine)
  3. Index types — ivfflat (older, partition-based) and hnsw (modern, graph-based, the sensible default since 0.5)
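Wiring those three pieces together takes only a few statements. A minimal sketch (the table name, column names, and dimension are illustrative):

```sql
-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Application table with a fixed-dimension embedding column
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    title     text NOT NULL,
    embedding vector(1536)
);

-- HNSW index using cosine distance, matching the <=> operator
CREATE INDEX documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops);

-- Ten nearest neighbors to a query embedding
SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.01, 0.02, ...]'::vector  -- full 1536-dim literal
LIMIT 10;
```

The operator class has to match the operator you query with: vector_cosine_ops for <=>, vector_l2_ops for <->, vector_ip_ops for <#>. If they don't match, the index is silently ignored and you get a sequential scan.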

pgvector 0.7 (the most-used 2026 version) added meaningful improvements: scalar quantization (int8 storage with float32 query precision) and significantly faster HNSW build times, building on the parallel index builds that landed in 0.6. The 0.8 dev branch experimentally supports binary quantization.

That's the entire engine. Everything else — connection pooling, replication, point-in-time recovery, monitoring, backups — is whatever Postgres already gives you. This is the central design philosophy: pgvector is not trying to compete with Pinecone. It's adding vector search to the database you already operate.

What pgvector Is Great At

1. You already have Postgres

This is the dominant reason to pick pgvector. If your application database is already Postgres — and for the average startup that's overwhelmingly true — adding pgvector is CREATE EXTENSION vector; and you're done. You inherit:

  • Your existing connection pool, IAM, and access patterns
  • Your existing backup, snapshot, and disaster recovery setup
  • Your existing monitoring and alerting
  • Your existing schema migration tooling
  • Joins between vectors and your application tables, transactionally

That last point is the killer feature nobody mentions. With a separate vector DB, every query becomes a two-hop dance: hit the vector DB, get back IDs, then hit Postgres for the actual row data. With pgvector, it's a single SQL query joining vector search results to whatever else you store. That operational simplicity is hard to give up once you have it.

2. Joins, filters, and ACID transactions on the same data

SELECT d.id, d.title, d.tenant_id, d.created_at,
       (d.embedding <=> '[0.1, 0.2, ...]'::vector) AS distance
FROM documents d
WHERE d.tenant_id = 42
  AND d.archived_at IS NULL
  AND d.created_at > NOW() - INTERVAL '30 days'
ORDER BY distance
LIMIT 10;

That's an arbitrary filter combined with vector search in one transaction-consistent query. To do this on Pinecone you'd send a filter dict, pray the metadata index covers it, hope filtered queries don't 5–10× your Read Unit consumption, then make a follow-up call to Postgres for the rich row data.

For a multi-tenant SaaS where every query has a WHERE tenant_id = X clause, this advantage compounds enormously.

3. Cost — at the right scale

pgvector "costs nothing" if you already pay for the Postgres instance. For workloads under ~50M vectors, you can usually fit them onto the same instance you run application data on. The honest economics are detailed below, but at small-to-medium scale, the marginal cost of vector search on existing Postgres is genuinely close to zero.

The Vector Database Cost Calculator lets you compare the self-hosted-pgvector line against managed alternatives. The crossover point usually lands somewhere around $200–500/month of equivalent managed VDB cost — below that, pgvector is clearly cheaper; above that, the operational overhead starts to matter.

4. The data team can use it

Your analysts and data engineers already speak SQL. They can query the vector data, build dashboards on top of it, and run ad-hoc analysis without learning a new query language or installing a new SDK. Pinecone's REST API is fine, but it's not a thing your BI team will pick up casually.

Where pgvector Breaks Down

1. Index build time and memory pressure

HNSW indexes are expensive to build. For 10M vectors at 1536 dimensions, expect roughly 30–60 minutes on a r6g.4xlarge with parallel builds. For 100M vectors, you're looking at half a day. During the build, the database is heavily loaded — concurrent OLTP traffic suffers.

Worse: HNSW indexes want to live in RAM for query latency to be acceptable. A 100M × 1536-dim float32 index is ~600 GB raw and ~20–30% larger with HNSW graph overhead. That calls for an instance with close to 1 TB of RAM, which on AWS means memory-optimized instances like x2gd.16xlarge, roughly $3,000–4,000/mo on-demand. You can move to int8 quantization (pgvector 0.7+) and quarter the RAM requirement, but this is exactly where dedicated vector DBs (which apply quantization automatically) start to shine.
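You can sanity-check that arithmetic, and measure a real index, directly in SQL (the index name is illustrative; pg_size_pretty reports binary units):

```sql
-- Raw vector payload: rows × dimensions × 4 bytes (float32)
SELECT pg_size_pretty(100000000::bigint * 1536 * 4) AS raw_vectors;
-- roughly 572 GB in binary units, before HNSW graph overhead

-- On-disk size of an existing HNSW index
SELECT pg_size_pretty(pg_relation_size('documents_embedding_idx')) AS index_size;
```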

2. HNSW maintenance is heavy

When you INSERT or UPDATE a vector, the HNSW graph needs to be updated. pgvector handles this incrementally, but bulk updates can lock significantly. There's no online "rebuild on the side, swap atomically" path — that requires creating a parallel index and dropping the old one, which doubles your storage during the transition.
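The rebuild-and-swap workaround looks like this (a sketch; CONCURRENTLY avoids blocking writes during the build, but you carry both indexes, and double the storage, until the drop):

```sql
-- Build the replacement index without blocking writes
CREATE INDEX CONCURRENTLY documents_embedding_idx_new
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 128);

-- Once built and validated, swap in a short transaction
BEGIN;
DROP INDEX documents_embedding_idx;
ALTER INDEX documents_embedding_idx_new RENAME TO documents_embedding_idx;
COMMIT;
```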

If your corpus changes frequently — daily re-embedding, real-time content ingestion — you'll feel this pain. Specialty vector DBs (Pinecone, Qdrant, Weaviate) handle this much better because they're designed around vector mutability as a first-class concern.

3. Filtered search at scale

This is the subtle one. pgvector with HNSW supports filtering, but the query planner has limited insight into the interaction between the vector index and your WHERE clauses. There are basically two paths:

  • Pre-filter — apply the WHERE first, then exact-distance scan the survivors. Fast when the filter is highly selective (e.g., tenant_id = 42 returning 1K rows). Slow when the filter is broad (e.g., "all docs from the last year").
  • Post-filter — search the vector index for the top K, then apply WHERE. Fast when the filter is broad. Returns fewer than K results when the filter is selective.

The planner's choice between these is heuristic-driven and not always optimal. Specialty engines (Qdrant especially, with its payload index) have spent years on filter-aware vector search. At high QPS with complex filters, this gap is visible.
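Two mitigations are worth knowing on the pgvector side, sketched below: a partial index scoped to a hot tenant (which makes the pre-filter path cheap), and raising hnsw.ef_search so the post-filter path has more candidates left after the WHERE clause. Both are standard Postgres/pgvector knobs; the values are illustrative.

```sql
-- Partial HNSW index: vector search pre-scoped to one hot tenant
CREATE INDEX documents_tenant42_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops)
    WHERE tenant_id = 42;

-- Widen the candidate list so post-filtering still yields K rows
SET hnsw.ef_search = 200;  -- default 40; higher = better recall, slower

SELECT id, title
FROM documents
WHERE tenant_id = 42 AND archived_at IS NULL
ORDER BY embedding <=> '[0.01, 0.02, ...]'::vector
LIMIT 10;
```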

4. No built-in hybrid sparse + dense

Hybrid search (combining BM25-style keyword scores with vector similarity) is the dominant retrieval pattern in 2026 production RAG. Pinecone, Weaviate, Qdrant, and Milvus all have first-class support. pgvector handles it through a separate tsvector column and tsquery plus a hand-tuned weighted score — workable, but boilerplate-heavy and slow at scale.

If your retrieval quality matters more than your operational simplicity, this is a real gap.
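For reference, the hand-rolled hybrid pattern looks roughly like this (a sketch; the search_tsv column, the 0.3/0.7 weights, and the score normalization are all things you define and tune yourself):

```sql
-- Weighted blend of full-text rank and vector similarity
SELECT d.id, d.title,
       0.3 * ts_rank(d.search_tsv, query)                   -- keyword score
     + 0.7 * (1 - (d.embedding <=> '[0.01, ...]'::vector))  -- cosine similarity
         AS hybrid_score
FROM documents d,
     to_tsquery('english', 'postgres & vector') AS query
WHERE d.search_tsv @@ query
ORDER BY hybrid_score DESC
LIMIT 10;
```

Note this version gates results on the keyword match; production setups usually take the top-K from each retriever separately and fuse them (e.g., reciprocal rank fusion), which is more SQL still.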

5. You're stealing CPU and RAM from your OLTP workload

Vector search is CPU-intensive. HNSW traversal is cache-unfriendly and burns substantial cycles. If your Postgres is also serving high-QPS application queries, vector workloads will compete for resources during peak times. You can mitigate this with read replicas dedicated to vector search, but at that point you're operating two database fleets — and the simplicity argument starts to weaken.

6. No serverless billing model

Pinecone Serverless and Chroma Cloud charge you per query and per GB stored. If your traffic is bursty (10K queries during business hours, ~zero overnight), serverless can be much cheaper than always-on Postgres. pgvector is always running, always paying for the instance, regardless of whether anyone is querying.

What Dedicated Vector DBs Offer That pgvector Doesn't

Briefly, since the Vector Database Cost Comparison 2026 covers the depth:

  • Automatic quantization (Pinecone, Zilliz, Weaviate) — int8 and product quantization applied transparently. You don't tune it; it just works.
  • First-class hybrid search — sparse + dense retrieval as a single API call.
  • Filter-aware indexing — payload indexes, attribute filters, per-tenant routing optimized for multi-tenant SaaS.
  • Distributed sharding — handles billion-vector indices without you writing the sharding logic.
  • Online index rebuilds — switch dimensions or distance metrics without downtime.
  • Streaming updates — real-time vector ingest with no maintenance lock.
  • Specialty distance functions — cosine, dot, L2, Hamming, Jaccard.
  • GPU acceleration (Milvus, Zilliz) — for very high QPS or very large indices.
  • Serverless billing (Pinecone, Chroma) — pay per usage, not per provisioned instance.
  • Built-in observability — per-query latency, recall metrics, hit rates as part of the dashboard.

You can replicate most of these on top of Postgres given enough engineering effort. The question is whether that effort is well-spent versus paying $50–500/month to a vendor who's already solved it.

The Cost Dimension

The intuition that "pgvector is free" is misleading. It's free of API fees. It's not free of:

  • The Postgres instance you're running it on (more RAM and CPU than your OLTP workload alone needs)
  • The storage for vectors and the HNSW graph (typically 1.2–1.5× the raw vector size)
  • The engineer time to build, monitor, and debug it
  • The opportunity cost of higher recall and lower latency from purpose-built engines

The honest break-even, very roughly:

  • Self-hosted pgvector beats managed VDBs below ~$500/mo of equivalent managed cost, when you already have a Postgres instance running.
  • Self-hosted pgvector ties or loses in the $500–2,000/mo range, depending on whether you have a platform team.
  • Self-hosted pgvector loses above $2,000/mo because the operational overhead of running a stateful service yourself becomes the dominant line item.

The Vector Database Cost Calculator lets you sanity-check this against your specific workload. Toggle "Include DevOps overhead" to compare honestly — without it, self-hosted always looks cheaper than it really is.

For a full RAG-stack comparison (embedding + storage + retrieval + generation), the RAG Cost Calculator shows where vector storage actually fits in your overall bill. Spoiler: for almost every realistic RAG workload, the LLM generation line is 70–95% of the total. Picking the cheapest vector DB saves rounding error compared to picking the right LLM. Optimize for operational fit, not vector-DB list price.

A Decision Framework

Walk down this list:

1. Are you already running Postgres in production for application data?

  • No → use a managed vector DB. The marginal cost of running Postgres just for vectors is silly.
  • Yes → go to step 2.

2. Will you have more than 50M vectors within the next 12 months?

  • Yes → use a dedicated vector DB. pgvector at this scale is workable but operationally painful.
  • No → go to step 3.

3. Is your retrieval quality the dominant business concern (legal, medical, financial, search-as-product)?

  • Yes → use a dedicated vector DB. Hybrid search, automatic quantization, and reranker integrations are too valuable.
  • No → go to step 4.

4. Is your traffic bursty (high during business hours, near-zero overnight)?

  • Yes → consider Pinecone Serverless or Chroma Cloud. Always-on Postgres is wasteful for this pattern.
  • No → go to step 5.

5. Do you need filtered search where filters are highly selective and per-tenant?

  • Yes → both work; specialty engines (Qdrant especially) handle this better at scale.
  • No → go to step 6.

6. Default to pgvector.

This routes the average startup to pgvector, which is correct. It also routes scale-up companies and quality-sensitive applications to specialty engines, which is also correct.

Migration Paths

pgvector → dedicated vector DB

When the time comes, migration is mechanically simple:

  1. Set up the vector DB and create the index/collection.
  2. Bulk-export vectors from Postgres: COPY (SELECT id, embedding, metadata FROM documents) TO '/tmp/vectors.csv'.
  3. Bulk-import into the target system.
  4. Dual-write new vectors to both for the cutover window.
  5. Switch read traffic; verify recall and latency.
  6. Drop pgvector.
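Steps 2 and 3 in concrete terms, as a sketch (paths and columns are illustrative; use \copy from psql if the server can't write to local disk, and the import side depends on the target vendor's bulk API):

```sql
-- Export id, vector literal, and metadata columns as CSV
COPY (
    SELECT id,
           embedding::text AS embedding,  -- '[0.1,0.2,...]' literal form
           title, tenant_id, created_at
    FROM documents
) TO '/tmp/vectors.csv' WITH (FORMAT csv, HEADER true);
```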

The hard part isn't moving the vectors — it's reworking your application code to make two-hop calls (vector DB for IDs, Postgres for data). Plan for 1–2 sprints of refactoring.

Dedicated vector DB → pgvector (the rare case)

Less common but real, usually triggered by:

  • Surprise vendor bills at scale
  • Need for transactional consistency between vectors and app data
  • Hiring a platform team that wants to own the stack

The migration is symmetric: bulk export, bulk import. The hard part this time is rebuilding any vendor-specific features (hybrid search, attribute filters, automatic quantization) that pgvector doesn't have.

The 2026 pgvector Update Worth Knowing

Three things have changed since the "pgvector can't scale" takes from 2023–2024:

1. HNSW with parallel build (0.6+) and incremental insert performance (0.7) are dramatically better. Build times that took 4 hours in 2023 take 45 minutes in 2026 on the same hardware.

2. Scalar quantization (0.7+) cuts storage 4× with negligible recall loss. This pushes the practical pgvector ceiling from ~10M vectors to ~50M vectors on commodity hardware.

3. Postgres 17 (released 2024-09) improved parallel index scans and planner behavior in ways that materially help vector workloads, particularly the planning of combined filter-plus-vector queries.

The combination is meaningful: workloads that needed Pinecone in 2023 often run comfortably on pgvector + Postgres 17 in 2026. If you last evaluated pgvector before 2024, re-evaluate.

What About Other Postgres-Compatible Options?

Three worth knowing:

  • pgvecto.rs — a Rust-based Postgres vector extension with broadly the same feature set as pgvector and faster index builds. Less battle-tested but worth watching.
  • pgvectorscale (Timescale) — layers a DiskANN-style index and compression on top of pgvector, aimed at larger-than-memory workloads.
  • Supabase pgvector — managed Postgres with pgvector preinstalled. The same engine, but you don't operate it. Bridges the "we want pgvector but not the ops burden" gap nicely.

If your need is "vectors in Postgres, but I don't want to be a DBA," Supabase or any managed Postgres provider with pgvector enabled (Neon, Crunchy Bridge, RDS) is the right call.

A Final Word on Honesty

The strongest argument for pgvector in 2026 isn't technical — it's operational. Every additional system you operate adds a real cost in monitoring, alerting, on-call rotation, security review, and migration risk. The case for pgvector is "fewer moving parts." The case for a dedicated vector DB is "purpose-built for the workload." Both are valid. Most teams would benefit from defaulting to pgvector and only moving when a specific constraint binds — not preemptively because someone on Hacker News said specialty databases are inevitable.

Run your specific workload through the Vector Database Cost Calculator and the RAG Cost Calculator before committing either way. The numbers will surprise you about which constraints actually bind for your situation.
