Vector Databases
Vector databases are the retrieval layer that makes production RAG work. The choice between them isn't religious — it's about residency, scale, and ops appetite. pgvector for teams already on Postgres and under ~20M vectors. Pinecone or Weaviate Cloud when managed scaling matters more than self-hosting cost. Qdrant when on-premises or full self-hosting is required. We've shipped all four to production and the deciding factor has been the client's infrastructure context, not the database's marketing. Pair the right vector DB with the right embedding model and the RAG layer effectively disappears — it just works.
What you get
Real examples
pgvector for a production RAG system
Illustrative scenario: an Australian SaaS company deploys RAG over its support documentation (~500K chunks). pgvector in the existing Supabase instance handles retrieval, so there is no separate vector DB to operate. Hybrid search pairs pgvector for dense retrieval with pg_trgm for keyword matching. The total ops surface stays the same as before adding AI.
Pinecone for cross-tenant SaaS at scale
Illustrative scenario: a B2B AI product serves 100+ enterprise clients with isolated document corpora. Pinecone's per-tenant namespaces, combined with serverless scaling, remove the multi-tenant vector-isolation problem. Per-tenant cost scales with usage rather than with provisioned infrastructure.
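The isolation idea can be sketched without any vendor SDK. The toy in-memory index below is not the Pinecone client, and its class and method names are hypothetical. It only illustrates the principle that every write and every query is scoped to one tenant's namespace, so a query cannot return another tenant's vectors.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedIndex:
    """Toy stand-in for a vector index partitioned by tenant namespace."""

    def __init__(self):
        # namespace -> {vector_id: vector}
        self._spaces = defaultdict(dict)

    def upsert(self, namespace, vector_id, vector):
        self._spaces[namespace][vector_id] = vector

    def query(self, namespace, vector, top_k=3):
        # Only the caller's namespace is searched; other tenants are invisible.
        candidates = self._spaces.get(namespace, {})
        scored = [(vid, cosine(vector, v)) for vid, v in candidates.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = NamespacedIndex()
index.upsert("tenant-a", "a-doc-1", [1.0, 0.0])
index.upsert("tenant-b", "b-doc-1", [1.0, 0.0])  # identical vector, different tenant
results = index.query("tenant-a", [1.0, 0.0])
```

Even though both tenants hold an identical vector, the query for tenant-a only ever sees tenant-a's document. In a real deployment the namespace would be derived from the authenticated tenant on the server side, never taken from user input.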
Qdrant for sovereign-data healthcare
Illustrative scenario: a healthcare client requires all patient-data retrieval to stay within their VPC. Qdrant is deployed on dedicated infrastructure, with role-based access controls aligned to the client's existing identity stack. A cloud-hosted vector DB is not an option here, and Qdrant's self-hosting story is the strongest fit for this scenario.
Common questions
Which vector DB do you default to?
pgvector for new projects under ~20M chunks where the client is already on Postgres — fewer moving parts, simpler ops. Pinecone serverless when scale is genuinely large or growth is unpredictable. Qdrant for sovereign-data deployments. Weaviate when the client prefers the open-source option with a managed-cloud fallback. The decision is always driven by infrastructure context, not benchmarks.
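As a rough sketch, the defaults above can be written as a small decision function. The function name, its parameters, the order in which the rules are checked, and the fallthrough to Pinecone when no other rule matches are all assumptions made for illustration; real decisions weigh more context than five flags.

```python
PGVECTOR_CEILING = 20_000_000  # the ~20M figure quoted above, treated as a hard cutoff


def default_vector_db(on_postgres, vector_count, must_self_host=False,
                      unpredictable_growth=False, prefers_open_source=False):
    """Return a starting-point recommendation; precedence is an assumption."""
    if must_self_host:
        # Sovereign-data or on-premises requirement.
        return "Qdrant"
    if on_postgres and vector_count < PGVECTOR_CEILING and not unpredictable_growth:
        # Fewer moving parts when Postgres is already in place.
        return "pgvector"
    if prefers_open_source:
        # Open source with a managed-cloud fallback.
        return "Weaviate"
    # Large scale or unpredictable growth (assumed fallthrough).
    return "Pinecone serverless"


choice = default_vector_db(on_postgres=True, vector_count=500_000)
```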
Is pgvector really production-ready?
Yes, up to ~20M vectors per index. Above that, latency degrades and the operational story (index rebuilds, replication considerations) gets more complex. For the projects we ship most often, pgvector is the right call. We've migrated clients off Pinecone to pgvector multiple times when their actual scale didn't justify the cost.
Do you use hybrid search (dense + keyword)?
Almost always for production RAG. Pure vector search misses queries that hinge on specific terminology (product names, technical acronyms, proper nouns). Hybrid search combines dense vector recall with BM25 or other keyword matching for production-grade accuracy. The retrieval improvement, in the range of 10-20%, is consistently worth the implementation complexity.
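How the two result lists get merged is not specified above. One common option is reciprocal rank fusion (RRF), shown here as a minimal sketch. The function name and document IDs are hypothetical, and k=60 is the conventional default for RRF rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_hits = ["doc-7", "doc-2", "doc-9"]    # from vector search
keyword_hits = ["doc-2", "doc-4", "doc-7"]  # from BM25 / keyword search
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
# fused == ["doc-2", "doc-7", "doc-4", "doc-9"]
```

RRF works on ranks only, so it avoids having to normalise cosine similarities against keyword scores, which live on different scales.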
What about reranking?
Yes, for any RAG system that needs high precision. The initial vector search recalls the top 50 candidates; Cohere Rerank (or a custom cross-encoder) then scores them and keeps the top 5 for the LLM context. The latency cost is real (~200-400ms), but the precision lift on real user queries is large.
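The two-stage shape can be sketched as follows. The scoring function is injected, so a hosted reranker or a cross-encoder could be plugged in; the `token_overlap` scorer here is a deliberately crude placeholder so the example runs on its own, and all names are hypothetical.

```python
def retrieve_then_rerank(query, candidates, score_fn, recall_k=50, final_k=5):
    """Keep the first recall_k candidates, rescore them, return the best final_k.

    `candidates` is assumed to be ordered by vector similarity already.
    """
    shortlist = candidates[:recall_k]
    reranked = sorted(shortlist, key=lambda doc: score_fn(query, doc), reverse=True)
    return reranked[:final_k]


def token_overlap(query, doc):
    """Placeholder scorer: fraction of query tokens found in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens) if query_tokens else 0.0


candidates = [
    "billing cycles and invoices",
    "password rules for admin accounts",
    "reset your password from the login page",
]
top = retrieve_then_rerank("reset password", candidates, token_overlap, final_k=2)
```

The reranker only ever sees `recall_k` documents, which is what bounds the added latency: widening the recall stage improves the chance that the right chunk is in the shortlist, at the cost of more scoring work per query.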
How do you handle vector DB cost at scale?
Three levers. One, right-size embedding dimensionality: text-embedding-3-small (1536d) is often enough and takes half the storage of text-embedding-3-large (3072d). Two, prune stale vectors aggressively, since most RAG corpora have a long tail of documents that are rarely retrieved. Three, use sparse + dense hybrid where appropriate, since keyword indexes are cheap.
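The dimensionality lever is simple arithmetic. The sketch below assumes vectors are stored as 4-byte float32 values and counts raw vector data only, ignoring index structures and metadata; the one-million-vector corpus is a hypothetical figure.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 (4 bytes per value)."""
    return num_vectors * dimensions * bytes_per_value


small = raw_vector_bytes(1_000_000, 1536)  # text-embedding-3-small
large = raw_vector_bytes(1_000_000, 3072)  # text-embedding-3-large

print(f"1536d: {small / 1e9:.3f} GB")  # 6.144 GB
print(f"3072d: {large / 1e9:.3f} GB")  # 12.288 GB
```

Actual footprint will be higher once the index and any stored payloads are included, but the 2x ratio between the two models carries through.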