A vector database stores data as lists of numbers called embeddings, then finds the entries that are closest in meaning to whatever you ask for. That is the whole idea. Where a traditional database matches exact values (“find rows where country = ‘France’”), a vector database matches concepts: “find the paragraphs that talk about the same thing as this question,” even when none of the words overlap.
That capability is the engine room of almost every serious AI feature shipping in 2026: chatbots that cite your documents, semantic search, recommendation systems, and especially retrieval-augmented generation. This guide explains what a vector database actually is, how embeddings and similarity search work under the hood, the six options most teams evaluate, and — just as important — when you don’t need one at all.
Key takeaways
- It searches by meaning, not keywords. A vector database turns text, images, or audio into embeddings and retrieves the nearest ones using similarity math like cosine similarity.
- The core trick is approximate nearest-neighbor search. Algorithms like HNSW find “close enough” matches in milliseconds across millions of vectors instead of comparing every single one.
- RAG is the killer use case. Vector retrieval is how you ground an LLM in your own data without retraining it.
- The 2026 field splits three ways: managed (Pinecone), open-source engines (Qdrant, Weaviate, Milvus, Chroma), and “just add it to Postgres” (pgvector).
- You often don’t need a dedicated one. Under ~10 million vectors and already on PostgreSQL? pgvector usually matches the specialists with far less operational overhead.
What a vector database actually is
To a computer, the sentence “the cat sat on the mat” is meaningless text. An embedding model — a neural network trained for the job — converts that sentence into a fixed-length list of numbers, often 768, 1,024, or 1,536 of them. Each number captures some learned dimension of meaning. The result is a point in high-dimensional space, and the useful property is this: sentences that mean similar things land near each other, while unrelated ones land far apart. “The kitten rested on the rug” ends up close to our cat sentence even though they share almost no words.
A vector database is purpose-built to store millions or billions of these points and answer one question fast: which stored vectors are closest to this query vector? It bundles together the index that makes that search efficient, metadata filtering (so you can say “nearest results, but only from 2025”), and the storage and scaling machinery to keep it all running. If you want the broader context of how these pieces fit into AI systems, our beginner’s guide to machine learning covers the embedding models that feed the database in the first place.
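To make "nearest results, but only from 2025" concrete, here is a minimal in-memory sketch of filtered similarity search. The `Record` class, the three-dimensional vectors, and the `search` helper are all hypothetical illustrations; a real engine replaces the linear scan with an index and works with hundreds of dimensions.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Record:
    text: str
    year: int      # metadata used for filtering
    vector: list   # toy 3-dimensional embedding, made up for this example


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def search(records, query, k, year=None):
    """Return the k records most similar to `query`, optionally limited to one year."""
    candidates = [r for r in records if year is None or r.year == year]
    return sorted(candidates, key=lambda r: cosine(r.vector, query), reverse=True)[:k]


records = [
    Record("cat on a mat", 2024, [0.9, 0.1, 0.0]),
    Record("kitten on a rug", 2025, [0.8, 0.2, 0.1]),
    Record("quarterly tax filing", 2025, [0.0, 0.1, 0.9]),
]
query = [1.0, 0.0, 0.0]

print([r.text for r in search(records, query, k=2)])
# ['cat on a mat', 'kitten on a rug']
print([r.text for r in search(records, query, k=1, year=2025)])
# ['kitten on a rug']
```

The filter runs before ranking here, which is the simplest correct ordering. Production engines have to interleave filtering with index traversal, which is harder to get right.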
Embeddings and similarity, briefly
“Closest” needs a definition. The most common metric for text is cosine similarity, which measures the angle between two vectors and ignores their length. It ranges from -1 (opposite direction) to 1 (identical direction). Because most modern embedding models output normalized, unit-length vectors, cosine similarity is then mathematically identical to the cheaper dot product. Euclidean distance is the other option you’ll see, useful when magnitude actually carries information. For typical RAG and semantic-search work, cosine similarity is the sensible default and the one most databases use out of the box.
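The relationships between these metrics are easy to check by hand. The sketch below uses plain Python and made-up two-dimensional vectors. It shows that cosine similarity ignores length, and that for unit-length vectors the squared Euclidean distance equals 2 minus twice the dot product, which is why all three metrics rank normalized vectors the same way.

```python
from math import sqrt, isclose


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return sqrt(dot(a, a))


def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0]
b = [6.0, 8.0]      # same direction as a, twice as long
c = [-3.0, -4.0]    # exactly opposite direction

print(cosine_similarity(a, b))   # 1.0  (length is ignored)
print(cosine_similarity(a, c))   # -1.0
print(euclidean(a, b))           # 5.0  (length matters here)

# For unit vectors: cosine == dot product, and distance^2 == 2 - 2 * dot.
u, v = normalize(a), normalize([1.0, 0.0])
print(isclose(cosine_similarity(u, v), dot(u, v)))
print(isclose(euclidean(u, v) ** 2, 2 - 2 * dot(u, v)))
```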
How similarity search works at scale
Here’s the catch. Comparing your query against every stored vector (a brute-force scan) gives perfect results but collapses under load. At 10 million vectors, checking each one for every query is far too slow for an interactive app. So vector databases use approximate nearest neighbor (ANN) search: they accept roughly 95–99% recall, meaning they return that share of the true nearest neighbors, in exchange for being orders of magnitude faster.
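The trade-off can be measured on a toy dataset. The sketch below compares an exact scan with a crude inverted-file index, the idea behind IVFFlat, which buckets vectors by nearest centroid and scans only a few buckets per query. It is an assumption-laden illustration: the centroids are sampled at random instead of trained with k-means, the data is random Gaussian noise, and every name is hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(vectors, query, k):
    """Brute force: score every vector and keep the ids of the k nearest."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i], query))[:k]


def build_lists(vectors, centroids):
    """Put each vector id into the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def approx_top_k(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(vectors[i], query))[:k]


def recall(approx, exact):
    """Share of the true nearest neighbors that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(2000)]
centroids = rng.sample(vectors, 20)
lists = build_lists(vectors, centroids)
query = [rng.gauss(0, 1) for _ in range(16)]

exact = exact_top_k(vectors, query, k=10)
for nprobe in (1, 5, 20):
    approx = approx_top_k(vectors, centroids, lists, query, k=10, nprobe=nprobe)
    print(f"nprobe={nprobe:2d}  recall={recall(approx, exact):.2f}")
```

Probing all 20 lists scans everything, so recall is 1.0 by construction. Lower `nprobe` values touch fewer vectors and may miss some true neighbors, which is the speed-for-accuracy dial that every ANN index exposes in some form.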
The dominant ANN method in 2026 is HNSW (Hierarchical Navigable Small World), introduced by Yury Malkov and Dmitry Yashunin in a 2016 paper. It builds a layered graph — think of it as a skip list crossed with a road network. The top layer is sparse, with a few nodes connected by long-range “highways”; each layer below adds more nodes and shorter local roads, and the bottom layer contains every vector. A search drops in at the top, takes long hops to get into the right neighborhood, then descends through finer layers to home in on the nearest matches. For data that fits in memory, HNSW consistently delivers the best recall-versus-latency tradeoff, which is why nearly every engine here implements it.
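The drop-in-and-descend search can be sketched in a few dozen lines. What follows is a deliberately simplified illustration, not the published algorithm: it builds each layer's links by brute force instead of inserting incrementally, keeps a single current node where real HNSW keeps a beam of candidates, and skips the neighbor-selection heuristics. The parameter names `m` and `level_mult` are assumptions for this sketch.

```python
import math
import random


def dist(a, b):
    return math.dist(a, b)


def build_layers(points, m, rng, level_mult=0.5):
    """Give each point a random top level, then link it to its m nearest
    peers on every layer it appears in. layers[0] contains every point."""
    levels = [int(-math.log(1.0 - rng.random()) * level_mult) for _ in points]
    layers = []
    for lvl in range(max(levels) + 1):
        members = [i for i, top in enumerate(levels) if top >= lvl]
        graph = {}
        for i in members:
            others = sorted((j for j in members if j != i),
                            key=lambda j: dist(points[i], points[j]))
            graph[i] = others[:m]
        layers.append(graph)
    return layers


def greedy_search(points, layers, query):
    """Start on the sparse top layer, keep hopping to any closer neighbor,
    and drop one layer whenever no neighbor improves on the current node."""
    current = next(iter(layers[-1]))   # entry point: any node on the top layer
    for graph in reversed(layers):
        improved = True
        while improved:
            improved = False
            for neighbor in graph[current]:
                if dist(points[neighbor], query) < dist(points[current], query):
                    current, improved = neighbor, True
    return current


rng = random.Random(1)
points = [[rng.random() for _ in range(8)] for _ in range(300)]
layers = build_layers(points, m=8, rng=rng)
query = [rng.random() for _ in range(8)]

found = greedy_search(points, layers, query)
true_nearest = min(range(len(points)), key=lambda i: dist(points[i], query))
print("layers:", len(layers), "found:", found, "true nearest:", true_nearest)
```

The greedy walk can stop at a local minimum that is close to, but not exactly, the true nearest neighbor. That is the "approximate" in ANN, and it is the reason real implementations track several candidates at once on the bottom layer.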
The other half of the scaling story is quantization — compressing vectors so more of them fit in RAM. Techniques range from scalar and product quantization to aggressive 1-bit methods. Milvus’s RaBitQ implementation, for example, reports cutting memory use by around 72% (paired with an SQ8 refinement step) while holding recall near 95%. That compression is what makes billion-scale search affordable.
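As a rough illustration of the simplest technique in that range, the sketch below applies 8-bit scalar quantization to one vector and then does the per-vector memory arithmetic. The function names and the fixed `[-1, 1]` value range are assumptions for this example. The final figure is this sketch's own arithmetic for a 1-bit code stored alongside an 8-bit copy; it happens to land near the number quoted above, but it is not a statement of how Milvus measures memory.

```python
def quantize(vector, lo, hi):
    """Map each float in [lo, hi] to an integer code 0..255 (one byte per dimension)."""
    scale = (hi - lo) / 255
    return bytes(round((min(max(x, lo), hi) - lo) / scale) for x in vector)


def dequantize(codes, lo, hi):
    """Recover approximate floats from the byte codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]


def bytes_per_vector(dims, bits_per_dim):
    return dims * bits_per_dim / 8


vector = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes = quantize(vector, lo=-1.0, hi=1.0)
restored = dequantize(codes, lo=-1.0, hi=1.0)
worst_error = max(abs(x - y) for x, y in zip(vector, restored))
print(list(codes), round(worst_error, 5))

# Memory per 1,536-dimension vector under different encodings.
full = bytes_per_vector(1536, 32)       # float32: 6144 bytes
sq8 = bytes_per_vector(1536, 8)         # 8-bit codes: 1536 bytes
one_bit = bytes_per_vector(1536, 1)     # 1-bit codes: 192 bytes
print(1 - sq8 / full)                   # 0.75
print(1 - (one_bit + sq8) / full)       # 0.71875
```

The worst-case rounding error per dimension is half a quantization step, which is why 8-bit codes usually cost little recall while cutting memory by three quarters.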
The top vector databases in 2026
The market sorts into three camps: fully managed services, self-hostable open-source engines, and the Postgres extension that quietly ate a huge chunk of the low end. Here’s how the main options compare, with details verified against current mid-2026 sources.
| Database | Model / License | Implementation | Best fit | 2026 notes |
|---|---|---|---|---|
| Pinecone | Proprietary, fully managed | Closed-source engine | Teams who want zero ops | Serverless billing (read/write/storage units); Inference + Assistant; BYOC in public preview for Enterprise on AWS/GCP/Azure |
| Qdrant | Open source (Apache 2.0) | Rust | Performance-sensitive self-hosters | Qdrant Cloud added GPU-accelerated indexing, Multi-AZ clusters and audit logging in April 2026 |
| Weaviate | Open source (BSD-3-Clause) | Go | Built-in hybrid search | Native BM25 + vector + filters in one query; HNSW is the default index, with vectors up to 65,535 dims |
| Milvus | Open source (Apache 2.0) | Go + C++ | Billion-scale workloads | v2.6.x GA on Zilliz Cloud; RaBitQ 1-bit quantization (~72% less memory); graduated LF AI & Data project |
| Chroma | Open source (Apache 2.0) | Rust + Python | Prototypes & small apps | Embeds in-process; Chroma Cloud is serverless but single-node is strongest to ~5–10M vectors |
| pgvector | Open source (Postgres extension) | C | Already on Postgres, <10M vectors | v0.8 added iterative index scans that fix over-filtering; HNSW + IVFFlat |
Managed: Pinecone
Pinecone is the pay-someone-else-to-run-it option. Its serverless architecture lets you store billions of vectors without provisioning servers, and you’re billed on read units, write units, and storage rather than fixed nodes — which tends to suit bursty RAG traffic that goes quiet overnight. Pricing in 2026 runs from a free Starter tier through a flat $20/month Builder plan up to Standard (around $50/month minimum) and Enterprise (around $500/month minimum), with serverless usage billed roughly at $4 per million write units, $16 per million read units, and $0.33/GB/month of storage. The platform has expanded past pure storage into Pinecone Inference (hosted embedding and reranking) and Assistant for agent-style apps, with Bring Your Own Cloud now in public preview for Enterprise customers.
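Usage-based pricing is easier to reason about with a quick estimate. The sketch below plugs the approximate rates quoted above into a small calculator. Two things are assumptions: that a plan minimum acts as a floor on the usage bill, and the workload numbers themselves. How many read or write units a given query or upsert consumes is not covered here, so check the current pricing page before budgeting.

```python
# Approximate rates quoted in this article; verify before relying on them.
WRITE_USD_PER_MILLION_UNITS = 4.00
READ_USD_PER_MILLION_UNITS = 16.00
STORAGE_USD_PER_GB_MONTH = 0.33


def monthly_usage_cost(write_units, read_units, storage_gb):
    """Usage charges for one month, before any plan minimum."""
    return (
        write_units / 1_000_000 * WRITE_USD_PER_MILLION_UNITS
        + read_units / 1_000_000 * READ_USD_PER_MILLION_UNITS
        + storage_gb * STORAGE_USD_PER_GB_MONTH
    )


def monthly_bill(usage_cost, plan_minimum):
    """Assumption: the plan minimum is a floor, so you pay whichever is larger."""
    return max(usage_cost, plan_minimum)


# Hypothetical workload: 5M write units, 20M read units, 50 GB stored.
usage = monthly_usage_cost(5_000_000, 20_000_000, 50)
print(round(usage, 2))
print(round(monthly_bill(usage, plan_minimum=50.0), 2))
```

In this made-up workload, reads account for most of the bill, which is the pattern to watch for in read-heavy RAG traffic.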
Pinecone strengths
- No infrastructure to manage; strong multi-tenant isolation and SLAs
- Scales to billions of vectors without re-architecting
- Embedding and reranking built into the same platform
Pinecone trade-offs
- Proprietary — no self-hosting, real lock-in risk
- Usage-based bills can surprise you under heavy read/write traffic
- Less low-level control than running your own engine
Open source: Qdrant, Weaviate, Milvus, Chroma
If you’d rather own the stack, the open-source field is strong. Qdrant, written in Rust, is the performance favorite: fast and memory-safe, with rich quantization options and a batch of enterprise features (GPU-accelerated indexing, Multi-AZ clusters, and audit logging) that landed on Qdrant Cloud in April 2026. Weaviate, written in Go, leads on hybrid search: it blends keyword (BM25) and vector retrieval with metadata filters in a single query, which is genuinely useful when exact terms and fuzzy meaning both matter. Milvus, a Go-and-C++ project from Zilliz and a graduated LF AI & Data project, is the one you reach for at the extreme high end; its architecture targets billion-vector scale, and its RaBitQ quantization keeps that affordable. Chroma sits at the opposite pole: it runs in-process, gets you from zero to a working index in minutes, and is ideal for prototyping, though its sweet spot stays around 5–10 million vectors per node.
Rough throughput from mid-2026 reports puts these in perspective — Qdrant and Weaviate commonly land in the tens of thousands of queries per second, and Milvus can push past 100K QPS at scale — but real numbers depend heavily on dimensions, hardware, and recall targets, so benchmark on your own data before trusting any single figure.
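The hybrid search mentioned above has to merge two differently scored result lists, one from keyword matching and one from vector similarity. One common, engine-agnostic way to do that is reciprocal rank fusion, sketched below. This is a generic illustration and not a description of Weaviate's internals; the document ids are made up, and `k=60` is simply the constant conventionally used with this method.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results for the same query from two retrievers.
keyword_ranking = ["doc_sku", "doc_returns", "doc_shipping"]   # e.g. BM25 order
vector_ranking = ["doc_returns", "doc_refunds", "doc_sku"]     # e.g. cosine order

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
# ['doc_returns', 'doc_sku', 'doc_refunds', 'doc_shipping']
```

Because the method uses only ranks, it sidesteps the problem that BM25 scores and cosine similarities live on different scales. Documents that both retrievers like rise to the top.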
The Postgres route: pgvector
pgvector is the most important entry on this list for the simple reason that it isn’t a separate database at all — it’s an extension that adds vector columns and ANN indexing to PostgreSQL. Your embeddings live in the same table as your relational data, queryable in one SQL statement and one transaction. Version 0.8 closed most of the remaining gaps, adding iterative index scans that fix the old over-filtering problem where a WHERE clause could starve a vector search of results. It supports both HNSW and IVFFlat indexes and is used in production by large teams. The pitch is operational: one system to run, back up, and monitor instead of two.
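In practice the pgvector workflow is a handful of SQL statements. The sketch below keeps them as Python string constants so they can be handed to whichever Postgres driver you already use. The `docs` table, its columns, and the index name are hypothetical, and the three-dimensional `vector(3)` column is a toy size. The `<=>` operator is pgvector's cosine distance, and the `hnsw.iterative_scan` setting is the v0.8 feature described above.

```python
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS docs (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    year      integer NOT NULL,
    embedding vector(3)          -- toy size; match your embedding model's dimension
);
CREATE INDEX IF NOT EXISTS docs_embedding_hnsw
    ON docs USING hnsw (embedding vector_cosine_ops);
"""

# Keep scanning the index until enough rows survive the WHERE clause.
ITERATIVE_SCAN_SQL = "SET hnsw.iterative_scan = strict_order;"

# Named placeholders in the %(name)s style used by common Python Postgres drivers.
SEARCH_SQL = """
SELECT id, body, embedding <=> %(query)s::vector AS cosine_distance
FROM docs
WHERE year = %(year)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values):
    """Format a list of floats in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


params = {"query": to_vector_literal([0.1, 0.2, 0.3]), "year": 2025, "k": 5}
print(params["query"])   # [0.1,0.2,0.3]
```

The relational filter and the vector ordering sit in the same statement, which is the operational argument for this route: one query, one transaction, one system.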
When you actually need a vector database (and when you don’t)
This is the question too many teams skip. A dedicated vector database is real infrastructure — another service to deploy, secure, scale, and pay for. You should reach for one when you genuinely need its strengths.
You probably do need a dedicated engine when you’re past roughly 5–10 million vectors, require sub-10ms p99 latency at high query volume, depend on advanced hybrid search, or are building a multi-tenant product where isolation and horizontal scaling matter. At that scale the specialists pull clearly ahead.
You probably don’t when you’re under about a million vectors, already run PostgreSQL, and your latency needs are measured in tens of milliseconds rather than single digits. Between that and the 5–10 million mark, the 2026 consensus is still blunt: below ~10 million vectors, pgvector matches or beats the dedicated options on the metrics that matter for most apps, and wins outright on operational simplicity. Start there, and graduate to a specialized database only when you hit a wall you can measure. The same logic applies to a bigger architectural fork: before standing up any retrieval stack, it’s worth weighing fine-tuning versus RAG to confirm retrieval is even the right tool for your problem.
How vector databases power RAG
The reason any of this matters to most builders is retrieval-augmented generation. An LLM only knows what it was trained on, and it can’t see your internal docs, last week’s tickets, or your product catalog. RAG fixes that: you embed your documents into a vector database ahead of time, then at query time you embed the user’s question, retrieve the handful of most-similar chunks, and feed them to the model as context. The LLM answers from real, current, source-grounded material instead of guessing.
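The whole loop fits in a short sketch. Everything here is a stand-in: `embed` is a toy bag-of-words vectorizer that only captures word overlap (a real embedding model captures meaning), the "database" is a Python list, and the final prompt would be sent to whichever LLM you use. The shape of the loop, though, is the same: index ahead of time, then embed, retrieve, and assemble context at query time.

```python
import math
import re

chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "Shipping to Canada takes 5 to 7 business days.",
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


# Fixed vocabulary built from the corpus; one vector dimension per word.
VOCAB = {word: i for i, word in enumerate(sorted({w for c in chunks for w in tokenize(c)}))}


def embed(text):
    """Toy stand-in for an embedding model: unit-length word-count vector."""
    vec = [0.0] * len(VOCAB)
    for word in tokenize(text):
        if word in VOCAB:
            vec[VOCAB[word]] += 1.0
    length = math.sqrt(sum(x * x for x in vec)) or 1.0   # avoid dividing by zero
    return [x / length for x in vec]


def retrieve(index, question, k):
    """Return the k chunks whose vectors have the highest dot product with the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: sum(a * b for a, b in zip(item[1], q)), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


index = [(c, embed(c)) for c in chunks]          # done ahead of time
question = "How long do refunds take?"
top = retrieve(index, question, k=2)             # done at query time
print(build_prompt(question, top))
```

With this toy vectorizer the refund chunk ranks first only because it shares the word "refunds" with the question. Swapping in a real embedding model is what lets the same loop match paraphrases with no words in common.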
The vector database is the retrieval layer in that loop, and its quality sets a ceiling on the whole system — bad retrieval means bad answers, no matter how good the model is. If you want to see the full loop assembled end to end, our walkthrough on building a RAG pipeline puts the database in its proper place alongside chunking, embedding, and the generation step.
FAQ
Is a vector database the same as a regular database?
No. A relational or document database is built for exact and structured queries — matching IDs, ranges, and field values. A vector database is built to find items by semantic similarity using high-dimensional embeddings. Many systems, like pgvector, now bolt vector search onto a traditional database so you get both in one place.
Do I need a vector database for RAG?
You need vector search for RAG, but not necessarily a dedicated vector database. For small-to-medium corpora, pgvector inside your existing Postgres handles retrieval fine. A standalone engine like Pinecone or Qdrant earns its keep once you scale past millions of documents or need very low latency.
What is HNSW and why does it matter?
HNSW (Hierarchical Navigable Small World) is the most widely used approximate-nearest-neighbor index. It builds a layered graph that lets a search jump quickly into the right region of vector space and then refine, returning near-perfect results in milliseconds. It matters because it’s what makes similarity search fast enough to use in real time.
Is cosine similarity better than Euclidean distance?
For text embeddings, cosine similarity is usually the right default because it compares direction (meaning) rather than magnitude. When embeddings are normalized to unit length — as most modern models output — cosine similarity, dot product, and Euclidean distance rank results identically, so the choice often comes down to compute efficiency.
Which vector database is best for beginners?
Chroma and pgvector are the friendliest starting points. Chroma runs in-process with almost no setup, ideal for a first prototype. pgvector is best if you already use PostgreSQL, since it adds vector search without introducing a new system to learn.
How much do vector databases cost in 2026?
The open-source engines (Qdrant, Weaviate, Milvus, Chroma, pgvector) are free to self-host; you pay only for the hardware. Managed tiers start free and climb in steps: Pinecone’s Builder plan is a flat $20/month, Standard carries a minimum of around $50/month, and Enterprise a minimum of around $500/month. At production scale, usage-based billing on top of those minimums can vary widely with your read and write volume.
Can I use a vector database for images or audio, not just text?
Yes. Any data an embedding model can encode — images, audio, video, code — becomes a vector you can store and search by similarity. The database doesn’t care what the vectors represent; it only does the math. Multi-modal retrieval (searching across text and images together) is increasingly common in 2026.
Bottom line
A vector database is the part of an AI stack that retrieves information by meaning, and in 2026 it’s no longer exotic — it’s standard plumbing for RAG, semantic search, and recommendations. The honest advice is to resist over-engineering. If you’re already on Postgres and under roughly 10 million vectors, start with pgvector and you’ll likely never need more. When you do outgrow it — billions of vectors, single-digit-millisecond latency, heavy hybrid search — the open-source specialists (Qdrant, Weaviate, Milvus) and the fully managed Pinecone are all mature, well-funded, and ready. Choose based on your real scale and ops appetite, not the hype, and benchmark on your own data before you commit.
