
What is a Vector Database? The Complete Beginner's Guide (2026)

TopicTrick

You have spent years learning SQL. You know how to write a WHERE name = 'invoice' query. You understand indexes, joins, and foreign keys. And now every AI tutorial is telling you to throw all of that away and use something called a vector database.

What even is a vector database? Why does RAG need one? Why can't you just store your documents in Postgres and search with LIKE '%invoice%'?

This guide answers all of that. By the end, you will understand what vectors are, what vector databases do differently to every other database you have used, why AI applications depend on them, and how to run your first similarity search in Python.


The Problem with Traditional Databases

Before explaining what a vector database is, it helps to understand what it solves.

Imagine you have a knowledge base of 50,000 support articles. A user types: "my laptop won't turn on after the update."

A traditional SQL query would search for articles containing those exact words. An article that says "device fails to boot following a software patch" may be the most relevant one in your database, yet a LIKE search will miss it entirely because not a single word matches.

This is the exact-match problem. Traditional databases store and search structured data: names, dates, IDs, numbers. They are extraordinarily good at finding "all orders placed by customer 4821 after 2025-01-01." They are terrible at finding "documents that mean roughly the same thing as this sentence."

Meaning and intent do not live in keywords. They live in vectors.


What is a Vector (and What is a Vector Embedding)?

A vector in mathematics is just a list of numbers. A 3-dimensional vector might look like [0.2, -0.8, 0.5]. A 1,536-dimensional vector (which is what OpenAI's embedding models produce) looks like [0.012, -0.743, 0.221, 0.008, ... 1,536 numbers total].

A vector embedding is a vector that has been produced by a machine learning model from some piece of content — a sentence, a paragraph, an image, an audio clip, a product listing. The critical property that makes this useful is:

Content that is semantically similar will produce vectors that are mathematically close to each other.

"My laptop won't turn on" and "device fails to boot" — two completely different strings — will produce vectors that are very close in 1,536-dimensional space. "My dog loves playing fetch" will produce a vector that is far away from both.

The embedding model has learned the meaning of language (or images, or audio) and compressed it into a dense numerical representation. This is how you turn the fuzzy concept of "meaning" into something a computer can measure precisely.


What is a Vector Database?

A vector database is a database built specifically to store, index, and query vector embeddings at scale.

You can think of it like this:

  • Traditional database: stores rows and columns, searches by exact values
  • Full-text search engine (Elasticsearch, Solr): stores text, searches by keyword frequency
  • Vector database: stores embeddings, searches by mathematical similarity

When you query a vector database, you do not ask it "find the document where title = X." You give it a query vector and ask it: "find me the K documents whose vectors are most similar to this query vector." This is called k-Nearest Neighbour (k-NN) search or, in its approximate form, Approximate Nearest Neighbour (ANN) search.

The database returns the top-K most similar items, ranked by similarity score. Every result is the embedding model's numerical opinion of "how closely related this content is to your query."


How Vector Similarity is Measured

Three similarity metrics are used in practice:

Cosine Similarity is the most common for text. It measures the angle between two vectors, ignoring their magnitude. Two documents that use the same concepts in different proportions still score highly. Range: -1 to 1. Higher is more similar.

$$\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|}$$

Dot Product is fast and effective when your vectors are already normalised (unit length). Most embedding models produce unit-normalised vectors, making dot product equivalent to cosine similarity but cheaper to compute.

Euclidean Distance (L2) measures the straight-line distance between two points in vector space. Smaller distance means more similar. Used more for image and multimodal embeddings than text.

In practice: for text-based AI applications, cosine similarity or dot product with normalised vectors is the default choice. Your vector database handles the metric selection — you typically just specify it at index creation time.
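To make these metrics concrete, here is a small NumPy sketch. The vectors are hand-made 4-dimensional stand-ins, not real embeddings (real embeddings have hundreds or thousands of dimensions), so only the relative comparisons matter:

```python
import numpy as np

# Toy stand-ins for embeddings of the three example sentences.
# Hand-made values, illustrative only.
laptop_issue = np.array([0.9, 0.1, 0.0, 0.4])   # "my laptop won't turn on"
boot_failure = np.array([0.8, 0.2, 0.1, 0.5])   # "device fails to boot"
dog_fetch    = np.array([0.0, 0.9, 0.8, 0.1])   # "my dog loves playing fetch"

def cosine_similarity(a, b):
    # Angle-based: magnitude is divided out
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # Straight-line (L2) distance: smaller means more similar
    return np.linalg.norm(a - b)

print(cosine_similarity(laptop_issue, boot_failure))   # high (related meaning)
print(cosine_similarity(laptop_issue, dog_fetch))      # low (unrelated)
print(euclidean_distance(laptop_issue, boot_failure))  # small
print(euclidean_distance(laptop_issue, dog_fetch))     # large

# With unit-normalised vectors, dot product equals cosine similarity
a = laptop_issue / np.linalg.norm(laptop_issue)
b = boot_failure / np.linalg.norm(boot_failure)
print(np.isclose(np.dot(a, b), cosine_similarity(laptop_issue, boot_failure)))
```

The final check is the reason dot product is the cheap default for normalised embeddings: the division by the norms becomes a no-op.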


How Vector Databases Handle Scale

Here is the slow, obvious approach to similarity search: take your query vector, compute the similarity to every single vector in the database, sort the results, return the top K. This is called exact k-NN or brute-force search. It works perfectly for small datasets. At 10 million vectors, it becomes too slow for real-time queries.
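The brute-force approach fits in a few lines of NumPy, which is a useful way to see the O(n · d) cost. This is a sketch over random unit vectors standing in for real embeddings; a real vector database does this work (and the indexing below) for you:

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy "database" of 10,000 unit-normalised 128-d vectors
db = rng.normal(size=(10_000, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)

query = rng.normal(size=128)
query /= np.linalg.norm(query)

def brute_force_knn(db, query, k=5):
    # One dot product per stored vector: O(n * d) work per query.
    # With unit vectors, dot product == cosine similarity.
    scores = db @ query
    # argpartition finds the top-k without a full sort; then sort just those k
    top_k = np.argpartition(scores, -k)[-k:]
    top_k = top_k[np.argsort(scores[top_k])[::-1]]
    return top_k, scores[top_k]

ids, scores = brute_force_knn(db, query, k=5)
print(ids)     # indices of the 5 nearest vectors
print(scores)  # their similarity scores, best first
```

At 10,000 vectors this runs in milliseconds; the linear scan only becomes the bottleneck as n grows into the millions, which is exactly where ANN indexes earn their keep.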

Vector databases solve this with Approximate Nearest Neighbour (ANN) indexing — data structures that trade a tiny amount of accuracy for massive speed gains. The two dominant index types are:

HNSW (Hierarchical Navigable Small World) builds a layered graph of vectors. Starting from a sparse top layer and drilling down to a dense bottom layer, the search algorithm navigates the graph in logarithmic time rather than linear time. HNSW has excellent query speed and recall. It is the default in ChromaDB, Weaviate, and Qdrant.

IVF (Inverted File Index) divides vectors into clusters (Voronoi cells). At query time, only the nearest clusters are searched rather than the full dataset. When combined with Product Quantisation (PQ) for compression, IVF-PQ dramatically reduces memory usage. Faiss (Meta's library) popularised this approach; it is the basis for Pinecone's indexes.

The practical takeaway: HNSW is easier to tune and excellent for most use cases. IVF-based approaches shine at very large scale (hundreds of millions of vectors) where memory is the bottleneck.
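The IVF idea can be sketched in plain NumPy. This toy version uses a crude k-means to build the clusters and searches only the `n_probe` nearest ones at query time; it is an illustration of the principle, not how Faiss or Pinecone implement it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 5,000 unit vectors in 32 dimensions
db = rng.normal(size=(5_000, 32))
db /= np.linalg.norm(db, axis=1, keepdims=True)

# "Training": partition the vectors into clusters with a crude k-means
n_clusters = 50
centroids = db[rng.choice(len(db), n_clusters, replace=False)]
for _ in range(10):
    assign = np.argmax(db @ centroids.T, axis=1)  # nearest centroid per vector
    for c in range(n_clusters):
        members = db[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

# Inverted file: cluster id -> indices of the vectors assigned to it
inverted = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def ivf_search(query, k=5, n_probe=5):
    # Search only the n_probe nearest clusters, not the whole dataset
    nearest_clusters = np.argsort(centroids @ query)[-n_probe:]
    candidates = np.concatenate([inverted[c] for c in nearest_clusters])
    scores = db[candidates] @ query
    order = np.argsort(scores)[::-1][:k]
    return candidates[order], scores[order]

query = rng.normal(size=32)
query /= np.linalg.norm(query)
ids, scores = ivf_search(query)
print(ids, scores)
```

With `n_probe=5` of 50 clusters, each query scores roughly a tenth of the dataset. That is the accuracy-for-speed trade: a neighbour that landed in an unprobed cluster is simply missed, which is why ANN recall is tuned, not guaranteed.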


The Major Vector Databases in 2026

You have several strong options. Here is a practical comparison:

ChromaDB — open source, runs in-process (no separate server needed), stores embeddings locally on disk. Perfect for development, prototypes, and small production deployments. Zero infrastructure overhead. Free.

Pinecone — fully managed cloud service. No infrastructure to maintain. Excellent developer experience and good documentation. Pay-per-use pricing. Best choice when you want to avoid operational complexity at scale.

pgvector — a PostgreSQL extension that adds a vector column type and ANN index. If your application already lives in Postgres, pgvector lets you store vectors alongside your existing relational data in the same database. Excellent for reducing infrastructure footprint.

Weaviate — open source, supports multi-modal data (text, images), has built-in hybrid search (vector + keyword). Good for complex semantic search applications. Docker-deployable.

Qdrant — open source, written in Rust for high performance, excellent filtering capabilities (combine vector similarity with metadata filters in a single query). Growing fast in 2026.

Milvus — open source, designed for billion-scale deployments, distributed architecture. Best suited for very large enterprise deployments.

Which One Should You Start With?

For learning, use ChromaDB — it requires no external services and runs entirely in Python. For a production web app, use pgvector if you are already on Postgres, or Pinecone if you want zero infrastructure overhead. For large-scale self-hosted deployments, use Qdrant or Weaviate.


Vector Databases vs Traditional Databases: Side-by-Side

Feature              | SQL Database       | Full-Text Search     | Vector Database
-------------------- | ------------------ | -------------------- | --------------------------
Query type           | Exact match, range | Keyword / BM25       | Semantic similarity
Data type            | Structured records | Text documents       | Embeddings (any modality)
Search concept       | WHERE clause       | TF-IDF, BM25 scoring | k-NN, cosine similarity
Understands meaning? | No                 | Partially            | Yes
Good for AI/RAG?     | No                 | Limited              | Yes
Infrastructure       | Postgres, MySQL    | Elasticsearch        | ChromaDB, Pinecone, Qdrant

Important: vector databases do not replace relational databases. Most real applications use both. Your user account data, order records, and billing information belong in Postgres. Your document embeddings and semantic search capability belong in a vector database.


Real-World Use Cases

Vector databases power a surprising range of AI applications:

Retrieval-Augmented Generation (RAG) — the most common use case in 2026. Embed your documents, store them in a vector database, embed the user's question, retrieve the most relevant document chunks, pass them to an LLM for answer generation. See Build a RAG App with Claude for a full implementation.

Semantic Search — replace keyword search with meaning-based search across documentation, products, or any text corpus. A search for "comfortable summer footwear" returns sandals and canvas shoes even if those words are not in the query.

Recommendation Engines — embed user preferences and item descriptions into the same vector space. Find the K items most similar to a user's history vector. Netflix, Spotify, and most major e-commerce platforms use this approach at scale.

Duplicate Detection & Deduplication — embed content and find near-duplicate items by similarity score. Useful for detecting plagiarism, finding near-identical support tickets, and cleaning datasets.

Anomaly Detection — embed log events, transactions, or sensor readings. Flag items whose vectors fall far from any cluster as potential anomalies.

Long-Term Memory for AI Agents — store conversation summaries and past interactions as embeddings. When a user returns, retrieve the most relevant memories and include them in the agent's context window.


Your First Vector Database: Working Python Example

This example uses ChromaDB — no account, no API key, runs entirely locally.

Install the dependencies:

```bash
pip install chromadb sentence-transformers
```

Create a collection and add documents:

```python
import chromadb
from chromadb.utils import embedding_functions

# Initialise ChromaDB with local persistent storage
client = chromadb.PersistentClient(path="./vector_db")

# Use a free, local embedding model from sentence-transformers
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"  # 80 MB model, runs entirely offline
)

# Create a collection (equivalent to a table in SQL)
collection = client.get_or_create_collection(
    name="support_articles",
    embedding_function=embedding_fn,
    metadata={"hnsw:space": "cosine"}  # use cosine similarity
)

# Add documents — ChromaDB generates the embeddings automatically
documents = [
    "How to reset your password if you are locked out of your account",
    "Device fails to boot following a software patch or firmware update",
    "How to export your billing history and download invoices",
    "Network connection drops intermittently on Windows 11",
    "How to transfer your licence to a new computer",
    "Battery drains faster than normal after the latest update",
]

collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
    metadatas=[{"category": "support"} for _ in documents]
)

print(f"Collection contains {collection.count()} documents")
```

Query with natural language:

```python
# Query — no keywords, just a natural language question
results = collection.query(
    query_texts=["my laptop won't turn on after the update"],
    n_results=3
)

print("Top 3 most relevant articles:")
for i, (doc, distance) in enumerate(
    zip(results["documents"][0], results["distances"][0])
):
    similarity = 1 - distance  # cosine distance → similarity score
    print(f"\n{i+1}. Similarity: {similarity:.3f}")
    print(f"   {doc}")
```

Expected output:

```
Top 3 most relevant articles:

1. Similarity: 0.847
   Device fails to boot following a software patch or firmware update

2. Similarity: 0.721
   Battery drains faster than normal after the latest update

3. Similarity: 0.634
   Network connection drops intermittently on Windows 11
```

Notice the top result — "Device fails to boot following a software patch" — contains zero words from the query "my laptop won't turn on after the update." The vector database found it purely through semantic similarity. A LIKE query would have returned nothing useful.


Metadata Filtering: Vectors + Structured Queries

Vector databases are not just about similarity search. Most support combining vector similarity with metadata filters, letting you scope a search to a specific category, date range, user, or any other structured attribute.

```python
# Add documents with richer metadata
collection.add(
    documents=[
        "Subscription auto-renews annually on your billing date",
        "Cancel your subscription before the renewal date to avoid charges",
        "How to upgrade from the Free plan to the Pro plan",
    ],
    ids=["doc_6", "doc_7", "doc_8"],
    metadatas=[
        {"category": "billing", "plan": "all"},
        {"category": "billing", "plan": "all"},
        {"category": "billing", "plan": "free"},
    ]
)

# Semantic search scoped to billing category only
results = collection.query(
    query_texts=["will I be charged if I forget to cancel?"],
    n_results=2,
    where={"category": "billing"}  # metadata filter
)

for doc in results["documents"][0]:
    print(doc)
```

This is something full-text search engines can also do, but vector databases do it in a single indexed query — no post-filtering step required.

Embedding Model Consistency

Every document in your vector database must be embedded with the same model. Your query must also be embedded with the same model. Mixing models (e.g., indexing with OpenAI's text-embedding-3-small but querying with all-MiniLM) produces meaningless results because the vector spaces are completely different.
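One way to guard against mixing models is to record the model name and dimensionality alongside the collection and refuse mismatched writes. This is a general pattern sketched with a plain Python class, not a built-in feature of any particular vector database (the names `EmbeddingSpace` and its methods are illustrative):

```python
# A lightweight guard: remember which embedding model a collection was
# built with and reject vectors produced by anything else.

class EmbeddingSpace:
    def __init__(self, model_name: str, dim: int):
        self.model_name = model_name
        self.dim = dim
        self.vectors = {}  # id -> vector

    def add(self, doc_id, vector, model_name):
        if model_name != self.model_name or len(vector) != self.dim:
            raise ValueError(
                f"Collection was built with {self.model_name} ({self.dim}-d); "
                f"got {model_name} ({len(vector)}-d)"
            )
        self.vectors[doc_id] = vector

# all-MiniLM-L6-v2 produces 384-d vectors; text-embedding-3-small produces 1536-d
space = EmbeddingSpace(model_name="all-MiniLM-L6-v2", dim=384)
space.add("doc_0", [0.1] * 384, model_name="all-MiniLM-L6-v2")  # accepted

try:
    space.add("doc_1", [0.1] * 1536, model_name="text-embedding-3-small")
except ValueError as e:
    print(f"Rejected: {e}")
```

Dimension checks catch the obvious mismatches, but note that two different models can share a dimensionality and still have incompatible vector spaces, which is why recording the model name matters.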


Updating and Deleting Vectors

A common misconception: vector databases are append-only. They are not. ChromaDB and all major vector databases support full CRUD operations.

```python
# Update a document (re-embeds automatically)
collection.update(
    ids=["doc_0"],
    documents=["How to reset your password or unlock your account after too many failed attempts"],
    metadatas=[{"category": "support", "updated": "2026-04"}]
)

# Delete a document
collection.delete(ids=["doc_2"])

# Check count after deletion
print(f"Collection now contains {collection.count()} documents")
```

In production RAG systems, you will routinely update vectors when source documents change and delete vectors when documents are removed from the source corpus.


When Do You Actually Need a Vector Database?

Not every project needs one. Here is a practical decision framework:

Use a vector database when:

• You are building RAG (retrieving relevant context for an LLM)
• You need semantic search — finding by meaning rather than keywords
• You are building recommendation features based on content similarity
• Your dataset has more than a few thousand items and keyword search is missing relevant results

You probably do not need one when:

• You are doing exact lookups (user by ID, order by number)
• Your dataset is small enough that a simple in-memory similarity scan is fast enough
• You are already using Postgres and pgvector serves the same need with less complexity
• Full-text search with BM25 is good enough for your use case

For most developers building AI applications in 2026, the answer is: yes, you need one. Semantic search and RAG have become table-stakes features, and both require vector storage.


Key Takeaways

• Vectors are lists of numbers produced by ML models that numerically encode the meaning of content
• Similar content produces similar vectors — this is the fundamental property that makes semantic search possible
• Vector databases store and index embeddings, returning the K most similar items to a query vector in milliseconds
• ANN indexes (HNSW, IVF) make similarity search fast at scale by approximating the search space
• ChromaDB is the easiest starting point — runs locally, no account needed, full Python API
• Vector databases do not replace SQL databases — they store embeddings alongside your structured data in a typical production stack
• RAG, semantic search, recommendations, memory, and anomaly detection all depend on vector search

What's Next?

Now that you understand what a vector database is and how to use ChromaDB for basic similarity search, the next steps are:

• Build a full RAG pipeline with document chunking, embedding, and grounded answer generation using Claude: Project: Build a RAG App with Claude
• Compare your options: ChromaDB vs Pinecone vs pgvector — a full decision guide is coming in this series
• Build a semantic search engine from scratch — the next post in this series

This post is part of the Vector Database Series — a deep-dive into the data layer that powers modern AI applications.

Continue reading:

1. What is a Vector Database? ← you are here
2. ChromaDB Tutorial: The Complete Beginner's Guide
3. ChromaDB vs Pinecone vs pgvector: Which Should You Use?
4. Build a Semantic Search Engine from Scratch
5. Vector Database Optimisation for Production