
What is a Vector Database? The Complete Beginner's Guide (2026)

TopicTrick

You have spent years learning SQL. You know how to write a WHERE name = 'invoice' query. You understand indexes, joins, and foreign keys. And now every AI tutorial is telling you to throw all of that away and use something called a vector database.

What even is a vector database? Why does RAG need one? Why can't you just store your documents in Postgres and search with LIKE '%invoice%'?

This guide answers all of that. By the end, you will understand what vectors are, what vector databases do differently to every other database you have used, why AI applications depend on them, and how to run your first similarity search in Python.


The Problem with Traditional Databases

Before explaining what a vector database is, it helps to understand what it solves.

Imagine you have a knowledge base of 50,000 support articles. A user types: "my laptop won't turn on after the update."

A traditional SQL query would search for articles containing those exact words. An article that says "device fails to boot following a software patch" may be the most relevant one in your database, yet a LIKE search will miss it entirely because not a single word matches.

This is the exact-match problem. Traditional databases store and search structured data: names, dates, IDs, numbers. They are extraordinarily good at finding "all orders placed by customer 4821 after 2025-01-01." They are terrible at finding "documents that mean roughly the same thing as this sentence."

Meaning and intent do not live in keywords. They live in vectors.


What is a Vector (and What is a Vector Embedding)?

A vector in mathematics is just a list of numbers. A 3-dimensional vector might look like [0.2, -0.8, 0.5]. A 1,536-dimensional vector (which is what OpenAI's embedding models produce) looks like [0.012, -0.743, 0.221, 0.008, ... 1,536 numbers total].

A vector embedding is a vector that has been produced by a machine learning model from some piece of content — a sentence, a paragraph, an image, an audio clip, a product listing. The critical property that makes this useful is:

Content that is semantically similar will produce vectors that are mathematically close to each other.

"My laptop won't turn on" and "device fails to boot" — two completely different strings — will produce vectors that are very close in 1,536-dimensional space. "My dog loves playing fetch" will produce a vector that is far away from both.

The embedding model has learned the meaning of language (or images, or audio) and compressed it into a dense numerical representation. This is how you turn the fuzzy concept of "meaning" into something a computer can measure precisely.


What is a Vector Database?

A vector database is a database built specifically to store, index, and query vector embeddings at scale.

You can think of it like this:

  • Traditional database: stores rows and columns, searches by exact values
  • Full-text search engine (Elasticsearch, Solr): stores text, searches by keyword frequency
  • Vector database: stores embeddings, searches by mathematical similarity

When you query a vector database, you do not ask it "find the document where title = X." You give it a query vector and ask it: "find me the K documents whose vectors are most similar to this query vector." This is called k-Nearest Neighbour (k-NN) search or, in its approximate form, Approximate Nearest Neighbour (ANN) search.

The database returns the top-K most similar items, ranked by similarity score. Every result is the embedding model's numerical opinion of "how closely related this content is to your query."


How Vector Similarity is Measured

Three similarity metrics are used in practice:

Cosine Similarity is the most common for text. It measures the angle between two vectors, ignoring their magnitude. Two documents that use the same concepts in different proportions still score highly. Range: -1 to 1. Higher is more similar.

$$\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|}$$

Dot Product is fast and effective when your vectors are already normalised (unit length). Most embedding models produce unit-normalised vectors, making dot product equivalent to cosine similarity but cheaper to compute.

Euclidean Distance (L2) measures the straight-line distance between two points in vector space. Smaller distance means more similar. Used more for image and multimodal embeddings than text.

In practice: for text-based AI applications, cosine similarity or dot product with normalised vectors is the default choice. Your vector database handles the metric selection — you typically just specify it at index creation time.
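To make these metrics concrete, here is a small NumPy sketch. The vectors are hand-made 4-dimensional stand-ins, not real embeddings (real embeddings have hundreds or thousands of dimensions), so only the relative comparisons matter:

```python
import numpy as np

# Toy stand-ins for embeddings of the three example sentences.
# Hand-made values, illustrative only.
laptop_issue = np.array([0.9, 0.1, 0.0, 0.4])   # "my laptop won't turn on"
boot_failure = np.array([0.8, 0.2, 0.1, 0.5])   # "device fails to boot"
dog_fetch    = np.array([0.0, 0.9, 0.8, 0.1])   # "my dog loves playing fetch"

def cosine_similarity(a, b):
    # Angle-based: magnitude is divided out
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # Straight-line (L2) distance: smaller means more similar
    return np.linalg.norm(a - b)

print(cosine_similarity(laptop_issue, boot_failure))   # high (related meaning)
print(cosine_similarity(laptop_issue, dog_fetch))      # low (unrelated)
print(euclidean_distance(laptop_issue, boot_failure))  # small
print(euclidean_distance(laptop_issue, dog_fetch))     # large

# With unit-normalised vectors, dot product equals cosine similarity
a = laptop_issue / np.linalg.norm(laptop_issue)
b = boot_failure / np.linalg.norm(boot_failure)
print(np.isclose(np.dot(a, b), cosine_similarity(laptop_issue, boot_failure)))
```

The final check is the reason dot product is the cheap default for normalised embeddings: the division by the norms becomes a no-op.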


How Vector Databases Handle Scale

Here is the slow, obvious approach to similarity search: take your query vector, compute the similarity to every single vector in the database, sort the results, return the top K. This is called exact k-NN or brute-force search. It works perfectly for small datasets. At 10 million vectors, it becomes too slow for real-time queries.
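The brute-force approach fits in a few lines of NumPy, which is a useful way to see the O(n · d) cost. This is a sketch over random unit vectors standing in for real embeddings; a real vector database does this work (and the indexing below) for you:

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy "database" of 10,000 unit-normalised 128-d vectors
db = rng.normal(size=(10_000, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)

query = rng.normal(size=128)
query /= np.linalg.norm(query)

def brute_force_knn(db, query, k=5):
    # One dot product per stored vector: O(n * d) work per query.
    # With unit vectors, dot product == cosine similarity.
    scores = db @ query
    # argpartition finds the top-k without a full sort; then sort just those k
    top_k = np.argpartition(scores, -k)[-k:]
    top_k = top_k[np.argsort(scores[top_k])[::-1]]
    return top_k, scores[top_k]

ids, scores = brute_force_knn(db, query, k=5)
print(ids)     # indices of the 5 nearest vectors
print(scores)  # their similarity scores, best first
```

At 10,000 vectors this runs in milliseconds; the linear scan only becomes the bottleneck as n grows into the millions, which is exactly where ANN indexes earn their keep.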

Vector databases solve this with Approximate Nearest Neighbour (ANN) indexing — data structures that trade a tiny amount of accuracy for massive speed gains. The two dominant index types are:

HNSW (Hierarchical Navigable Small World) builds a layered graph of vectors. Starting from a sparse top layer and drilling down to a dense bottom layer, the search algorithm navigates the graph in logarithmic time rather than linear time. HNSW has excellent query speed and recall. It is the default in ChromaDB, Weaviate, and Qdrant.

IVF (Inverted File Index) divides vectors into clusters (Voronoi cells). At query time, only the nearest clusters are searched rather than the full dataset. When combined with Product Quantisation (PQ) for compression, IVF-PQ dramatically reduces memory usage. Faiss (Meta's library) popularised this approach; it is the basis for Pinecone's indexes.

The practical takeaway: HNSW is easier to tune and excellent for most use cases. IVF-based approaches shine at very large scale (hundreds of millions of vectors) where memory is the bottleneck.
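The IVF idea can be sketched in plain NumPy. This toy version uses a crude k-means to build the clusters and searches only the `n_probe` nearest ones at query time; it is an illustration of the principle, not how Faiss or Pinecone implement it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 5,000 unit vectors in 32 dimensions
db = rng.normal(size=(5_000, 32))
db /= np.linalg.norm(db, axis=1, keepdims=True)

# "Training": partition the vectors into clusters with a crude k-means
n_clusters = 50
centroids = db[rng.choice(len(db), n_clusters, replace=False)]
for _ in range(10):
    assign = np.argmax(db @ centroids.T, axis=1)  # nearest centroid per vector
    for c in range(n_clusters):
        members = db[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

# Inverted file: cluster id -> indices of the vectors assigned to it
inverted = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def ivf_search(query, k=5, n_probe=5):
    # Search only the n_probe nearest clusters, not the whole dataset
    nearest_clusters = np.argsort(centroids @ query)[-n_probe:]
    candidates = np.concatenate([inverted[c] for c in nearest_clusters])
    scores = db[candidates] @ query
    order = np.argsort(scores)[::-1][:k]
    return candidates[order], scores[order]

query = rng.normal(size=32)
query /= np.linalg.norm(query)
ids, scores = ivf_search(query)
print(ids, scores)
```

With `n_probe=5` of 50 clusters, each query scores roughly a tenth of the dataset. That is the accuracy-for-speed trade: a neighbour that landed in an unprobed cluster is simply missed, which is why ANN recall is tuned, not guaranteed.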


The Major Vector Databases in 2026

You have several strong options. Here is a practical comparison:

ChromaDB — open source, runs in-process (no separate server needed), stores embeddings locally on disk. Perfect for development, prototypes, and small production deployments. Zero infrastructure overhead. Free.

Pinecone — fully managed cloud service. No infrastructure to maintain. Excellent developer experience and good documentation. Pay-per-use pricing. Best choice when you want to avoid operational complexity at scale.

pgvector — a PostgreSQL extension that adds a vector column type and ANN index. If your application already lives in Postgres, pgvector lets you store vectors alongside your existing relational data in the same database. Excellent for reducing infrastructure footprint.

Weaviate — open source, supports multi-modal data (text, images), has built-in hybrid search (vector + keyword). Good for complex semantic search applications. Docker-deployable.

Qdrant — open source, written in Rust for high performance, excellent filtering capabilities (combine vector similarity with metadata filters in a single query). Growing fast in 2026.

Milvus — open source, designed for billion-scale deployments, distributed architecture. Best suited for very large enterprise deployments.

Which One Should You Start With?

For learning, use ChromaDB — it requires no external services and runs entirely in Python. For a production web app, use pgvector if you are already on Postgres, or Pinecone if you want zero infrastructure overhead. For large-scale self-hosted deployments, use Qdrant or Weaviate.


Vector Databases vs Traditional Databases: Side-by-Side

Feature              | SQL Database       | Full-Text Search     | Vector Database
-------------------- | ------------------ | -------------------- | --------------------------
Query type           | Exact match, range | Keyword / BM25       | Semantic similarity
Data type            | Structured records | Text documents       | Embeddings (any modality)
Search concept       | WHERE clause       | TF-IDF, BM25 scoring | k-NN, cosine similarity
Understands meaning? | No                 | Partially            | Yes
Good for AI/RAG?     | No                 | Limited              | Yes
Infrastructure       | Postgres, MySQL    | Elasticsearch        | ChromaDB, Pinecone, Qdrant

Important: vector databases do not replace relational databases. Most real applications use both. Your user account data, order records, and billing information belong in Postgres. Your document embeddings and semantic search capability belong in a vector database.


Real-World Use Cases

Vector databases power a surprising range of AI applications:

Retrieval-Augmented Generation (RAG) — the most common use case in 2026. Embed your documents, store them in a vector database, embed the user's question, retrieve the most relevant document chunks, pass them to an LLM for answer generation. See Build a RAG App with Claude for a full implementation.

Semantic Search — replace keyword search with meaning-based search across documentation, products, or any text corpus. A search for "comfortable summer footwear" returns sandals and canvas shoes even if those words are not in the query.

Recommendation Engines — embed user preferences and item descriptions into the same vector space. Find the K items most similar to a user's history vector. Netflix, Spotify, and most major e-commerce platforms use this approach at scale.

Duplicate Detection & Deduplication — embed content and find near-duplicate items by similarity score. Useful for detecting plagiarism, finding near-identical support tickets, and cleaning datasets.

Anomaly Detection — embed log events, transactions, or sensor readings. Flag items whose vectors fall far from any cluster as potential anomalies.

Long-Term Memory for AI Agents — store conversation summaries and past interactions as embeddings. When a user returns, retrieve the most relevant memories and include them in the agent's context window.


Your First Vector Database: Working Python Example

This example uses ChromaDB — no account, no API key, runs entirely locally.

Install the dependencies:

```bash
pip install chromadb sentence-transformers
```

Create a collection and add documents:

```python
import chromadb
from chromadb.utils import embedding_functions

# Initialise ChromaDB with local persistent storage
client = chromadb.PersistentClient(path="./vector_db")

# Use a free, local embedding model from sentence-transformers
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"  # 80 MB model, runs entirely offline
)

# Create a collection (equivalent to a table in SQL)
collection = client.get_or_create_collection(
    name="support_articles",
    embedding_function=embedding_fn,
    metadata={"hnsw:space": "cosine"}  # use cosine similarity
)

# Add documents — ChromaDB generates the embeddings automatically
documents = [
    "How to reset your password if you are locked out of your account",
    "Device fails to boot following a software patch or firmware update",
    "How to export your billing history and download invoices",
    "Network connection drops intermittently on Windows 11",
    "How to transfer your licence to a new computer",
    "Battery drains faster than normal after the latest update",
]

collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
    metadatas=[{"category": "support"} for _ in documents]
)

print(f"Collection contains {collection.count()} documents")
```

Query with natural language:

```python
# Query — no keywords, just a natural language question
results = collection.query(
    query_texts=["my laptop won't turn on after the update"],
    n_results=3
)

print("Top 3 most relevant articles:")
for i, (doc, distance) in enumerate(
    zip(results["documents"][0], results["distances"][0])
):
    similarity = 1 - distance  # cosine distance → similarity score
    print(f"\n{i+1}. Similarity: {similarity:.3f}")
    print(f"   {doc}")
```

Expected output:

```
Top 3 most relevant articles:

1. Similarity: 0.847
   Device fails to boot following a software patch or firmware update

2. Similarity: 0.721
   Battery drains faster than normal after the latest update

3. Similarity: 0.634
   Network connection drops intermittently on Windows 11
```

Notice the top result — "Device fails to boot following a software patch" — contains zero words from the query "my laptop won't turn on after the update." The vector database found it purely through semantic similarity. A LIKE query would have returned nothing useful.


Metadata Filtering: Vectors + Structured Queries

Vector databases are not just about similarity search. Most support combining vector similarity with metadata filters, letting you scope a search to a specific category, date range, user, or any other structured attribute.

```python
# Add documents with richer metadata
collection.add(
    documents=[
        "Subscription auto-renews annually on your billing date",
        "Cancel your subscription before the renewal date to avoid charges",
        "How to upgrade from the Free plan to the Pro plan",
    ],
    ids=["doc_6", "doc_7", "doc_8"],
    metadatas=[
        {"category": "billing", "plan": "all"},
        {"category": "billing", "plan": "all"},
        {"category": "billing", "plan": "free"},
    ]
)

# Semantic search scoped to billing category only
results = collection.query(
    query_texts=["will I be charged if I forget to cancel?"],
    n_results=2,
    where={"category": "billing"}  # metadata filter
)

for doc in results["documents"][0]:
    print(doc)
```

This is something full-text search engines can also do, but vector databases do it in a single indexed query — no post-filtering step required.

Embedding Model Consistency

Every document in your vector database must be embedded with the same model. Your query must also be embedded with the same model. Mixing models (e.g., indexing with OpenAI's text-embedding-3-small but querying with all-MiniLM) produces meaningless results because the vector spaces are completely different.
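One way to guard against mixing models is to record the model name and dimensionality alongside the collection and refuse mismatched writes. This is a general pattern sketched with a plain Python class, not a built-in feature of any particular vector database (the names `EmbeddingSpace` and its methods are illustrative):

```python
# A lightweight guard: remember which embedding model a collection was
# built with and reject vectors produced by anything else.

class EmbeddingSpace:
    def __init__(self, model_name: str, dim: int):
        self.model_name = model_name
        self.dim = dim
        self.vectors = {}  # id -> vector

    def add(self, doc_id, vector, model_name):
        if model_name != self.model_name or len(vector) != self.dim:
            raise ValueError(
                f"Collection was built with {self.model_name} ({self.dim}-d); "
                f"got {model_name} ({len(vector)}-d)"
            )
        self.vectors[doc_id] = vector

# all-MiniLM-L6-v2 produces 384-d vectors; text-embedding-3-small produces 1536-d
space = EmbeddingSpace(model_name="all-MiniLM-L6-v2", dim=384)
space.add("doc_0", [0.1] * 384, model_name="all-MiniLM-L6-v2")  # accepted

try:
    space.add("doc_1", [0.1] * 1536, model_name="text-embedding-3-small")
except ValueError as e:
    print(f"Rejected: {e}")
```

Dimension checks catch the obvious mismatches, but note that two different models can share a dimensionality and still have incompatible vector spaces, which is why recording the model name matters.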


Updating and Deleting Vectors

A common misconception: vector databases are append-only. They are not. ChromaDB and all major vector databases support full CRUD operations.

```python
# Update a document (re-embeds automatically)
collection.update(
    ids=["doc_0"],
    documents=["How to reset your password or unlock your account after too many failed attempts"],
    metadatas=[{"category": "support", "updated": "2026-04"}]
)

# Delete a document
collection.delete(ids=["doc_2"])

# Check count after deletion
print(f"Collection now contains {collection.count()} documents")
```

In production RAG systems, you will routinely update vectors when source documents change and delete vectors when documents are removed from the source corpus.


When Do You Actually Need a Vector Database?

Not every project needs one. Here is a practical decision framework:

Use a vector database when:

• You are building RAG (retrieving relevant context for an LLM)
• You need semantic search — finding by meaning rather than keywords
• You are building recommendation features based on content similarity
• Your dataset has more than a few thousand items and keyword search is missing relevant results

You probably do not need one when:

• You are doing exact lookups (user by ID, order by number)
• Your dataset is small enough that a simple in-memory similarity scan is fast enough
• You are already using Postgres and pgvector serves the same need with less complexity
• Full-text search with BM25 is good enough for your use case

For most developers building AI applications in 2026, the answer is: yes, you need one. Semantic search and RAG have become table-stakes features, and both require vector storage.


Key Takeaways

• Vectors are lists of numbers produced by ML models that numerically encode the meaning of content
• Similar content produces similar vectors — this is the fundamental property that makes semantic search possible
• Vector databases store and index embeddings, returning the K most similar items to a query vector in milliseconds
• ANN indexes (HNSW, IVF) make similarity search fast at scale by approximating the search space
• ChromaDB is the easiest starting point — runs locally, no account needed, full Python API
• Vector databases do not replace SQL databases — they store embeddings alongside your structured data in a typical production stack
• RAG, semantic search, recommendations, memory, and anomaly detection all depend on vector search

What's Next?

Now that you understand what a vector database is and how to use ChromaDB for basic similarity search, the next steps are:

• Build a full RAG pipeline with document chunking, embedding, and grounded answer generation using Claude: Project: Build a RAG App with Claude
• Compare your options: ChromaDB vs Pinecone vs pgvector — a full decision guide is coming in this series
• Build a semantic search engine from scratch — the next post in this series

This post is part of the Vector Database Series — a deep-dive into the data layer that powers modern AI applications.

Continue reading:

1. What is a Vector Database? ← you are here
2. ChromaDB Tutorial: The Complete Beginner's Guide
3. ChromaDB vs Pinecone vs pgvector: Which Should You Use?
4. Build a Semantic Search Engine from Scratch
5. Vector Database Optimisation for Production