How do vector databases work?

What is a vector database?

A vector database is a database system specialised in storing and searching vectors — long rows of numbers that often run to hundreds or thousands of values. Each of these vectors stands for a piece of content: a sentence, a document, an image, an audio file. Instead of searching the content itself, the database searches its numerical representation.

The reason for this is the kind of question such systems are meant to answer. A classic database is good at finding exact matches: all customers with postcode 10115, all orders placed on Tuesday. A vector database, by contrast, is built to find similar things — texts that mean the same thing even when not a single word overlaps. It searches for meaning, not for letters.

Think of the stored vectors as points in a space that together form a network. Each stored item is a point, and its position records which other items it is closely related to. A query is just another point in this space, and the database returns whatever lies nearest to it, that is, the items it is most strongly related to.

What is an embedding — meaning as a position in space?

The key to all of this is the embedding. An embedding is a technique from language processing and machine learning that translates a piece of content into a vector: a model reads a text and outputs a fixed row of numbers that encodes its meaning. “Dog” and “puppy” end up with very similar vectors, “dog” and “stock price” with very different ones.

Meaning, then, turns into nearness. Each value in the vector is a coordinate along one dimension of the space, and the overall position sums up what the content responds to, what it connects with, what it sets itself apart from. At its core this is a relation cast into numbers: two items lie close to each other because the model learned many shared references between them.

This is exactly why an embedding is more than a keyword index. It carries not only which words appear, but in what relation the contents stand to one another. The database inherits these relations: to search the space is to search an already-learned web of nearness and distance.

How does similarity search work?

A search begins by turning your query into a vector too. You type a question, the same embedding model converts it into a point, and the database now looks for the stored points that lie closest to it. “Closest” is a measured quantity: usually the angle between the vectors (cosine similarity) or a geometric distance such as the Euclidean distance. This task is called nearest neighbor search.
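The two measures named above can be sketched in a few lines. This is a minimal illustration, not a database implementation: the three-dimensional vectors are hand-picked assumptions rather than the output of any real embedding model, and the function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical embeddings, chosen by hand for illustration only.
vectors = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.8, 0.9, 0.2],
    "stock price": [0.1, 0.0, 0.9],
}

def nearest(query, store):
    """Return the stored keys, most similar to the query vector first."""
    return sorted(store, key=lambda k: cosine_similarity(query, store[k]), reverse=True)

others = {k: v for k, v in vectors.items() if k != "dog"}
ranking = nearest(vectors["dog"], others)
print(ranking)
```

With these assumed vectors, “puppy” ranks ahead of “stock price” for the query “dog”, because its vector points in almost the same direction.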

In the model's terms, the query activates the nearest relations. Your question-point, so to speak, sends a signal into the space, and it first reaches the contents to which the shortest connections exist. What comes back is not the one exactly matching entry but the neighbourhood most similar to your question — the relations that are becoming active right now.

With millions or billions of vectors, comparing the query against every single one would be too slow. So vector databases use approximate methods (approximate nearest neighbor) that arrange the space in advance into a structure of nearness relations, for instance a graph of closely neighbouring points. The search then hops from neighbour to neighbour instead of checking everything. It trades a tiny bit of accuracy for a great deal of speed.
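The hopping idea can be sketched as a greedy walk over a neighbour graph. This is a deliberately simplified illustration under stated assumptions: the points are random 2-D values, the graph is built by brute force, and the function names are hypothetical. Production indexes use more elaborate structures than this.

```python
import math
import random

def build_neighbour_graph(points, k):
    """Link every point to its k nearest points (done once, before any query)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]
    return graph

def exact_search(points, query):
    """Brute force: compare the query against every stored point."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))

def greedy_search(points, graph, query, start=0):
    """Hop to the neighbour closest to the query until no neighbour is closer."""
    current, hops = start, 0
    while True:
        best = min(graph[current], key=lambda j: math.dist(query, points[j]))
        if math.dist(query, points[best]) >= math.dist(query, points[current]):
            return current, hops  # local optimum: may differ from the exact answer
        current, hops = best, hops + 1

rng = random.Random(0)  # fixed seed so the run is repeatable
points = [(rng.random(), rng.random()) for _ in range(500)]
graph = build_neighbour_graph(points, k=8)
query = (0.5, 0.5)

exact = exact_search(points, query)
approx, hops = greedy_search(points, graph, query)
print("exact:", exact, "approximate:", approx, "hops:", hops)
```

The greedy walk only looks at the neighbours of the points it visits, which is where the speed comes from. It stops at the first point that has no closer neighbour, and that point is not guaranteed to be the true nearest one, which is where the lost accuracy comes from.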

What do vector databases have to do with AI, RAG and LLMs?

Vector databases became popular because they solve a problem of large language models (LLMs). An LLM only knows what was in its training data, and it can invent things that sound plausible but are wrong. It has no reliable memory for your specific documents. This is exactly where the database steps in: it feeds the model the right content at the right moment.

This interplay is called retrieval-augmented generation, or RAG for short. Your documents are translated into vectors and stored. When you ask a question, the vector database finds the most similar pieces of text, and these are handed to the LLM together with your question. The model then answers not from memory but on the basis of the just-found, citable passages.
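The three steps of this flow (store, retrieve, hand over) can be sketched as follows. Everything specific here is an assumption made for illustration: the passages are invented, the `embed` function is a lookup table of hand-made vectors standing in for a real embedding model, and the call to the LLM itself is left out.

```python
import math

# Hypothetical stand-in for an embedding model: hand-made vectors in a
# lookup table. A real system would call a trained model here instead.
HAND_MADE_VECTORS = {
    "Battery replacement on the exterior module": [0.9, 0.1, 0.1],
    "Cleaning the air filter": [0.1, 0.9, 0.1],
    "Warranty terms and conditions": [0.1, 0.1, 0.9],
    "How do I swap the battery in the outdoor unit?": [0.8, 0.2, 0.1],
}

def embed(text):
    return HAND_MADE_VECTORS[text]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def build_index(passages):
    """Step 1: translate the documents into vectors and store them."""
    return [(passage, embed(passage)) for passage in passages]

def retrieve(index, question, k):
    """Step 2: find the k stored passages most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]

def build_prompt(question, passages):
    """Step 3: combine the retrieved passages with the question for the LLM."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Answer using only these passages:\n{context}\n\nQuestion: {question}"

question = "How do I swap the battery in the outdoor unit?"
passages = [text for text in HAND_MADE_VECTORS if text != question]
index = build_index(passages)
top = retrieve(index, question, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

The numbered passages in the prompt are what makes the answer citable: the model can be asked to refer to them by number.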

Thought of as a network, RAG extends the model's active relations by exactly those your question needs right now. The LLM remains the language-fluent node, while the vector database is its connection to current, checkable knowledge. Together they are stronger than either alone: language from the one, grounded content from the other.

How does this differ from a classic database and a knowledge graph?

A classic, relational database works with exact values in tables and fixed columns. It is unbeatable when the question is precise and has one clear answer. But it understands no meaning: search for a “cheap laptop” and you won't find an entry that only says “affordable notebook”. The word has to match, not the sense.

A knowledge graph stores knowledge as explicit, named relationships: “Berlin is the capital of Germany”, “insulin lowers blood sugar”. Here the relations are clearly defined and traceable — you can follow them like routes on a map. The price is that someone has to create or derive these relations; the graph only knows what has been explicitly connected.

The vector database takes a third route. Its relations are not named but learned: nearness in space means “means something similar”, without anyone ever having written that down. This makes it both flexible and fuzzy. In practice the three often complement one another: exact filters from the relational world, explicit facts from the graph, soft nearness of meaning from the vector space.
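One common way to combine two of these worlds is to apply the exact filters first and rank only the survivors by nearness. The sketch below assumes a tiny invented catalogue with hand-made two-dimensional vectors; the `Product` class, the field names and the query vector are all hypothetical, and the knowledge-graph part is left out.

```python
import math
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float     # exact value, filtered the relational way
    in_stock: bool   # exact value, filtered the relational way
    vector: list     # hypothetical hand-made embedding of the description

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_search(products, query_vector, max_price, k):
    """Apply hard conditions exactly, then rank what is left by similarity."""
    candidates = [p for p in products if p.in_stock and p.price <= max_price]
    candidates.sort(key=lambda p: cosine_similarity(query_vector, p.vector), reverse=True)
    return candidates[:k]

catalogue = [
    Product("affordable notebook", 399.0, True, [0.9, 0.1]),
    Product("premium laptop", 1899.0, True, [0.8, 0.3]),
    Product("budget notebook (sold out)", 349.0, False, [0.9, 0.2]),
    Product("garden hose", 25.0, True, [0.0, 1.0]),
]

cheap_laptop_query = [1.0, 0.1]  # assumed vector for the query "cheap laptop"
results = hybrid_search(catalogue, cheap_laptop_query, max_price=500.0, k=1)
print([p.name for p in results])
```

The price limit and stock status are never left to similarity: a product that is semantically close but too expensive or sold out is removed before ranking starts.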

Where are the limits of vector databases?

A vector database is only as good as the embeddings it stores. If the model translates content into vectors poorly, dissimilar things end up close together and similar ones far apart, and the search then returns plausible-looking but wrong neighbours. Switch the embedding model, and you usually have to re-embed the entire collection, because old and new vectors don't live in the same space.
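One way to guard against mixing spaces is to store, next to each vector, which model produced it, and to refuse comparisons across models. This is a sketch of that bookkeeping idea only; the class, the function names and the model identifiers `model-v1` and `model-v2` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoredVector:
    model_id: str   # which embedding model produced this vector
    values: tuple

class ModelMismatchError(ValueError):
    """Raised when two vectors come from different embedding spaces."""

def check_comparable(query, stored):
    """Refuse to compare vectors that do not live in the same space."""
    if query.model_id != stored.model_id:
        raise ModelMismatchError(
            f"query uses {query.model_id}, stored vector uses {stored.model_id}")
    if len(query.values) != len(stored.values):
        raise ModelMismatchError("vectors have different lengths")

def needs_reembedding(collection, current_model_id):
    """Return the positions of all vectors produced by an older model."""
    return [i for i, v in enumerate(collection) if v.model_id != current_model_id]

collection = [
    StoredVector("model-v1", (0.1, 0.2)),
    StoredVector("model-v1", (0.3, 0.4)),
    StoredVector("model-v2", (0.5, 0.6, 0.7)),
]

stale = needs_reembedding(collection, "model-v2")
print(stale)
```

Note that equal vector lengths alone prove nothing: two different models can output vectors of the same length that are still not comparable, which is why the model identifier is checked first.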

Nearness, moreover, is not the same as truth. The database finds what is similar, not what is correct. It can place two contradicting texts equally close to your question and return both. And because most systems search approximately, the truly best hit can occasionally be missed. For hard, exact conditions such as a specific date or a precise number, classic filtering is still needed.

In the model's terms: an active relation is not yet a right relation. Nearness shows only that a connection exists, not whether it holds. A vector database is therefore a strong tool for finding relevant connections — but no proof that what it found is true. The checking stays your job.

Seen through the model

Imagine you're building an assistant for your company's manuals. Someone types: “How do I swap the battery in the outdoor unit?” But nowhere in the documents does that exact sentence appear — there it reads “battery replacement on the exterior module”. A keyword search would fail, because almost no word overlaps. The meaning is the same, the letters are not.

See it as a space of points. Each manual section is stored as a vector, that is, as a position that holds its meaning. Your question is turned into a point by the same model. Because “swap the battery” and “battery replacement” mean almost the same thing to the model, the two points lie close together — the relation between question and correct section is short, even without shared words.

The search now activates the nearest relations: it pulls the three or four sections lying closest to your question-point and hands them to a language model. That model phrases an answer from them — grounded in exactly the passages it just found. You change neither the question nor the documents; you use the fact that meaning is stored in space as nearness, and you let the fitting relation become active.

Frequently asked

What is the difference between a vector database and a normal database?

A normal, relational database stores exact values in tables and searches for precise matches — a customer number, a date. A vector database instead stores embeddings, rows of numbers that capture the meaning of content, and searches for similarity rather than exact hits. This lets it find texts that mean the same thing even when not a single word overlaps. The two types are not mutually exclusive: in practice you often combine exact filters from a classic database with meaning-based search from the vector space.

What is an embedding, explained simply?

An embedding is the translation of a piece of content into a long row of numbers — a vector — that encodes its meaning as a position in a space. A trained model reads a text and outputs these numbers. Content with similar meaning receives similar vectors and ends up close together in the space; very different content ends up far apart. Meaning becomes a measurable nearness. That nearness is precisely what a vector database uses to find similar things without requiring any words to match.

What are vector databases needed for in AI?

They give large language models a searchable memory for specific content. A language model knows only its training data and can invent things that are wrong. By storing documents as vectors, the system can retrieve the most similar pieces of text for any question and pass them to the model. This approach is called retrieval-augmented generation (RAG). The model then answers not from memory but on the basis of citable passages it has just found — making responses more current and traceable. Semantic search, recommendation systems, and image search also rely on vector databases.

What is similarity search or nearest neighbor search?

Similarity search means finding the points in a space that lie closest to a given query point; the formal name is nearest neighbor search. In a vector database your query is converted into a vector, and the system looks for the stored vectors with the smallest distance or angle to it. With very large collections, comparing the query against every single vector would be too slow, so approximate methods (approximate nearest neighbor) are used that organise the space into a structure of nearness relations in advance. This trades a little accuracy for a lot of speed.

What is the difference between a vector database and a knowledge graph?

A knowledge graph stores knowledge as explicitly named relationships, for instance “Berlin is the capital of Germany”. These relations are clearly defined and traceable, but someone has to create them. A vector database stores content as vectors whose nearness expresses learned, unnamed similarity: two points lie close together because a model learned many shared references between them without anyone ever writing those down. The graph is precise and verifiable; the vector space is flexible and fuzzy. The two are often combined to bring exact facts and soft semantic nearness together.

What are the limits of vector databases?

A vector database is only as good as its embeddings: if the model translates content into vectors poorly, dissimilar things end up close together and the search returns plausible-looking but wrong results. Nearness is also not the same as truth: the system finds what is similar, not what is correct, and can surface contradictory texts side by side. Because most searches are approximate, the best match can occasionally be missed, and for exact conditions like a specific date, classic filtering is still necessary. A vector database finds relevant connections, but does not prove that what it found is true.

Last updated: 2026-07-01