Vector search beyond the database
The purpose of vectors
Why take on the trouble and cost of incorporating vector embedding retrieval into an application? The right answer is: because it can improve retrieval quality - recall and relevance, in information retrieval terminology.
Vectors can help with this because they introduce semantics into an application: Rather than just matching on words and exact fields, vectors allow fuzzy matching of the meaning of a query to the meaning of your content.
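To make this concrete, here is a minimal sketch of embedding-based matching, assuming the sentence-transformers library and the all-MiniLM-L6-v2 model (our choices for illustration; nothing in this article depends on a particular model). Note that the query shares no keywords with the document it ends up closest to:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our store is open weekdays from 9 to 5.",
    "Return unwanted items within 30 days for a full refund.",
]

# Encode the content and the query into the same vector space.
doc_vectors = model.encode(documents, normalize_embeddings=True)
query_vector = model.encode("Can I send a product back?", normalize_embeddings=True)

# Cosine similarity ranks documents by closeness of meaning rather than
# shared words: the returns policy wins despite zero keyword overlap.
scores = util.cos_sim(query_vector, doc_vectors)[0]
print(sorted(zip(scores.tolist(), documents), reverse=True))
```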
Simple vector search works badly
While this sounds great, vectors have limitations. The fuzzy-matching nature of vector embeddings makes them unsuitable for cases where precision is important - such as finding a particular product, a specific address, or a warehouse shelf.
For textual data, the academic community has shown that even simple BM25 text search often outperforms vector similarity, and the industry has converged on combining text and vector search, both for finding candidates and for scoring them (relevance).
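One simple way to combine the two result sets, shown here as one option among several rather than a method this article prescribes, is reciprocal rank fusion: each document's fused score sums 1/(k + rank) over the rankings it appears in, so documents ranked well by either method surface.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each input is a list of ids ordered best-first. k=60 is the constant
    from the original RRF paper; it dampens lower-ranked contributions."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a lexical (BM25) ranking with a vector-proximity ranking.
bm25_ranking = ["d3", "d1", "d2"]
vector_ranking = ["d1", "d4", "d3"]
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```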
And in all applications, methods that represent each content item by many more detailed vectors, using tensor math to score them, produce vastly superior results to representing each item by a single vector.
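The best-known member of this family is ColBERT-style late interaction, where every token gets its own vector. The sketch below, a toy with random unit vectors rather than real embeddings, shows the core tensor operation (MaxSim): match each query token vector against its best document token vector and sum the maxima.

```python
import numpy as np

def late_interaction_score(query_vecs, doc_vecs):
    """ColBERT-style MaxSim. query_vecs is (q_tokens, dim) and doc_vecs is
    (d_tokens, dim), with all rows unit-normalized so dot products are
    cosine similarities."""
    sims = query_vecs @ doc_vecs.T     # (q_tokens, d_tokens) similarity matrix
    return sims.max(axis=1).sum()      # best document token per query token

# Toy example with random unit vectors standing in for token embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(12, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(late_interaction_score(q, d))
```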
When you are working with vectors you are doing search
Using vectors effectively requires thinking in terms of information retrieval: A variety of conditions generate candidates, which are then scored (usually in multiple phases, to save compute) to surface the most relevant ones, which are returned to the user or the LLM. This process is very different from a lookup in a database, and it is inherent in vector search, since every vector matches every query, just with a different proximity score.
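A hedged sketch of that retrieve-then-rank shape follows; the function names and the freshness feature are invented for illustration, and a real engine would replace the brute-force first phase with an approximate-nearest-neighbor index:

```python
import numpy as np

def two_phase_rank(query_vec, doc_vecs, doc_features, rerank_depth=100, top_k=10):
    """Multi-phase ranking: a cheap score over all candidates, then an
    expensive score over only the best rerank_depth of them."""
    # Phase 1: inexpensive proximity (dot product) against every candidate.
    coarse = doc_vecs @ query_vec
    candidates = np.argsort(-coarse)[:rerank_depth]

    # Phase 2: a costlier scorer combining more signals; proximity plus a
    # hypothetical freshness feature stands in for a learned model here.
    def expensive(i):
        return doc_vecs[i] @ query_vec + 0.1 * doc_features[i]["freshness"]

    return sorted(candidates, key=expensive, reverse=True)[:top_k]

# Toy usage with random data.
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 16))
feats = [{"freshness": rng.random()} for _ in range(1000)]
print(two_phase_rank(rng.normal(size=16), docs, feats))
```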
Since vector proximity alone does not produce good results, additional signals must be leveraged - multiple detailed vectors, lexical matching, structured fields, and so on - and these signals must be combined into a final score using tensor math and small machine-learned models. Scaling this process cost-effectively to large amounts of data and high query rates leads to a different architecture than that of a database.
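As a minimal illustration of such signal combination (the features and weights below are invented, and a production system would learn the combination with, say, a gradient-boosted tree model rather than a hand-written dot product):

```python
import numpy as np

# Hypothetical signals computed per candidate during retrieval:
# vector proximity, BM25, exact-field match, and recency.
features = np.array([
    [0.82, 12.4, 1.0, 0.9],
    [0.91,  3.1, 0.0, 0.2],
    [0.77,  9.8, 1.0, 0.7],
])

# Weights a small machine-learned model might have produced offline.
weights = np.array([1.5, 0.08, 0.6, 0.4])

final_scores = features @ weights
print(np.argsort(-final_scores))  # candidate indices, best first
```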
This is why search engines and databases are separate product categories even though text (and now vectors) can be stored in databases: It is not about the storage; it is about the ranking.
Relevance is crucial for RAG applications
It is well understood that relevance is crucial in information retrieval applications for humans, whether in explicit end-user text search or in implicit searches such as product recommendations. What is less widely appreciated is that relevance is much more important when retrieving data for an LLM. This is because, unlike humans, LLMs are not online learners: They do not continuously pick up useful information from going to meetings, reading their mail, and so on. After training they learn nothing, and so they are completely dependent on having all the information they need delivered to them by information retrieval at the moment they need to do some work.
No amount of model intelligence can compensate for missing the information needed to perform a task. Most of the work in creating RAG applications that reliably deliver quality therefore lies in search relevance: more precise modeling of the data, leveraging and updating more signals and vectors, applying ever better machine-learned models over a larger share of the candidates, and all the other tasks familiar from information retrieval.
The tradeoffs between using a database and a search engine
For organizations that already have much of their data in a database, it can be attractive to simply leverage the vector support added to that database rather than introducing new technology for vector use cases. This allows reuse of existing integrations, knowledge, operational practices, and vendor relationships.
However, if the purpose of using vectors is to improve quality or unlock new use cases, it is worth considering that vector retrieval inherently means moving into the category of search, and that achieving quality goals with vectors takes much more than simple retrieval by proximity to a single vector. Most of the work and features needed lie in relevance, and these cannot be architecturally separated from the database: doing so means sending too much data over the network, which is why search engines both store the data and perform the relevance functions.
A fruitful resolution of this tradeoff may be to split usage into two cases: where vectors are applied without particular quality requirements, the existing database can store them and do simple lookups by proximity; where quality matters, a vector-enabled search engine that supports the necessary relevance work can be leveraged.