Will new vector databases dislodge traditional search engines?
Doug Turnbull asks an interesting question on Linkedin; Will new vector databases dislodge traditional search engines?
The short answer is no, but it depends on what you classify as a traditional search engine.
Features like phrase search, exact search, BM25 ranking, dynamic summaries, and result facets are features we take for granted in a search engine implementation. Most vector databases lack these features. Apache Lucene and Vespa have 20 years of development, adding search critical features. Accelerated dynamic pruning algorithms like wand and BM-wand over inverted indexes also come to mind.
Major web search engines use semantic vector search for candidate retrieval but still allow users to perform an exact phrase search. For example, a user searching for an article number, a phone number, or an ISSN are examples of search use cases that dense vector similarity computations cannot solve.
Real-world search ranking implementations include real-time signals like item stock availability, popularity, or other ranking business constraints. Unfortunately, these signals are hard to compress into a simple vector similarity calculation.
The future of search is hybrid
A successful search implementation uses hybrid retrieval techniques, combining the best of both types of representations; sparse and dense vectors. The hybrid model is demonstrably better than the sum of its parts, especially when applied to new domains without lots of interaction data to train vector embedding models that map data to vectors.
The critical observation is that search implementations must support exact matches and richer ranking than vector similarity alone. Given this, I believe integrating excellent dense vector search capabilities into feature-rich search engine technologies is the right direction.
However, not all search engines are alike.
Not all search engine architectures can add efficient dense vector search capabilities without significantly impacting latency, storage, and serving costs.
For example, search engines built on the Apache Lucene library face severe challenges when exposing the recently added approximate nearest neighbor search support using HNSW graphs. Apache Lucene achieves near real-time indexing by creating multiple immutable segments. One new segment per refresh interval. How many are active in total depends on the number of shards, indexing rate, refresh interval settings, and segment merge policies.
A vector search in Elasticsearch, Apache Solr, or OpenSearch, using Lucene, needs to scan all these active segments per shard, causing unpredictable latency and recall. Furthermore, the query cost, driven by vector distance calculations, increases almost linearly with the number of active segments since there is one graph per Lucene index segment.
But, can immutable segments with HNSW graphs be efficiently merged into fewer and larger segments to solve this search scalability problem?
Unfortunately not, HNSW graph data structures are inherently different from the classic inverted index data structures. Apache Lucene based its immutable segment architecture on sorted inverted index structures in 1998. Due to the sorted property, merging sorted posting lists was simple and cost-efficient. HNSW graphs for high-recall vector search, on the other hand, are immensely expensive to merge, effectively costing the same amount of compute as building a single HNSW graph from scratch.
So what is the alternative if you want all the features of a traditional search engine but also want to incorporate dense vector search in your search use cases?
Luckily, Vespa, the open-source big data serving engine, is an alternative to Apache Lucene-based engines. Vespa implements a mutable HNSW graph per node adjacent to other mutable data structures for efficient retrieval and ranking.
Users of Vespa can effortlessly combine vector search with traditional search engine query operators, implementing hybrid search at scale.
Vespa’s core data structures are mutable, avoiding expensive merging and duplication. Vespa indexing latency is in milliseconds, not seconds. Updating a single field does not require a full re-indexing as in engines built on immutable data structures. For example, updating the popularity does not require reindexing the vector data like in engines built on Apache Lucene.
Critical features such as phrase search, exact search, BM25, proximity features, and result grouping comes for free. In addition to performance, scalability, and reliability, Vespa has proven ranking results on the world’s most extensive open relevancy dataset. Not anecdotes in a sales presentation, but proven ranking results, available to anyone to reproduce.
If you are interested in learning more about Vespa and how organizations like Spotify are using Vespa to unlock the full potential of hybrid search and neural ranking, check out the Vespa Blog or get started with one of many Vespa sample applications. All the sample applications can either be deployed on-premise on your infrastructure or using Vespa Cloud.