Kristian Aune
Head of Customer Success, Vespa.ai

Vespa Newsletter, July 2023

In the previous update, we mentioned multi-vector HNSW indexing, global-phase re-ranking, LangChain support, improved bfloat16 throughput, and new document feed/export features in the Vespa CLI. Today, we’re excited to share Vector Streaming Search, multiple new embedding features, MIPS support, and performance optimizations:

Vector Streaming Search

When searching personal data or other data sets that are divided into many subsets you never search across, maintaining global indexes is unnecessarily expensive. Vespa streaming search is built for these use cases, and now supports vectors in searching and ranking.

This enables vector search in personal search use cases such as personal assistants, at typically less than 5% of the usual cost, while delivering complete rather than approximate results - often crucial with personal data. Read more in our announcement blog post.
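
As a minimal sketch of how this can look (the mail document type, field names, and group name below are illustrative, not from the announcement), a streaming-mode document type declares the vector as a plain attribute - no HNSW index is needed, as matching within the selected group is exact:

    # mail.sd - the document type is deployed with mode="streaming" in services.xml:
    #   <document type="mail" mode="streaming"/>
    schema mail {
        document mail {
            field embedding type tensor<float>(x[3]) {  # tiny dimension to keep the example short
                indexing: attribute
                attribute {
                    distance-metric: angular
                }
            }
        }
        rank-profile semantic {
            inputs {
                query(q) tensor<float>(x[3])
            }
            first-phase {
                expression: closeness(field, embedding)
            }
        }
    }

Queries are then restricted to a single group - typically one user - with the streaming.groupname parameter:

    vespa query 'yql=select * from mail where {targetHits:10}nearestNeighbor(embedding, q)' \
        'streaming.groupname=user-123' 'ranking=semantic' 'input.query(q)=[0.1, 0.2, 0.3]'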

Use Embedder Models from Huggingface

Vespa now comes with generic support for embedding models hosted on Huggingface. With the new Huggingface Embedder functionality, developers can export embedding models from Huggingface and import them into Vespa in ONNX format for accelerated inference close to where the data is created. The Huggingface Embedder supports multilingual embedding models as well as multi-vector representations - read more.
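
As an illustration, assuming the model has been exported to ONNX and placed in the application package (the component id e5 and the file paths are made up for this sketch):

    <!-- services.xml: a Huggingface embedder in the container cluster -->
    <container id="default" version="1.0">
        <search/>
        <document-api/>
        <component id="e5" type="hugging-face-embedder">
            <transformer-model path="models/model.onnx"/>
            <tokenizer-model path="models/tokenizer.json"/>
        </component>
    </container>

A schema can then invoke the embedder at feed time, e.g. in a synthetic field declared outside the document block:

    field embedding type tensor<float>(x[384]) {
        indexing: input text | embed e5 | attribute
    }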

GPU Acceleration of Embedding Models

GPU acceleration of embedding model inferences is now supported, unlocking larger and more powerful embedding models while maintaining low serving latency. With this, Vespa embedders can efficiently process large amounts of text data, resulting in faster response times, improved scalability, and lower cost.

Embedding GPU acceleration is available both on Vespa Cloud and for Open Source Vespa use - read more.
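
Configuration-wise this is a small step: per the embedder reference documentation, adding onnx-gpu-device to an embedder component (here, the hypothetical e5 embedder sketched above) runs inference on a GPU, falling back to CPU when no GPU is available:

    <component id="e5" type="hugging-face-embedder">
        <transformer-model path="models/model.onnx"/>
        <tokenizer-model path="models/tokenizer.json"/>
        <!-- run embedder inference on GPU device 0 -->
        <onnx-gpu-device>0</onnx-gpu-device>
    </component>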

More models for Vespa Cloud users

As more teams use embeddings to improve search and recommendation use cases, easy access to models is key for productivity. The latest addition is the E5 family of text embedding models. From the E5 paper:

E5 is a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings.

Vespa Cloud users can find a set of E5 models on the cloud.vespa.ai model hub.
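
With the model hub, the embedder configuration can reference a model by id instead of bundling the files yourself - a sketch, assuming e5-small-v2 is among the published ids (check the model hub page for the current list and for how the tokenizer is referenced):

    <component id="e5" type="hugging-face-embedder">
        <!-- model-id resolves the model from the Vespa Cloud model hub -->
        <transformer-model model-id="e5-small-v2"/>
    </component>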

Dotproduct distance metric for ANN

The Maximum Inner Product Search (MIPS) problem arises naturally in recommender systems, where item recommendations and user preferences are modeled with vectors, and the scoring is just the dot product (inner product) between the item vector and the query vector.

Vespa supports a range of distance metrics for approximate nearest neighbor search. Since Vespa 8.172, this includes a dotproduct distance metric, used both for distance calculations and as an extension to the HNSW index structure. Read more in the blog post about how adding an extra dimension - mapping the points onto a hemisphere - gives all vectors the same magnitude, turning MIPS into a regular nearest neighbor problem.
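
In a schema, this is one more option for distance-metric on the vector attribute (field name and dimension are illustrative):

    field item_vector type tensor<float>(x[128]) {
        indexing: attribute | index
        attribute {
            # rank by raw inner product - no normalization of document vectors required
            distance-metric: dotproduct
        }
        index {
            hnsw {
                max-links-per-node: 16
                neighbors-to-explore-at-insert: 200
            }
        }
    }

Queries use the nearestNeighbor operator as with any other metric; the HNSW index then returns the approximate top items by inner product.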

Optimizations and features

  • Query using emojis! The Unicode “Other Symbol” character category contains emojis, math symbols, etc. From Vespa 8.172 these are indexed as letter characters so you can search for them. E.g., you can now try vespa query 'select * from music where song contains "🍉"'.
  • Sorting on multivalue fields like array or weightedset is now supported: ascending sort order uses the lowest value, while descending sort order uses the highest value. E.g., a descending-order sort on an array field with ["apple", "banana", "melon"] will use "melon" as the sort value - see the reference documentation.
  • Since Vespa 8.185, you can balance feed vs query resource usage using feeding niceness - use this configuration to de-prioritize feeding.
  • Since Vespa 8.178, users can use conditional puts with auto-create - read more, and see the sketch after this list.
  • With lidspace max-bloat-factor you can fine-tune the lid-space compaction job on content nodes - since Vespa 8.171.
  • Vespa supports multivalue attributes, like arrays and maps. In Vespa 8.181, the static memory usage of multivalue attributes is reduced by up to 40%. This is useful for applications with many such fields, each holding little data - see #26640 for details.
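
For the conditional-put item above, a sketch of a feed operation in the JSON document format (the document type, fields, and condition are made up for this example): with "create": true, the put also succeeds when the document does not exist yet, while the condition still guards writes to existing documents:

    {
        "put": "id:mynamespace:music::a-head-full-of-dreams",
        "condition": "music.play_count < 100",
        "create": true,
        "fields": {
            "artist": "Coldplay",
            "play_count": 1
        }
    }

Feed it with vespa feed as usual; a missing document is created, and an existing one is only overwritten if the condition matches.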

Blog posts since last newsletter


Thanks for reading! Try out Vespa on Vespa Cloud or grab the latest release at vespa.ai/releases and run it yourself! 😀