Vespa Newsletter, April 2024
In the previous update, we mentioned the YQL IN operator, fuzzy and regexp matching in streaming search, match-features, and parameter substitution in the embed function. Today, we’re excited to share the following updates:
Vespa SPLADE Embedder
The SPLADE (SParse Lexical AnD Expansion) model is a highly effective approach to learned sparse retrieval, where queries and documents are represented by term impact scores derived from large language models. For background, we recommend Joel Mackenzie et al.’s paper Exploring the Representation Power of SPLADE Models.
Since Vespa 8.321, the new SPLADE embedder supports SPLADE models. It maps text to a mapped tensor with a single dimension, where each cell is labeled by a subword token and holds that token’s “impact” weight. The tensor can then be used in ranking. Read more.
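A minimal configuration sketch, assuming the splade-embedder component type; the model URLs here are placeholders:

<component id="splade" type="splade-embedder">
    <transformer-model url="https://example.com/splade/model.onnx"/>
    <tokenizer-model url="https://example.com/splade/tokenizer.json"/>
</component>

A schema field can then produce the representation at feed time (field and dimension names are illustrative):

field splade_rep type tensor<float>(token{}) {
    # one cell per subword token, labeled by the token, holding its impact weight
    indexing: input text | embed splade | attribute | summary
}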
ONNX models with float16
Since Vespa 8.325, you can use half-precision floating-point (float16) ONNX models with the Vespa hugging-face-embedder and colbert-embedder:
<component id="mixbread" type="hugging-face-embedder">
<transformer-model url="https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1/resolve/main/onnx/model_fp16.onnx"/>
<tokenizer-model url="https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1/raw/main/tokenizer.json"/>
<pooling-strategy>cls</pooling-strategy>
</component>
This increases inference performance by 3x compared to float32 when using a GPU.
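At query time, the embedder is referenced by its id in the embed function. A minimal query sketch, assuming a nearest-neighbor field named embedding and a rank profile that declares the query tensor input (field and parameter names are illustrative):

{
    "yql": "select * from doc where {targetHits: 10}nearestNeighbor(embedding, q)",
    "input.query(q)": "embed(mixbread, @text)",
    "text": "what is learned sparse retrieval?"
}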
New guides for using Cohere embedding models
Cohere recently released a new embedding API with support for binary and int8 vectors, which enables significant cost savings and performance improvements; read more in Scaling vector search using Cohere binary embeddings and Vespa.
We have built three comprehensive guides on using the new Cohere embedding models with Vespa (a schema sketch for the binary case follows the list):
- embed-english-v3.0 with compact binary representation: cohere-binary-vectors-in-vespa-cloud.html
- embed-english-v3.0 with two vector representations: billion-scale-vector-search-with-cohere-embeddings-cloud.html
- embed-multilingual-v3 with multilingual hybrid search: multilingual-multi-vector-reps-with-cohere-cloud.html
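The core idea in the binary guide is to pack each 1024-dimensional Cohere embedding into 128 int8 cells and search with hamming distance. A hedged schema sketch; the field name is illustrative:

field binary_embedding type tensor<int8>(x[128]) {
    indexing: attribute | index
    attribute {
        # bitwise distance over the packed binary vector
        distance-metric: hamming
    }
}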
Long-Context ColBERT
Since Vespa 8.299, the colbert-embedder accepts an array of strings in addition to the single string input previously supported; details are in #30071. This makes it easier to build multi-paragraph ColBERT representations by passing paragraphs as array elements. See the blog post for practical examples and more information.
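A hedged schema sketch of the multi-paragraph pattern; the field names, dimension names, and tensor sizes are illustrative:

field paragraphs type array<string> {
    indexing: summary | index
}

field colbert_rep type tensor<int8>(paragraph{}, token{}, x[16]) {
    # one group of token vectors per array element (paragraph)
    indexing: input paragraphs | embed colbert paragraph | attribute
}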
New posts from our blog
You may have missed some of these new posts since the last newsletter:
- Embedding flexibility in Vespa
- Announcing Vespa Long-Context ColBERT
- The Singaporean government deploys state of the art semantic search
- Scaling vector search using Cohere binary embeddings and Vespa
- Perspectives on R in RAG
Other companies blogging about how and why they build on Vespa
- From the Marqo blog by Farshid Zavareh: Marqo V2: Performance at Scale, Predictability, and Control. The article dives into the second version of the open-source platform and how it addresses the limitations of its predecessor, combining Marqo’s inference engine with Vespa for large-scale, high-throughput semantic search applications.
- From FARFETCH Tech by Ricardo Rossi Tegão: Scaling Recommenders systems with Vespa. The FARFETCH team implemented the entire recommendation process of one algorithm in Vespa, meeting both the latency requirement of serving recommendations in under 100 ms and their scalability needs.
- From the Stanby Tech Blog by Satoshi Takatori: Migrating to the Vespa Search Engine, on migrating Elasticsearch and Solr solutions to Vespa.
Upcoming meetups and conferences
- AICamp Berlin, Paris, London: Improving the Usefulness of LLMs with RAG
- SW2 Conference Denver: Building Something Real with Retrieval Augmented Generation (RAG)
- Infoshare 24 Gdansk: Building Something Real with Retrieval Augmented Generation (RAG)
Thanks for reading! Try out Vespa by deploying an application for free to Vespa Cloud.