Vespa Newsletter, April 2025

In the previous update, we covered the Python query API, Vespa Logstash Connectors, ModernBERT models, and Vespa CLI multi-get.

Today, we’re excited to share the following updates:

3x lexical search query performance
Pyvespa Relevance Evaluator
Global-phase rank-score-drop-limit
Compact tensor representation
Agentic and Video example applications

Don’t forget to check out Vespa Voice, our new podcast!

3x Lexical Search Query Performance

Although AI-powered search is changing how much of the world’s information is discovered and accessed, classical information retrieval techniques remain highly relevant, especially when the strengths of the two approaches are combined (hybrid search). For RAG, text indexes can hold billions of documents spanning many terabytes of data. Searching through these documents in milliseconds puts pressure on finite disk, memory, and CPU resources.

As of Vespa 8.473, new tunable parameters triple the performance of natural language text search with only a marginal loss of query result quality. These optimizations:

Reduce the required precision for very common words, significantly reducing disk reading and memory caching overhead.
Automatically filter out very common words based on statistics in the indexed data.
Significantly reduce the number of internal result candidates and thus reduce ranking costs.

See tuning query performance for lexical search for how to use stopword-limit, adjust-target and filter-threshold, and read the blog post for details.
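To make the second optimization concrete, here is a toy Python sketch of dropping query terms that occur in too large a fraction of the indexed documents. It only illustrates the idea and is not Vespa's implementation: the function names, the tiny corpus, and the `limit` value are all made up for this example, and the exact semantics of stopword-limit are described in the linked documentation.

```python
from collections import Counter


def document_frequencies(docs):
    """Return, for each term, the fraction of documents it appears in."""
    counts = Counter()
    for doc in docs:
        # Count each term once per document, regardless of repetitions.
        counts.update(set(doc.lower().split()))
    return {term: n / len(docs) for term, n in counts.items()}


def drop_common_terms(query, doc_freq, limit):
    """Keep only query terms whose document frequency is at most `limit`.

    `limit` is a hypothetical stand-in for a stopword threshold.
    """
    return [t for t in query.lower().split() if doc_freq.get(t, 0.0) <= limit]


# Illustrative corpus, not real data.
docs = [
    "the quick brown fox",
    "the lazy dog",
    "the fox and the dog",
    "a quick search engine",
]
doc_freq = document_frequencies(docs)
filtered = drop_common_terms("the quick fox", doc_freq, limit=0.6)
print(filtered)  # "the" occurs in 3 of 4 documents and is dropped
```

Because the threshold is derived from corpus statistics rather than a fixed word list, the same query can be filtered differently on different data sets.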

Pyvespa Relevance Evaluator

Since pyvespa 0.54, you can use VespaEvaluator to more easily measure relevance in result sets. It:

Iterates over queries and issues them against your Vespa application
Retrieves top-k documents per query (with k set to the largest k among your IR metrics)
Compares the retrieved documents with a set of relevant document IDs
Computes IR metrics: Accuracy@k, Precision@k, Recall@k, MRR@k, NDCG@k, MAP@k
Logs Vespa search times for each query
Logs/returns these metrics
Optionally writes out to CSV

Try the commerce-product-ranking sample application for a practical example of VespaEvaluator.
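For readers new to these metrics, the comparison step can be sketched in plain Python. This is a minimal illustration using common textbook definitions of Precision@k, Recall@k, and MRR@k. It does not call pyvespa, the helper names and sample data are hypothetical, and VespaEvaluator's exact definitions may differ in edge cases.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top-k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)


def reciprocal_rank_at_k(retrieved, relevant, k):
    """1 / rank of the first relevant document in the top-k, else 0."""
    for rank, d in enumerate(retrieved[:k], start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(retrieved_by_query, relevant_by_query, k):
    """Average each metric over all queries."""
    n = len(retrieved_by_query)
    totals = {f"precision@{k}": 0.0, f"recall@{k}": 0.0, f"mrr@{k}": 0.0}
    for qid, retrieved in retrieved_by_query.items():
        relevant = relevant_by_query[qid]
        totals[f"precision@{k}"] += precision_at_k(retrieved, relevant, k)
        totals[f"recall@{k}"] += recall_at_k(retrieved, relevant, k)
        totals[f"mrr@{k}"] += reciprocal_rank_at_k(retrieved, relevant, k)
    return {name: total / n for name, total in totals.items()}


# Hypothetical ranked results and relevance judgments.
retrieved_by_query = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d4", "d2"]}
relevant_by_query = {"q1": {"d1", "d2"}, "q2": {"d9"}}

metrics = evaluate(retrieved_by_query, relevant_by_query, k=3)
print(metrics)
```

In practice the ranked lists would come from queries against a running Vespa application, which is the part VespaEvaluator automates.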

Global-phase rank-score-drop-limit

The rank-score-drop-limit is used to discard candidate hits under a threshold value. Since Vespa 8.480, this is supported in the global ranking phase, and can be set in configuration or as a query parameter. The global rank phase is the final rank phase, often used with computationally intensive models. Find more details in the reference documentation.
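Conceptually, the drop limit acts as a filter on the reranked hits. The Python sketch below illustrates the effect with made-up scores. The function name and hit structure are hypothetical, and whether a hit scoring exactly at the limit is kept is an assumption here, so check the reference documentation for the precise rule.

```python
def apply_drop_limit(hits, drop_limit):
    """Keep hits scoring strictly above `drop_limit`, best first.

    The strict comparison is an assumption made for this sketch.
    """
    kept = [h for h in hits if h["score"] > drop_limit]
    return sorted(kept, key=lambda h: h["score"], reverse=True)


# Hypothetical scores produced by a final-phase reranking model.
reranked = [
    {"id": "doc1", "score": 0.91},
    {"id": "doc2", "score": 0.12},
    {"id": "doc3", "score": 0.47},
]

surviving = apply_drop_limit(reranked, drop_limit=0.2)
print([h["id"] for h in surviving])  # doc2 falls below the limit
```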

Many thanks to @dainiusjocas for this contribution!

Compact Tensor Representation

Tensor fields are used in documents, queries, and constants. Tensors are potentially large, so using a compact representation can improve performance. Since Vespa 8.475, you can use the compact hex form for mixed (sparse-dense) tensors - for example:

"mixedtensor": {
    "type":"tensor<float>(tag{},x[3])",
    "blocks":{
        "foo":[0.1111111119389534,0.2222222238779068,0.3333333432674408],
        "bar":[0.4444444477558136,0.5555555820465088,0.6666666865348816],
        "baz":[0.7777777910232544,0.8888888955116272,1.0]
    }
}

can be written as:

"mixedtensor": {
    "foo": "3DE38E393E638E393EAAAAAB",
    "bar": "3EE38E393F0E38E43F2AAAAB",
    "baz": "3F471C723F638E393F800000"
}
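The hex strings in this example are consistent with each cell being written as a big-endian IEEE 754 32-bit float, eight hex digits per cell. Assuming that encoding, a short Python sketch using only the standard library can convert between the two forms. The helper names are made up for illustration, and other cell types would need a different width and format.

```python
import struct


def floats_to_hex(values):
    """Encode values as big-endian float32 and return uppercase hex."""
    return struct.pack(f">{len(values)}f", *values).hex().upper()


def hex_to_floats(text):
    """Decode a hex string of big-endian float32 cells back to floats."""
    raw = bytes.fromhex(text)
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


# The dense blocks from the example above, keyed by the sparse "tag" label.
blocks = {
    "foo": [0.1111111119389534, 0.2222222238779068, 0.3333333432674408],
    "bar": [0.4444444477558136, 0.5555555820465088, 0.6666666865348816],
    "baz": [0.7777777910232544, 0.8888888955116272, 1.0],
}

compact = {tag: floats_to_hex(cells) for tag, cells in blocks.items()}
print(compact)

# Decoding returns the same float32 values.
restored = {tag: hex_to_floats(text) for tag, text in compact.items()}
```

Each float cell takes exactly eight characters in this form, compared with up to eighteen or so in decimal, which is where the size reduction comes from.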

UPDATE 2025-06-23: Updated links. You can find more details and examples in the query API reference, result reference, and the tensor reference.

New examples and notebooks:

Video Search and Retrieval with Vespa and TwelveLabs is a notebook showcasing the use of TwelveLabs’ state-of-the-art generation and embedding models for video processing. It demonstrates how to generate rich metadata (including summaries and keywords) for videos and embed video chunks for efficient retrieval. Use the example linked above to search for segments inside videos by describing the scene in text.
Agentic streamlit chatbot is a simple example of building an agentic application on top of Vespa.

New posts from our blog

Events

Artificial Intelligence in Financial Services Conference Nordics, Copenhagen March 26: Kristian Aune: Is Your Data Ready for GenAI?
BARC DATA festival, Munich March 26-27: Marcin Marzec (Intro2M) and Piotr Kobziakowski: Fighting Hate Speech with AI-Powered Mobile Technologies
Generative AI Summit, London March 31 - April 2: Piotr Kobziakowski: Architecting a Unified AI Engine: Scaling Retrieval, Ranking & Production AI
AI5050, Brussels April 3: Piotr Kobziakowski: RAG and data retrieval beyond words
Haystack, Charlottesville April 23: Kristian Aune: Building Relevance Formulas with LLMs

👉 Follow us on LinkedIn to stay in the loop on upcoming events, blog posts, and announcements.

Thanks for joining us in exploring the frontiers of AI with Vespa. Ready to take your projects to the next level? Deploy your application for free on Vespa Cloud today.

Scaling Smarter: Vespa's Approach to High-Performance Data Management

Balancing Performance and Cost: A Guide to Optimizing Node Size in Vespa

Elasticsearch vs. Vespa Resource Web
