Vespa Newsletter, April 2025
In the previous update, we covered the Python query API, Vespa Logstash Connectors, ModernBERT models, and Vespa CLI multi-get.
Today, we’re excited to share the following updates:
- 3x lexical search query performance
- Pyvespa Relevance Evaluator
- Global-phase rank-score-drop-limit
- Compact tensor representation
- Agentic and Video example applications
Don’t forget to check out Vespa Voice, our new podcast!
3x Lexical Search Query Performance
Although AI-powered search is changing how much of the world’s information is discovered and accessed, classical information retrieval techniques remain highly relevant, especially when the strengths of the two approaches are combined (hybrid search). For RAG, text indexes can hold billions of documents spanning many terabytes of data. Searching through these documents in milliseconds puts pressure on finite disk, memory, and CPU resources.
Vespa 8.473 adds tunable parameters that triple the performance of natural language text search with only a marginal loss of query result quality. These optimizations:
- Reduce the required precision for very common words, significantly reducing disk reading and memory caching overhead.
- Automatically filter out very common words based on statistics in the indexed data.
- Significantly reduce the number of internal result candidates and thus reduce ranking costs.
See tuning query performance for lexical search for how to use stopword-limit, adjust-target and filter-threshold, and read the blog post for details.
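To make the second optimization above more concrete, here is a minimal Python sketch of dropping very common query words based on statistics from the indexed data. It is an illustration only, not Vespa's implementation: the function names and the `max_doc_fraction` parameter are hypothetical and do not correspond to the exact semantics of stopword-limit, adjust-target, or filter-threshold.

```python
from collections import Counter


def document_frequencies(docs):
    """Count in how many documents each lowercased word occurs."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def drop_common_terms(query, docs, max_doc_fraction=0.5):
    """Keep only query words that occur in at most max_doc_fraction of docs.

    max_doc_fraction is a hypothetical knob used for illustration.
    """
    if not docs:
        return query.lower().split()
    df = document_frequencies(docs)
    n = len(docs)
    return [t for t in query.lower().split() if df[t] / n <= max_doc_fraction]


docs = [
    "the quick brown fox",
    "the lazy dog",
    "the cat and the hat",
    "a fox in the snow",
]

# "the" occurs in every document and is dropped; "fox" occurs in half and is kept.
print(drop_common_terms("the fox", docs))
```

Words that appear in nearly every document contribute little to ranking but are expensive to evaluate, which is why filtering them can cut query cost with little loss of result quality.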
Pyvespa Relevance Evaluator
Since pyvespa 0.54, you can use VespaEvaluator to more easily measure relevance in result sets. It:
- Iterates over queries and issues them against your Vespa application
- Retrieves top-k documents per query (with k set to the largest k among your IR metrics)
- Compares the retrieved documents with a set of relevant document IDs
- Computes IR metrics: Accuracy@k, Precision@k, Recall@k, MRR@k, NDCG@k, MAP@k
- Logs Vespa search times for each query
- Logs/returns these metrics
- Optionally writes out to CSV
Try the commerce-product-ranking sample application for a practical example of VespaEvaluator.
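For readers who want to see what some of these metrics measure, the following is a small, self-contained Python sketch of Precision@k, Recall@k, and MRR@k. It does not use pyvespa or VespaEvaluator; the function names and the sample document IDs are made up for illustration.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)


def mrr_at_k(retrieved, relevant, k):
    """Reciprocal rank of the first relevant document in the top-k, else 0."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranked result list and relevance judgments for one query.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(precision_at_k(retrieved, relevant, 3))
print(recall_at_k(retrieved, relevant, 3))
print(mrr_at_k(retrieved, relevant, 3))
```

An evaluator typically averages such per-query values over the whole query set to report a single number per metric.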
Global-phase rank-score-drop-limit
The rank-score-drop-limit is used to discard candidate hits under a threshold value. Since Vespa 8.480, this is supported in the global ranking phase, and can be set in configuration or as a query parameter. The global rank phase is the final rank phase, often used with computationally intensive models. Find more details in the reference documentation.
Many thanks to @dainiusjocas for this contribution!
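Conceptually, a drop limit is a score filter applied to the ranked hits. The Python sketch below illustrates the idea only and is not Vespa code: the hit tuples and the function name are hypothetical, and dropping a hit whose score exactly equals the limit is an assumption, so check the reference documentation for the precise boundary behavior.

```python
def apply_drop_limit(hits, drop_limit):
    """Keep hits whose score is above drop_limit.

    hits is a list of (document_id, score) tuples. Treating a score equal
    to the limit as dropped is an assumption made for this sketch.
    """
    return [(doc_id, score) for doc_id, score in hits if score > drop_limit]


# Hypothetical hits after the final rank phase.
hits = [("a", 0.92), ("b", 0.40), ("c", 0.15)]

print(apply_drop_limit(hits, 0.3))
```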
Compact Tensor Representation
Tensor fields are used in documents, queries, and constants. Tensors are potentially large, so using a compact representation can improve performance. Since Vespa 8.475, you can use the compact hex form for mixed (sparse-dense) tensors - for example:
"mixedtensor": {
"type":"tensor<float>(tag{},x[3])",
"blocks":{
"foo":[0.1111111119389534,0.2222222238779068,0.3333333432674408],
"bar":[0.4444444477558136,0.5555555820465088,0.6666666865348816],
"baz":[0.7777777910232544,0.8888888955116272,1.0]
}
}
can be written as:
"mixedtensor": {
"foo": "3DE38E393E638E393EAAAAAB",
"bar": "3EE38E393F0E38E43F2AAAAB",
"baz": "3F471C723F638E393F800000"
}
You can find more details and examples in the reference documentation and the tensor reference.
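Judging from the example above, each hex string is the concatenation of the cell values as big-endian 32-bit IEEE 754 floats, eight hex digits per value. Under that assumption, a minimal Python sketch for converting between the two forms could look like this (the helper names are made up for illustration):

```python
import struct


def floats_to_hex(values):
    """Pack values as big-endian float32 and return uppercase hex."""
    return struct.pack(f">{len(values)}f", *values).hex().upper()


def hex_to_floats(hex_string):
    """Decode a hex string of big-endian float32 values into Python floats."""
    raw = bytes.fromhex(hex_string)
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


# The "foo" block from the example above.
foo = [0.1111111119389534, 0.2222222238779068, 0.3333333432674408]

print(floats_to_hex(foo))
print(hex_to_floats("3F471C723F638E393F800000"))
```

Each float takes 8 characters in hex form, compared with up to around 18 characters in decimal form, which is where the size saving comes from.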
New examples and notebooks:
- Video Search and Retrieval with Vespa and TwelveLabs is a notebook showcasing the use of TwelveLabs’ state-of-the-art generation and embedding models for video processing. It demonstrates how to generate rich metadata (including summaries and keywords) for videos and embed video chunks for efficient retrieval. Use the example linked above to search for segments inside videos by describing the scene in text.
- Agentic streamlit chatbot is a simple example of building an agentic application on top of Vespa.
New posts from our blog
- Introducing Vespa Voice — Your Signal for What’s Next in AI-Driven Search Infrastructure
- Tripling the query performance of lexical search
- AI in Insurance with Vespa.ai
- Advanced Video Retrieval at Scale: A Quick Start Using Vespa and TwelveLabs
Events
- Artificial Intelligence in Financial Services Conference Nordics, Copenhagen March 26: Kristian Aune: Is Your Data Ready for GenAI?
- BARC DATA festival, Munich March 26-27: Marcin Marzec (Intro2M) and Piotr Kobziakowski: Fighting Hate Speech with AI-Powered Mobile Technologies
- Generative AI Summit, London March 31 - April 2: Piotr Kobziakowski: Architecting a Unified AI Engine: Scaling Retrieval, Ranking & Production AI
- AI5050, Brussels April 3: Piotr Kobziakowski: RAG and data retrieval beyond words
- Haystack, Charlottesville April 23: Kristian Aune: Building Relevance Formulas with LLMs
👉 Follow us on LinkedIn to stay in the loop on upcoming events, blog posts, and announcements.
Thanks for joining us in exploring the frontiers of AI with Vespa. Ready to take your projects to the next level? Deploy your application for free on Vespa Cloud today.