Kristian Aune
Head of Customer Success, Vespa.ai

Vespa Newsletter, September 2024

In the previous update, we mentioned Pyvespa and Vespa CLI improvements, improved multi-threading performance for text matching, Chinese segmentation, improved English stemming, and new ranking features. Today, we’re excited to share the following updates:

  • Optimized MaxSim with Hamming distance for multivector documents
  • Pyvespa
  • IDE support

Optimized MaxSim with Hamming distance for multivector documents

The Vespa Tensor Ranking framework lets users write arbitrarily complex ranking functions. Multi-vector MaxSim is increasingly important for many use cases; see scaling-colpali-to-billions for an example:

rank-profile max_sim {
  inputs {
    query(qt) tensor<float>(q{}, v[128])
  }
  function max_sim(query, page) {
    expression {
      sum(
        reduce(
          sum(query * page, v),
          max,
          p
        ),
        q
      )
    }
  }
  first-phase {
    expression: max_sim(query(qt), attribute(embedding))
  }
}
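
At query time, the query(qt) tensor is passed with the input.query(qt) query parameter. The following is a minimal, hypothetical pyvespa sketch; the endpoint, document type, YQL, and tensor values are placeholders:

from vespa.application import Vespa

# Hypothetical endpoint; point this at your own application
app = Vespa(url="http://localhost", port=8080)

# Mixed tensor (q{}, v[128]): map from query-token label to a 128-dim vector
qt = {str(i): [0.0] * 128 for i in range(4)}  # placeholder embeddings

response = app.query(body={
    "yql": "select * from doc where true",  # placeholder YQL
    "ranking": "max_sim",                   # the rank profile above
    "input.query(qt)": qt,
})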

From Vespa 8.404, this operation is optimized with Hamming distance over int8 tensors, supporting bit-resolution (binarized) embeddings. Storing a 128-dimensional embedding as 128 bits (16 int8 values) instead of 128 floats (512 bytes) is a 32x reduction in memory usage, and tests show a 30x reduction in latency:

[Figure: MaxSim latency graph]

In Vespa, the multi-vector MaxSim similarity is the dot product between every query token embedding and every chunk/patch embedding, followed by a max reduce over the chunk/patch dimension and a sum reduce over the query tokens. Read more in #32232: Optimize MaxSim with hamming (sum of max inverted hamming distances).
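
To make the computation concrete, here is a minimal NumPy sketch of both the float MaxSim and the binarized, Hamming-based variant. This is an illustration, not Vespa's implementation; the shapes and the binarization threshold are assumptions for the example.

import numpy as np

# Assumed shapes for illustration: 4 query tokens, 1000 page patches,
# 128-dimensional embeddings as in the schema above.
query = np.random.rand(4, 128).astype(np.float32)    # query token embeddings (q, v)
page = np.random.rand(1000, 128).astype(np.float32)  # patch embeddings (p, v)

# Float MaxSim: dot product for every (query token, patch) pair,
# max over the patch dimension, then sum over the query tokens.
scores = query @ page.T                # shape (q, p)
max_sim = scores.max(axis=1).sum()

# Hamming variant: binarize and pack each 128-dim vector into 16 bytes,
# then score as the sum of max inverted Hamming distances.
query_bits = np.packbits(query > 0.5, axis=1)   # (q, 16) uint8
page_bits = np.packbits(page > 0.5, axis=1)     # (p, 16) uint8
xor = np.bitwise_xor(query_bits[:, None, :], page_bits[None, :, :])  # (q, p, 16)
hamming = np.unpackbits(xor, axis=2).sum(axis=2)   # popcount per pair -> (q, p)
max_sim_hamming = (-hamming).max(axis=1).sum()     # invert distance, max, sum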

Pyvespa

Pyvespa 0.49 has been released.

We have revamped the notebooks and added new ones, particularly about ColPali! We also recommend the original paper, ColPali: Efficient Document Retrieval with Vision Language Models.

For full release details, check out github.com/vespa-engine/pyvespa/releases. Thanks to the external contributors who have added to Pyvespa since the last newsletter!

IDE support

Vespa provides plugins for working with schemas and rank profiles in IDEs:

  • VSCode: VS Code extension
  • IntelliJ, PyCharm or WebStorm: Jetbrains plugin
  • Vim: neovim

[Figure: IDE plugin example]

See the documentation for details, and read the blog post for how our 2024 summer interns created them!

Other

From Vespa 8.411, Vespa uses ONNX Runtime 1.19.2.

New posts from our blog

Other companies blogging about how and why they build on Vespa:

  • From Vinted Engineering by Ernestas Poškus: Vinted Search Scaling Chapter 8: Goodbye Elasticsearch, Hello Vespa Search Engine. “The migration was a roaring success. We managed to cut the number of servers we use in half (down to 60). The consistency of search results has improved since we’re now using just one deployment (or cluster, in Vespa terms) to handle all traffic. Search latency has improved by 2.5x and indexing latency by 3x. The time it takes for a change to be visible in search has dropped from 300 seconds (Elasticsearch’s refresh interval) to just 5 seconds. Our search traffic is stable, the query load is deterministic, and we’re ready to scale even further.”
  • Guest blog post by Yuhong Sun, Co-Founder/Co-CEO of Danswer: Why Danswer - the biggest open source project in Enterprise Search - uses Vespa. “We chose Vespa because of its richness of features, the amazing team behind it, and their commitment to staying up to date on every innovation in the search and NLP space. We look forward to the exciting features that the Vespa team is building and are excited to finalize our own migration to Vespa Cloud.”

Events


Thanks for joining us in exploring the frontiers of AI with Vespa. Ready to take your projects to the next level? Deploy your application for free on Vespa Cloud today.