Kristian Aune
Head of Customer Success,

Vespa Newsletter, August 2023

In the previous update, we mentioned Vector Streaming Search, Embedder Models from Huggingface, GPU Acceleration of Embedding Models, Model Hub and Dotproduct distance metric for ANN. Today, we’re excited to share the following updates:

Multilingual sample app

In the previous newsletter, we announced Vespa E5 model support. Now we’ve added a multilingual-search sample application. Using Vespa’s powerful indexing language and integrated embedding support, you can embed and index:

field embedding type tensor<float>(x[384]) {
    indexing {
        "passage: " . input title . " " . input text | embed | attribute
    }
}

Likewise, for queries:

{
    "yql": "select ..",
    "input.query(q)": "embed(query: the query to encode)"
}

With this, you can easily use multilingual E5 for great relevance. See the simplify search with multilingual embeddings blog post for results. Remember to try the sample app, which uses trec_eval to compute NDCG@10.
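
As a rough illustration of the query side, the Python sketch below builds a request body that uses the "query: " prefix, mirroring the "passage: " prefix used when indexing. The helper name build_e5_query, the YQL string, and the target_hits default are assumptions made for this example. Only the embedding field, the query(q) input and the embed(...) syntax come from the snippets above.

```python
import json


def build_e5_query(user_query: str, target_hits: int = 10) -> dict:
    """Build a query body whose text is embedded with the E5 'query: ' prefix.

    Assumption: the schema has the `embedding` field shown above, and the
    rank profile declares a matching query(q) input tensor.
    """
    return {
        # Hypothetical YQL: nearest neighbor search over the embedding field.
        "yql": (
            "select * from sources * where "
            f"{{targetHits: {target_hits}}}nearestNeighbor(embedding, q)"
        ),
        # Vespa embeds this text at query time; E5 expects the 'query: ' prefix.
        "input.query(q)": f"embed(query: {user_query})",
    }


if __name__ == "__main__":
    print(json.dumps(build_e5_query("what is the capital of Norway"), indent=2))
```

The resulting dictionary can be sent as the JSON body of a query request with any HTTP client.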

ANN targetHits

Vespa uses targetHits in approximate nearest neighbor queries. When searching the HNSW index in a post-filtering case, this is auto-adjusted in an effort to still expose targetHits hits to first-phase ranking after post-filtering (by exploring more nodes). This increases query latency as more candidates are evaluated. Since Vespa 8.215, the following formula is used to ensure an upper bound of adjustedTargetHits:

adjustedTargetHits = min(targetHits / estimatedHitRatio,
                         targetHits * targetHitsMaxAdjustmentFactor)

Use this to choose between returning fewer hits and spending longer searching the index: a lower factor caps the extra work, at the cost of possibly exposing fewer than targetHits hits. The target-hits-max-adjustment-factor can be set in a rank profile and overridden per query. The value is in the range [1.0, inf], and the default is 20.0.
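
To make the formula concrete, here is a minimal Python sketch of the calculation. The function name and argument checks are assumptions for illustration, and any rounding Vespa applies internally is not modeled.

```python
def adjusted_target_hits(
    target_hits: int,
    estimated_hit_ratio: float,
    max_adjustment_factor: float = 20.0,
) -> float:
    """Apply the upper-bounded targetHits adjustment from the formula above."""
    if not 0.0 < estimated_hit_ratio <= 1.0:
        raise ValueError("estimated_hit_ratio must be in (0, 1]")
    if max_adjustment_factor < 1.0:
        raise ValueError("max_adjustment_factor must be >= 1.0")
    return min(
        target_hits / estimated_hit_ratio,
        target_hits * max_adjustment_factor,
    )


if __name__ == "__main__":
    # Half of the documents pass the filter: explore twice as many candidates.
    print(adjusted_target_hits(100, 0.5))
    # A very restrictive filter: the default factor of 20 caps the adjustment.
    print(adjusted_target_hits(100, 0.01))
    # A lower factor favors latency over exposing the full targetHits.
    print(adjusted_target_hits(100, 0.25, max_adjustment_factor=2.0))
```

With a restrictive filter, the uncapped term targetHits / estimatedHitRatio grows quickly, so the cap is what keeps latency bounded.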

Tensor short query format in inputs

As of Vespa 8.217, a short format for mapped tensors can be used in input values. Together with the short indexed tensor format, query tensors can be written like this:

"input": {
    "query(my_indexed_tensor)": [1, 2, 3, 4],
    "query(my_mapped_tensor)": {
        "Tablet Keyboard Cases": 0.8
    }
}
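
For comparison, the Python sketch below expands a short-form mapped tensor into the more verbose cells/address/value JSON form. The helper name to_verbose_cells and the dimension name category are hypothetical, and the verbose layout reflects our understanding of the general tensor JSON format, so treat this as an illustration rather than a reference.

```python
import json


def to_verbose_cells(short_form: dict, dimension: str) -> dict:
    """Expand {"label": value, ...} into the verbose cells form.

    Assumption: the tensor has a single mapped dimension named `dimension`.
    """
    return {
        "cells": [
            {"address": {dimension: label}, "value": value}
            for label, value in short_form.items()
        ]
    }


if __name__ == "__main__":
    short = {"Tablet Keyboard Cases": 0.8}
    # The short form is the plain label-to-value object itself.
    print(json.dumps({"query(my_mapped_tensor)": short}))
    # The verbose form spells out each cell's address and value.
    print(json.dumps(to_verbose_cells(short, "category"), indent=2))
```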


PyVespa

During the last month, we’ve released PyVespa 0.35, 0.36 and 0.37:

  • Requires minimum Python 3.8.
  • Support setting default stemming of Schema: #510.
  • Add support for first phase ranking: #512.
  • Support using key/cert pair generated by Vespa CLI: #513 and add deploy_from_disk for Vespa Cloud: #514 - this makes it easier to interoperate with Vespa Cloud and local experiments.
  • Specify match-features in RankProfile: #525.
  • Add utility to create a vespa feed file for easier feeding using Vespa CLI: #536.
  • Add support for synthetic fields: #547 and support for Component config: #548. With this, one can run the multivector sample application - try it using the multi-vector-indexing notebook.

Vespa CLI functions

The Vespa command-line client has been made smarter. It now checks local deployments (e.g. on your laptop) and waits for the container cluster(s) to be up:

$ vespa deploy
Waiting up to 1m0s for deploy API to become ready...
Uploading application package... done

Success: Deployed . with session ID 2
Waiting up to 1m0s for deployment to converge...
Waiting up to 1m0s for cluster discovery...
Waiting up to 1m0s for container default...

The new function vespa destroy is built for quick dev cycles on Vespa Cloud. When developing, easily reset the state in your Vespa Cloud application by calling vespa destroy. This is also great for automation, e.g., in a GitHub Action. For local deployments, reset the state by starting fresh Docker/Podman containers instead.

Optimizations and features

  • Vespa indexing language now supports to_epoch_second for converting ISO 8601 date strings to epoch time. Available since Vespa 8.215. Use this to easily convert from a string to a number when indexing - see example.
  • Since Vespa 8.218, Vespa uses onnxruntime 1.15.1.
  • Since Vespa 8.218, one can use create to create non-existing cells before a modify-update operation is applied to a tensor.
  • Vespa allows referring to models by URL in the application package. Such files can be large, and are downloaded per deploy-operation. Since 8.217, Vespa will use a previously downloaded model file if it exists on the requesting node. New versions of the model must use a different URL.
  • Some Vespa topologies use groups of nodes to optimize query performance, where each group has a replica of each document. Applications with high query volume might have tens or even hundreds of groups. Upgrading such clusters in Vespa Cloud takes time when only one replica (= group) is allowed out at any time. With groups-allowed-down-ratio, one can set a percentage of groups instead, say 25%, so that a full content cluster upgrades in only 4 cycles.
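
The upgrade arithmetic in the last item can be sketched in a few lines of Python. The function names are hypothetical, and the rounding rule (round the allowed number of groups down, but always allow at least one) is an assumption made for this example, not a statement of how Vespa Cloud computes it.

```python
import math
from typing import Optional


def groups_down_per_cycle(groups: int, allowed_down_ratio: Optional[float]) -> int:
    """Number of groups taken out at once; one at a time when no ratio is set."""
    if allowed_down_ratio is None:
        return 1
    # Assumed rounding: floor, but never fewer than one group.
    return max(1, math.floor(groups * allowed_down_ratio))


def upgrade_cycles(groups: int, allowed_down_ratio: Optional[float] = None) -> int:
    """Number of cycles needed to upgrade every group in the cluster."""
    return math.ceil(groups / groups_down_per_cycle(groups, allowed_down_ratio))


if __name__ == "__main__":
    print(upgrade_cycles(100))        # one group at a time
    print(upgrade_cycles(100, 0.25))  # 25% of the groups at a time
```

With 100 groups, this gives 100 cycles without the setting and 4 cycles at a ratio of 0.25, matching the example above.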

Blog posts since last newsletter

Thanks for reading! Try out Vespa on Vespa Cloud or grab the latest release and run it yourself! 😀