Kristian Aune
Kristian Aune
Head of Customer Success, Vespa.ai

Vespa Newsletter, October 2022

In the previous update, we mentioned Rank-phase statistics, Schema feeding flexibility, the Query Builder and Trace Visualizer, Rank trace profiling, the new --speed-test parameter and a new video. Today, we’re excited to share the following updates:

Create vector embeddings in Vespa without custom Java code

An increasingly popular reason for using Vespa is the ability to use vector embeddings to be able to retrieve documents by semantic similarity in addition to retrieving by text tokens or attributes. Since Vespa 8.52, we have made this easier by making it possible to use BERT-style models to create document and query embeddings inside Vespa without writing any custom code.

The BertBase embedder bundled with Vespa uses a WordPiece embedder to produce a token sequence that is then input to a transformer model. A BERT-Base compatible transformer model must have three inputs:

  • A token sequence (input_ids)
  • An attention mask (attention_mask)
  • (Optionally) Token types for cross encoding (token_type_ids)

Give this a try at simple-semantic-search.

Model hub: Provided ML models on Vespa Cloud

The BERT base embedder allows you to use vector search without bringing your own vectors, or writing any Java code - but you still have to bring the model. For our Vespa Cloud users we have made this even simpler by providing the models out of the platform as well.

For us working on Vespa.ai, it is always a goal to empower application developers by making it as simple as possible to get started, while at the same time being able to scale seamlessly to more data, higher traffic, and more complex use cases. So of course you can still bring your own models, write your own embedders, or pass in your own vectors, and mix and match all these capabilities in all possible ways.

Improved query performance for filters with common terms

When making text indexes, Vespa stores a bitvector in addition to the posting list for frequent terms to enable maximally fast matching. If the field is used as a filter only, no ranking is needed, and the bitvector will be used instead of the posting list. This makes queries using such terms faster and cheaper. The bitvector optimization is now also available for attribute fields with fast-search.

Paged attributes

Fields which are stored in column stores suitable for random memory access are called attributes in Vespa. These are used for matching, ranking and grouping, and enabling high-throughput partial updates. By default, attributes are stored completely in memory to make all accesses maximally fast, but some have also supported paging out to disk to support a wider range of tradeoffs between lookup speed and memory cost - see e.g. hybrid billion scale vector search.

Since Vespa 8.69, paging support has been extended to all attribute types, except tensor with fast-rank and predicate.

ARM64 support

Vespa container images are now released as multiplatform, supporting both x86_64 and ARM64. ARM64 is also available on Vespa Cloud. Read more.

Query result highlighting for arrays of string

Highlighting query words in results helps users see why a particular document is returned in their search result. Since Vespa 8.53, this is supported for arrays of string in addition to single-value strings - see the schema reference.

Vespa scripts are becoming go only

The Vespa Container image was set on a diet and now has zero Perl-dependencies. Most Vespa utilities have now instead been ported to using Go to support a wider range of client platforms without requiring any dependencies.