Vespa Newsletter, December 2021
In the previous update, we mentioned schema inheritance, improved data dump performance, “true” query item, faster deployments and Hamming distance for ranking. This time, we have the following updates:
Tensor performance improvements
Since Vespa 7.507.67, Euclidean distance calculations using int8 are 250% faster, using HW-accelerated instructions. This speeds up feeding to HNSW-based indices and reduces latency for nearest neighbor queries. It is relevant for applications with large data sets per node - using int8 instead of float requires 4x less memory, and the performance improvement is measured to bring us to 10k puts/node when using HNSW.
With Vespa 7.514.11, tensor field memory alignment for types <= 16 bytes is optimized. For example, a 104-bit (13-byte) int8 tensor field is now aligned at 16 bytes, previously 32 - a 2x improvement. Query latency might improve too, due to lower memory bandwidth usage.
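As a sketch of how such a field could be declared - the schema, field name, and tensor dimension below are illustrative, not from this newsletter - an int8 tensor with an HNSW index might look like:

```
schema doc {
    document doc {
        # Hypothetical 13-byte (104-bit) int8 embedding,
        # matching the alignment example above
        field embedding type tensor<int8>(x[13]) {
            indexing: attribute | index
            attribute {
                distance-metric: euclidean
            }
            index {
                hnsw {
                    max-links-per-node: 16
                    neighbors-to-explore-at-insert: 200
                }
            }
        }
    }
}
```

Feeding int8 values into a field like this uses a quarter of the memory of a float tensor of the same dimensionality, which is where the per-node capacity gain comes from.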
Refer to #20073 Representing SPANN with Vespa for details on this work, and see also Bringing the neural search paradigm shift to production from the London Information Retrieval Meetup Group.
Match features
Any Vespa rank feature or function output can be returned along with regular document fields by adding it to the summary-features list of the rank profile. If a feature is both used for ranking and returned with results, it is re-calculated by Vespa when fetching the document data of the final results, as this happens after the global merge of matched and scored documents. This is wasteful when the features are outputs of complex functions, such as a neural language model.
The new match-features list lets you configure features that are returned from the content nodes as part of the per-hit information produced before the global merge of matches. This avoids re-calculating such features when serving results and makes it possible to use them as inputs to a (third) re-ranking phase evaluated over the globally best-ranking hits. Furthermore, match-features are calculated as part of the multi-threaded matching-and-ranking execution on the content nodes, while features fetched with summary-features are calculated single-threaded.
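A minimal rank-profile sketch of the difference - the field, function, and attribute names here are illustrative assumptions, not from this newsletter:

```
rank-profile with-match-features {
    first-phase {
        expression: closeness(field, embedding)
    }
    # Calculated per hit on the content nodes, multi-threaded,
    # and returned before the global merge of matches:
    match-features {
        closeness(field, embedding)
        firstPhase
    }
    # Calculated in the single-threaded summary fill after the
    # global merge - avoid placing expensive features here:
    summary-features {
        attribute(popularity)
    }
}
```

Features listed under match-features are available on each hit in the result, so a container-side searcher can use them for a further re-ranking pass without triggering any re-calculation on the content nodes.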