Vespa Newsletter, January 2022
In the previous update, we mentioned Tensor performance improvements, Match features and the Vespa IntelliJ plugin. Today, we’re excited to share the following updates:
Faster node recovery and re-balancing
When Vespa content nodes are added or removed, data is auto-migrated between nodes to maintain the configured data distribution. The throughput of this migration is throttled to avoid impacting regular query and write traffic. We have worked to improve this throughput by making better use of available resources, and since November we have been able to approximately double it. Read the blog post.
Most schema changes in Vespa take effect immediately, but some require reindexing. Reindexing the corpus takes time and consumes resources. It is now possible to configure how fast to reindex in order to balance this tradeoff, see reindex speed. Read more about schema changes.
pyvespa 0.14.0 is released with the following changes:
- Add retry strategy to delete_data, get_data and update_data (#222).
- Deployment parameter disk_folder defaults to the current working directory for both Docker and Cloud deployments (#225).
- Vespa connection now accepts cert and key as separate arguments. Using both certificate and key values in the cert file continues to work as before (#226).
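To illustrate what a retry strategy for data operations generally looks like, here is a minimal sketch of retrying with exponential backoff. It is not pyvespa's implementation: the helper name `with_retries`, its parameters, and the choice of `ConnectionError` as the retryable error are all assumptions made for illustration.

```python
import time


def with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation() and retry on ConnectionError with exponential backoff.

    Hypothetical helper for illustration only; not part of pyvespa.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError as error:
            last_error = error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A caller would wrap a single request in a zero-argument function, for example `with_retries(lambda: fetch_document("id:1"))`, where `fetch_document` stands in for whatever call may fail transiently.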
Improved support for Weak And and unstructured user input
You can now use type=weakAnd in the Query API. Used with userInput, it is easy to create a weakAnd query from unstructured input, for a better relevance / performance tradeoff compared to all / any queries.
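As a sketch of how such a request could be assembled, the snippet below builds the Query API parameters for free-text input parsed with weakAnd. The endpoint `http://localhost:8080/search/`, the parameter name `userinput`, and the helper `build_weakand_query` are assumptions for illustration; adapt them to your own application.

```python
from urllib.parse import urlencode


def build_weakand_query(user_text, endpoint="http://localhost:8080/search/", hits=10):
    """Return (params, url) for a query that parses free text with weakAnd.

    Hypothetical helper: the endpoint and the 'userinput' parameter name are
    assumptions, not fixed by Vespa.
    """
    params = {
        # userInput(@userinput) reads the text from the request parameter
        # named 'userinput' instead of embedding it in the YQL string.
        "yql": "select * from sources * where userInput(@userinput);",
        "userinput": user_text,
        # Ask the Query API to build a weakAnd query from the user text.
        "type": "weakAnd",
        "hits": hits,
    }
    return params, endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    _, url = build_weakand_query("best running shoes for flat feet")
    print(url)
```

Keeping the user text in its own request parameter means it never has to be escaped into the YQL string, which is the main convenience of combining userInput with type=weakAnd.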
Semantic Rules have added better support for making synonym expansion rules through the * operator, see #20386, and proper stemming in multiple languages, see Semantic Rules directives. Read more about query rewriting.
If no language is explicitly set in a document or a query, and stemming/NLP tokenization is used, Vespa will run a language detector on the available text. Since Vespa 7.518.53, the default language detector has changed from Optimaize to OpenNLP. Read more.
New blog posts
- ML model serving at scale is about model serving latency and concurrency, and is a great primer on inference threads, intra-operation threads and inter-operation threads.
- Billion-scale knn part two goes into detail on tensor vector precision types, memory usage, precision and performance for both nearest neighbor and approximate nearest neighbor search. Also learn how HNSW works with the number of links in the graph and the number of neighbors to explore at insert time, and how these affect precision.
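To give a feel for why the tensor cell type matters at this scale, here is a back-of-the-envelope sketch of raw vector storage per cell type. The helper `raw_vector_bytes` and the example corpus size and dimensionality are assumptions for illustration, and the estimate deliberately ignores HNSW graph and other index overhead.

```python
# Bytes per tensor cell for Vespa's cell value types.
BYTES_PER_CELL = {"double": 8, "float": 4, "bfloat16": 2, "int8": 1}


def raw_vector_bytes(num_vectors, dimensions, cell_type):
    """Estimate raw storage for dense vectors, excluding any index overhead."""
    return num_vectors * dimensions * BYTES_PER_CELL[cell_type]


if __name__ == "__main__":
    # Hypothetical corpus: one billion vectors with 100 dimensions each.
    for cell_type in BYTES_PER_CELL:
        gigabytes = raw_vector_bytes(1_000_000_000, 100, cell_type) / 1e9
        print(f"{cell_type:>8}: {gigabytes:,.0f} GB")
```

Under these assumptions, moving from float to int8 cuts raw vector storage by a factor of four, which is the kind of tradeoff against precision that the post explores.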