Vespa Product Updates, May 2019: Deploy Large Machine Learning Models, Multithreaded Disk Index Fusion, Ideal State Optimizations, and Feeding Improvements

In last month’s Vespa update, we mentioned tensor updates, query tracing, and coverage. Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to evolve.

For May, we’re excited to share the following feature updates with you:

Multithreaded disk index fusion

Content nodes can now sustain a higher feed rate by using multiple threads for disk index fusion.
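
To illustrate why index fusion lends itself to multithreading, here is a minimal Python sketch. It assumes a toy in-memory index (a dict mapping each word to a sorted list of document ids) rather than Vespa's actual on-disk format, and the names fuse_indexes, fuse_word, and num_threads are hypothetical. Because the fused dictionary is sorted, it can be split into disjoint word ranges that separate threads merge independently.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def fuse_word(word, indexes):
    # Hypothetical helper: merge the sorted posting lists for one word
    # from every source index, dropping duplicate document ids.
    lists = [idx[word] for idx in indexes if word in idx]
    merged = []
    for doc in heapq.merge(*lists):
        if not merged or merged[-1] != doc:
            merged.append(doc)
    return word, merged


def fuse_indexes(indexes, num_threads=4):
    # Build the combined, sorted dictionary of words across all indexes.
    words = sorted(set().union(*(idx.keys() for idx in indexes)))
    # Split the dictionary into contiguous, disjoint slices (ceiling division).
    chunk = max(1, -(-len(words) // num_threads))
    slices = [words[i:i + chunk] for i in range(0, len(words), chunk)]

    def fuse_slice(word_slice):
        # Each worker fuses only its own slice, so no locking is needed.
        return [fuse_word(w, indexes) for w in word_slice]

    fused = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in slice order, keeping words sorted.
        for part in pool.map(fuse_slice, slices):
            fused.update(part)
    return fused


a = {"vespa": [1, 4], "tensor": [2]}
b = {"vespa": [3, 4], "rank": [5]}
print(fuse_indexes([a, b], num_threads=2))
```

Python's global interpreter lock limits CPU parallelism here, so the sketch only demonstrates the partitioning idea; Vespa's content nodes are written in C++ and use native threads.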

Feeding improvements

Cluster-internal communication is now multithreaded out of the box, enabling high-throughput feeding. This fully utilizes a 10 Gbps network and improves utilization of high-CPU content nodes.

Ideal state optimizations

The ideal state is recalculated whenever the content cluster state changes. This calculation is now faster and runs less often, so state transitions such as a node going up or down have less impact on read and write operations. Learn more in the dynamic data distribution documentation.
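
As a rough illustration of what computing an ideal state involves, the sketch below assigns each bucket to nodes using rendezvous (highest-random-weight) hashing. This is a generic stand-in, not Vespa's actual distribution algorithm; ideal_state, ideal_nodes, the SHA-256 scoring, and the redundancy parameter are assumptions for the example. It shows why a deterministic placement function keeps node up/down transitions cheap: when a node goes down, only buckets that had a replica on it get a new node.

```python
import hashlib


def _score(bucket_id, node):
    # Hypothetical scoring: a deterministic pseudo-random weight per (bucket, node).
    digest = hashlib.sha256(f"{bucket_id}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def ideal_nodes(bucket_id, up_nodes, redundancy=2):
    # Rank all available nodes by score and keep the top `redundancy` of them.
    ranked = sorted(up_nodes, key=lambda n: _score(bucket_id, n), reverse=True)
    return ranked[:redundancy]


def ideal_state(bucket_ids, up_nodes, redundancy=2):
    # Map every bucket to the nodes that should hold its replicas.
    return {b: ideal_nodes(b, up_nodes, redundancy) for b in bucket_ids}


nodes = [0, 1, 2, 3, 4]
before = ideal_state(range(1000), nodes)
after = ideal_state(range(1000), [n for n in nodes if n != 3])  # node 3 goes down
moved = sum(1 for b in range(1000) if before[b] != after[b])
print(f"{moved} of 1000 buckets changed placement")
```

Removing a node does not change the relative order of the remaining nodes for any bucket, so buckets that never used the failed node keep their placement unchanged.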

Download ML models during deploy

One way to import ML models into Vespa is to put them in the models directory of the application package. Applications whose models are retrained frequently in an external system can instead refer to a model by URL rather than including it in the application package. This use case is now documented in deploying remote models, and it addresses the challenge of deploying huge models.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.
