All Stories

Stateful model serving: how we accelerate inference using ONNX Runtime

There's a difference between stateless and stateful model serving.

Fine-tuning a BERT model for search applications

How to ensure training and serving encoding compatibility.

From research to production: scaling a state-of-the-art machine learning system

How we implemented a production-ready question-answering application and reduced response time by more than two orders of magnitude.

Fine-tuning a BERT model with transformers

Set up a custom Dataset, fine-tune BERT with the Transformers Trainer, and export the model via ONNX.

Vespa Product Updates, October 2020

Improvement to Vespa feeding APIs

Vespa Product Updates, September 2020

Introducing ONNX-Runtime, Hamming Distance Metric, Conditional Update Performance Improvements and Compressed Transaction Log with Synced Ack

Efficient open-domain question-answering on

In this post, we reproduce the state-of-the-art baseline for retrieval-based question-answering systems within a single, scalable, production-ready application on

Vespa Product Updates, August 2020

Introducing NLP with Transformers, Grafana how-to, Improved GEO Search Support, Query Profile Variants Optimizations, & Build on Debian 10

Vespa Product Updates, June 2020

Announcing support for approximate nearest neighbor vector search, which can be combined with filters and text search, with state-of-the-art performance

Introducing NLP with Transformers on Vespa

We’ve been working a lot lately on evaluating Transformer models in Vespa. Here we show how, and share a bit on how we view the benefits of inference in Vespa...

Approximate Nearest Neighbor Search in Vespa - Part 1

In this blog post we explore how the Vespa team selected HNSW (Hierarchical Navigable Small World Graphs) as the baseline approximate nearest neighbor algorithm for extension and integration in Vespa....

The hardest problem in computing

What is the hardest problem in applied computing? My bet is on big data serving — computing over large data sets online.