Managed Vector Search using Vespa Cloud

by Jo Kristian Bergum, Vespa Solutions Architect


Photo by israel palacio on Unsplash

There is a growing interest in AI-powered vector representations of unstructured multimodal data and searching efficiently over these representations. This blog post describes how your organization can unlock the full potential of multimodal AI-powered vector representations using Vespa – the industry-leading open-source big data serving engine.


Deep learning has revolutionized information extraction from unstructured data like text, audio, images, and video. Furthermore, self-supervised learning algorithms like data2vec accelerate the learning of representations for speech, vision, and text, as well as multimodal representations combining these modalities. Pre-training deep neural network models with self-supervised learning, without expensive curated labeled data, helps scale machine learning because adapting and fine-tuning a model for a specific task then requires fewer labeled examples.

Representing unstructured multimodal data as vectors or tensors unlocks new and exciting use cases that were hard to foresee just a few years ago. Even a well-established AI-powered use case like search ranking, which has used AI to improve search results for decades, is going through a neural paradigm shift driven by language models like BERT.

These emerging multimodal data-to-vector models increase the insight and knowledge organizations can extract from their unstructured data. As a result, organizations leveraging this new data paradigm will have a significant competitive advantage over organizations not participating in this paradigm shift. Historically, learning from structured and unstructured data has mostly been performed offline. However, advanced organizations with access to modern infrastructure and competence have started moving the learning process online, using real-time, in-session contextual features to improve AI predictions.

One example of real-time online inference or prediction is within-cart recommendation systems, where grocery and e-commerce sites recommend or predict related items to supplement the user’s current cart contents. An AI-powered recommendation model for this use case could use item-to-item similarity or past sparse user-to-item interactions. Still, using the real-time context, in this case the cart’s contents, can without a doubt improve the model’s accuracy. Furthermore, creating add-to-cart suggestions for all possible combinations offline is impossible due to the combinatorial explosion of likely cart items. This use case also has the challenging property that the number of items to choose from is extensive, hundreds of millions in the case of Amazon. In addition, business constraints like in-stock status limit the candidate selection.
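The within-cart example can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not how Vespa or any production recommender implements this: the tiny three-dimensional item vectors, the catalog layout, and the `recommend` helper are all hypothetical, the query vector is simply the average of the cart's item vectors, and the search is brute force rather than a distributed or approximate nearest neighbor search.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(cart_ids, catalog, k=2):
    """Return up to k in-stock items most similar to the current cart.

    catalog maps item_id -> {"vector": [...], "in_stock": bool}
    (a hypothetical layout chosen for this sketch).
    """
    if not cart_ids:
        return []
    # Real-time context: build the query vector from the cart's contents.
    cart_vectors = [catalog[item_id]["vector"] for item_id in cart_ids]
    dims = len(cart_vectors[0])
    query = [sum(v[d] for v in cart_vectors) / len(cart_vectors) for d in range(dims)]
    # Business constraint: only in-stock items not already in the cart qualify.
    scored = [
        (cosine(query, item["vector"]), item_id)
        for item_id, item in catalog.items()
        if item["in_stock"] and item_id not in cart_ids
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Made-up toy catalog; real item vectors would come from a trained model.
CATALOG = {
    "pasta": {"vector": [0.9, 0.1, 0.0], "in_stock": True},
    "tomato_sauce": {"vector": [0.8, 0.2, 0.1], "in_stock": True},
    "parmesan": {"vector": [0.7, 0.3, 0.0], "in_stock": False},
    "basil": {"vector": [0.6, 0.4, 0.1], "in_stock": True},
    "shampoo": {"vector": [0.0, 0.1, 0.9], "in_stock": True},
}

print(recommend(["pasta", "tomato_sauce"], CATALOG))
```

Because the query vector depends on the cart and the filter depends on current stock, the result cannot be precomputed for every possible cart. Scanning every item is fine for five items but not for hundreds of millions, which is why this kind of inference needs a serving engine built for it.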

Building technology and infrastructure to perform computationally complex distributed AI inference over billions of data items under low user-facing serving latency constraints is one of the most challenging problems in computing.

Vespa - Serving Engine

Vespa, the open-source big data serving engine, specializes in making it easy for organizations of any size to move AI inference computations online at scale without investing a significant amount of resources in building infrastructure and technology. Vespa is a distributed computation engine that can scale in any dimension.

In Vespa, AI is a first-class citizen and not an afterthought. The following Vespa primitives are the foundational building blocks for building an online AI serving engine:

Get Started Today with Vector Search using Vespa Cloud.

We have created a getting-started sample application for vector search which, in a few steps, shows you how to deploy your vector search use case to Vespa Cloud. Check it out at

The sample application features:

For only $3.36 per hour, your organization can store and search 5M 768-dimensional vectors, deployed in Vespa Cloud production zones with high availability, supporting thousands of inserts and queries per second.
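To get a feel for what these numbers mean, here is a back-of-the-envelope calculation in Python. The vector count, dimensionality, and hourly price come from the paragraph above. Everything else is an assumption made for this sketch: 4 bytes per value (float32) and a 730-hour average month. The storage figure covers raw vector data only and ignores index structures, replication, and other fields.

```python
NUM_VECTORS = 5_000_000       # from the text: 5M vectors
DIMENSIONS = 768              # from the text: 768-dimensional
PRICE_PER_HOUR_USD = 3.36     # from the text

BYTES_PER_VALUE = 4           # assumption: float32 values
HOURS_PER_MONTH = 730         # assumption: average month length

# Raw size of the vector data alone, excluding any index overhead.
raw_bytes = NUM_VECTORS * DIMENSIONS * BYTES_PER_VALUE
raw_gb = raw_bytes / 1e9

# Cost if the deployment runs around the clock for a month.
monthly_cost_usd = PRICE_PER_HOUR_USD * HOURS_PER_MONTH

print(f"Raw vector data: {raw_gb:.2f} GB")
print(f"Approximate monthly cost: ${monthly_cost_usd:,.2f}")
```

Under these assumptions the raw vectors take roughly 15 GB, and running continuously costs on the order of a few thousand dollars per month.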

Vespa Cloud Console. Snapshot while auto-scaling of stateless container cluster in progress.

Vespa Cloud Console. Concurrent real-time indexing of vectors while searching. Scale as needed to meet any low latency serving use case.

With this vector search sample application, you have a great starting point for implementing your vector search use case, without worrying about managing complex infrastructure. See also other Vespa sample applications using vector search:

These are examples of applications built using AI-powered vector representations.

Vespa is available as a cloud service; see Vespa Cloud - getting started, or self-serve Vespa - getting started.

01 Jul 2022