Jon Bratseth

Marqo chooses Vespa


The vector database Marqo has announced that it has chosen to wrap Vespa after extensive benchmarking:

For Marqo 2, we looked at a number of open source and proprietary vector databases, including Milvus, Vespa, OpenSearch (AWS managed), Weaviate, Redis, and Qdrant.

Vespa came out as the clear winner:

Our internal benchmarks revealed that Vespa excelled as the optimal choice, satisfying all the criteria mentioned above, including being highly performant. For example, with 50M vectors, Vespa had a P50 latency of 16ms vs 140ms for Milvus for an infrastructure identical in cost.

and compared to their previous OpenSearch-backed version:

this configurability was invaluable in reducing Marqo’s latency by more than half and increasing its throughput by up to 2x, compared to Marqo 1.

The post contains some great insights into what people overlook when benchmarking:

Published benchmarks frequently fail to answer many questions that are critical in choosing a vector database for high throughput production vector search at very large scales. For instance:

  • In production use cases, it is common to encounter tens or even hundreds of millions of vectors. The performance of vector databases for such large numbers of vectors is a critical consideration. Most benchmarks, however, tend to focus on relatively small datasets.
  • How does the vector database perform under concurrent indexing and search, as well as ongoing mutation of indexed documents? Excellent search performance despite high throughput indexing and updates is a requirement for business-critical use cases.
  • How do different vector databases compare in terms of space complexity and memory efficiency?
  • What does it take to have a highly available deployment that can meet strict SLAs despite node failures?

Benchmarking is hard, but when it is done properly, Vespa tends to come out on top.

Read the full post on Marqo’s blog.