Kristian Aune
Head of Customer Success, Vespa.ai

Vespa Newsletter, August 2024

In the previous update, we mentioned RAG in Vespa, cheaper vector search, fuzzy search with prefix match, distance calculation performance improvements, and new Pyvespa features. Today, we’re excited to share the following updates:

  • Pyvespa improvements
  • Vespa CLI improvements
  • Performance: Improved multi-threading performance with text matching
  • New Vespa features: Chinese segmentation, improved English stemming, and new ranking features.

Pyvespa improvements

ColPali is a method that will transform search and RAG for visual documents, such as PDFs (typically containing figures and tables). Our very own @jobergum demonstrated how to use ColPali with Vespa in the Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models notebook - also see the blog post on PDF Retrieval with Vision Language Models.

Features and fixes:

  • Support for deploying applications to Vespa Cloud production with Pyvespa. Use deploy_to_prod to start deployment of a new application package revision (typically automatically triggered by a build job) to Vespa Cloud. You can also use check_production_build_status for deployment tracking.

  • Keys and certificates for mTLS auth are now generated using Vespa CLI. Previously, Pyvespa generated them separately, which could cause certificate mismatches in some cases. This change reduces the discrepancy in authentication method between Pyvespa and Vespa CLI.

  • Interactive control-plane auth. By adopting interactive auth (opening an auth link in the browser) from Vespa CLI, it is now a lot easier to interact with Vespa Cloud from Python. Check out the updated Quickstart on Vespa Cloud notebook, and see authenticating-to-vespa-cloud.

  • Switch the VespaAsync HTTP-client to use httpx[http2]. As Vespa supports HTTP/2, this enables the Async Pyvespa client to multiplex HTTP requests over a single connection.

  • New app.feed_async_iterable() method, with a signature similar to the sync feed_iterable(), that uses the async client. feed_async_iterable() typically performs better (while using fewer resources) than its synchronous counterpart, especially when network latency is high. For details, check out this notebook.

  • Bugfix: Pyvespa version 0.45 had a bug that resulted in ImportError: No module named termios for Windows users. This is now fixed. Note that interactive login is not yet supported on Windows. We have also implemented a cross-platform matrix for unit tests, to catch platform-dependent errors earlier in the future.
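
To see why keeping many requests in flight helps when network latency is high, here is a minimal, self-contained sketch. It does not use Pyvespa or the network: fake_feed_request, feed_sequential, feed_concurrent, the 20 ms latency and the limit of 10 in-flight requests are all hypothetical stand-ins chosen for illustration, and this is not how feed_iterable() or feed_async_iterable() are implemented.

```python
import asyncio
import time

SIMULATED_LATENCY_S = 0.02  # assumed round-trip time per request (illustrative only)


async def fake_feed_request(doc: dict) -> str:
    """Hypothetical stand-in for one feed request; it only sleeps, no network I/O."""
    await asyncio.sleep(SIMULATED_LATENCY_S)
    return doc["id"]


async def feed_sequential(docs: list) -> list:
    """Send one request at a time and wait for each response before the next."""
    return [await fake_feed_request(doc) for doc in docs]


async def feed_concurrent(docs: list, max_in_flight: int = 10) -> list:
    """Keep up to max_in_flight requests pending at the same time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(doc: dict) -> str:
        async with semaphore:
            return await fake_feed_request(doc)

    # gather() returns results in the order the awaitables were passed in
    return await asyncio.gather(*(bounded(doc) for doc in docs))


def timed(coro) -> tuple:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


docs = [{"id": f"doc-{i}", "fields": {"title": f"Title {i}"}} for i in range(20)]
sequential_ids, sequential_s = timed(feed_sequential(docs))
concurrent_ids, concurrent_s = timed(feed_concurrent(docs))
print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

In this toy setup the sequential variant pays the latency once per document, while the concurrent variant pays it roughly once per batch of in-flight requests. Real numbers depend on your network, documents and cluster.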

For those interested in full release details, check out github.com/vespa-engine/pyvespa/releases. We also encourage the community to keep creating issues, whether for enhancements, bug fixes, or other ideas. We would also like to thank the external contributors who have contributed since the last newsletter - you rock!

Vespa CLI improvements

  • vespa log now supports self-hosted Vespa instances (from Vespa 8.359).
  • vespa deploy now detects and warns if the certificate added to the application package does not match the configured application key pair.
  • vespa deploy now supports a .vespaignore file which allows excluding unwanted files from the deployed application package. See the documentation for more details.
  • vespa query now handles large tokens (> 64K) when streaming responses from an LLM.
  • vespa feed now supports sending custom headers using the new --header option.
  • vespa feed performance increased by 27% when feeding large documents (> 10K).
  • vespa document get now supports a --field-set option (like vespa visit) that specifies which fields to include when retrieving a document. See the documentation for more details.

Performance

The default query operator in Vespa is weakand, and Vespa lets you control how many cores to use to execute each query. The weakand operator now uses a shared heap across threads used in the matching phase. This has reduced CPU usage and latency on text/hybrid queries. In a sample performance test measuring theoretical perfect resource utilization, we saw an increase of 37.5%. The specific improvements you’ll see depend on your data and queries - we recommend you use the latest Vespa release and try it yourself! Changing the number of search threads only requires a content node restart, done automatically when running on Vespa Cloud.
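
As a rough intuition for what a heap shared across matching threads looks like, here is a toy sketch. It is not Vespa's implementation: SharedTopK, match_partition, the made-up scores and the use of a simple lock are all assumptions for illustration. Each thread offers hits from its own partition to one bounded min-heap, so every thread compares against the same global threshold and no per-thread result merge is needed afterwards.

```python
import heapq
import threading


class SharedTopK:
    """A bounded min-heap holding the best k hits, shared by all matching threads."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap = []  # min-heap of (score, doc_id); the root is the current threshold
        self._lock = threading.Lock()

    def offer(self, score: float, doc_id: int) -> bool:
        """Keep the hit if the heap has room or the hit beats the threshold."""
        with self._lock:
            if len(self._heap) < self.k:
                heapq.heappush(self._heap, (score, doc_id))
                return True
            if score > self._heap[0][0]:
                heapq.heapreplace(self._heap, (score, doc_id))
                return True
            return False

    def best(self) -> list:
        """Return the kept hits as (score, doc_id), best first."""
        with self._lock:
            return sorted(self._heap, reverse=True)


def match_partition(partition: list, shared: SharedTopK) -> None:
    """Hypothetical per-thread matching loop over one slice of the corpus."""
    for doc_id, score in partition:
        shared.offer(score, doc_id)


K = 5
NUM_THREADS = 4
# Made-up, distinct scores for 100 documents
scores = {doc_id: (doc_id * 37) % 101 for doc_id in range(100)}
items = list(scores.items())
partitions = [items[i::NUM_THREADS] for i in range(NUM_THREADS)]

shared = SharedTopK(K)
threads = [
    threading.Thread(target=match_partition, args=(partition, shared))
    for partition in partitions
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

top_hits = shared.best()
print(top_hits)
```

Whatever order the threads run in, the shared heap ends up with the same global top K hits, because the scores here are distinct.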

New Vespa features

  • Ranking: To eliminate low-scoring hits from later ranking phases, you can use rank-score-drop-limit in the first ranking phase. Since Vespa 8.354, rank-score-drop-limit is also available in the second ranking phase. You can set it in the rank profile or with the ranking.secondphase.rankscoredroplimit Query API parameter.
  • Ranking: When writing ranking functions, you can pass the names of features as function arguments. From Vespa 8.371, you can also do the same with dimension names.
  • Linguistics: Since 8.379, Vespa supports Chinese segmentation in the default linguistics implementation. To enable this, add this config to your <container> element(s) in services.xml:
      <container id="default" version="1.0">
          <config name="ai.vespa.opennlp.open-nlp">
              <cjk>true</cjk>
              <createCjkGrams>true</createCjkGrams>
          </config>
      </container>
    

    Note that if you change this on a live field, you will reduce recall until reindexing is completed.

  • Linguistics: In the default Vespa linguistics implementation, the stemmer for English is an implementation called kStem, while other languages use Snowball. Since 8.388, Vespa lets you switch to Snowball for English as well, by setting the configuration below. This causes more English words to be stemmed, and therefore gives higher recall.
      <container id="default" version="1.0">
          <config name="ai.vespa.opennlp.open-nlp">
              <snowballStemmingForEnglish>true</snowballStemmingForEnglish>
          </config>
      </container>
    

    Note that if you change this on a live field you will reduce recall until reindexing is completed - also see the documentation.

  • Operations: As clusters grow in size and features, application owners continuously resize and reconfigure for performance and cost optimizations. Vespa's elasticity functions make this easy, with automatic data migration to new nodes under regular query load. This also means that clusters are often redistributing data. The new cluster-controller_cluster-buckets-out-of-sync-ratio metric makes it easy to know the status and estimate when the redistribution will be complete.
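
One simple way to estimate completion from the out-of-sync ratio is linear extrapolation from two samples. The sketch below assumes the metric falls toward 0 as buckets get in sync and that progress is roughly linear, which may not hold in practice. The function name estimate_seconds_remaining and the sample values are hypothetical, and fetching the metric from your monitoring system is left out.

```python
from typing import Optional


def estimate_seconds_remaining(
    t0: float, ratio0: float, t1: float, ratio1: float
) -> Optional[float]:
    """Linearly extrapolate when the out-of-sync ratio reaches 0.

    t0/t1 are sample times in seconds, ratio0/ratio1 the metric values at
    those times. Returns None if the ratio is not decreasing.
    """
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if ratio1 <= 0.0:
        return 0.0  # already in sync
    rate = (ratio0 - ratio1) / (t1 - t0)  # ratio decrease per second
    if rate <= 0.0:
        return None  # no progress between samples, cannot extrapolate
    return ratio1 / rate


# Made-up samples taken 10 minutes apart: the ratio dropped from 0.50 to 0.25
eta = estimate_seconds_remaining(t0=0.0, ratio0=0.50, t1=600.0, ratio1=0.25)
print(f"estimated seconds remaining: {eta:.0f}")
```

With these made-up samples, half of the out-of-sync buckets were fixed in 10 minutes, so the estimate is about 10 more minutes. Sampling over a longer window smooths out short-term variation.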

We highly recommend taking a look at the blog posts we have published since the last newsletter.

Events

Meet us at MLCon in New York City, October 8-9, and at the COLLIDE DATA CONFERENCE in Atlanta, October 10-11!


Thanks for joining us in exploring the frontiers of AI with Vespa. Ready to take your projects to the next level? Deploy your application for free on Vespa Cloud today.