Thiago Martins
Vespa Data Scientist

Run search engine experiments in Vespa from Python

Three ways to get started with pyvespa.

pyvespa provides a Python API to Vespa. The library's primary goal is to enable faster prototyping and to facilitate machine learning experiments for Vespa applications.

There are three ways you can get value out of pyvespa:

  1. You can connect to a running Vespa application.

  2. You can build and deploy a Vespa application using the pyvespa API.

  3. You can deploy an application from Vespa config files stored on disk.

We will review each of those methods.

Connect to a running Vespa application

If you already have a Vespa application running somewhere, you can instantiate the Vespa class directly with the appropriate endpoint. The example below connects to the cord19.vespa.ai application:

from vespa.application import Vespa

app = Vespa(url = "https://api.cord19.vespa.ai")

We are then ready to interact with the application through pyvespa:

app.query(body = {
  'yql': 'select title from sources * where userQuery();',
  'hits': 1,
  'summary': 'short',
  'timeout': '1.0s',
  'query': 'coronavirus temperature sensitivity',
  'type': 'all',
  'ranking': 'default'
}).hits
[{'id': 'index:content/1/ad8f0a6204288c0d497399a2',
  'relevance': 0.36920467353113595,
  'source': 'content',
  'fields': {'title': '<hi>Temperature</hi> <hi>Sensitivity</hi>: A Potential Method for the Generation of Vaccines against the Avian <hi>Coronavirus</hi> Infectious Bronchitis Virus'}}]
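
The titles in the result carry `<hi>` highlight tags, which get in the way when you want to print or compare titles during an experiment. The following is a minimal sketch, using only the standard library and the hit shown above as sample data, of how the list returned by `.hits` could be turned into plain `(relevance, title)` pairs. The helper name `plain_titles` is hypothetical and not part of pyvespa.

```python
import re

# Sample hit copied from the query output above.
sample_hits = [{
    'id': 'index:content/1/ad8f0a6204288c0d497399a2',
    'relevance': 0.36920467353113595,
    'source': 'content',
    'fields': {'title': '<hi>Temperature</hi> <hi>Sensitivity</hi>: A Potential Method for the Generation of Vaccines against the Avian <hi>Coronavirus</hi> Infectious Bronchitis Virus'},
}]


def plain_titles(hits):
    """Return (relevance, title) pairs with the <hi> highlight tags removed.

    Hypothetical helper for illustration; it only assumes the hit layout
    shown in the output above.
    """
    rows = []
    for hit in hits:
        title = hit.get('fields', {}).get('title', '')
        rows.append((hit['relevance'], re.sub(r'</?hi>', '', title)))
    return rows


titles = plain_titles(sample_hits)
print(titles[0][1])
```

In a live session, the same helper could be applied to the `.hits` list returned by `app.query` instead of `sample_hits`.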

Build and deploy with pyvespa API

You can also build your Vespa application from scratch using the pyvespa API. Here is a simple example:

from vespa.package import ApplicationPackage, Field, RankProfile

app_package = ApplicationPackage(name = "sampleapp")
app_package.schema.add_fields(
    Field(
        name="title", 
        type="string", 
        indexing=["index", "summary"], 
        index="enable-bm25")
)
app_package.schema.add_rank_profile(
    RankProfile(
        name="bm25", 
        inherits="default", 
        first_phase="bm25(title)"
    )
)
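
To make the package definition more concrete, it can help to see roughly what it describes in Vespa's own schema language. The sketch below is a hand-written renderer, not pyvespa's actual template, and the file pyvespa writes may differ in layout and detail. The function `render_schema` and its dict-based inputs are assumptions made for illustration only.

```python
def render_schema(name, fields, rank_profiles):
    """Render a simplified Vespa schema as text.

    Illustrative only: this is not how pyvespa generates its files, and it
    covers just the field and rank-profile settings used in this post.
    """
    lines = [f"schema {name} {{", f"    document {name} {{"]
    for field in fields:
        lines.append(f"        field {field['name']} type {field['type']} {{")
        lines.append(f"            indexing: {' | '.join(field['indexing'])}")
        if field.get('index'):
            lines.append(f"            index: {field['index']}")
        lines.append("        }")
    lines.append("    }")
    for profile in rank_profiles:
        lines.append(
            f"    rank-profile {profile['name']} inherits {profile['inherits']} {{"
        )
        lines.append("        first-phase {")
        lines.append(f"            expression: {profile['first_phase']}")
        lines.append("        }")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


# Same values as the Field and RankProfile defined above.
schema_text = render_schema(
    name="sampleapp",
    fields=[{"name": "title", "type": "string",
             "indexing": ["index", "summary"], "index": "enable-bm25"}],
    rank_profiles=[{"name": "bm25", "inherits": "default",
                    "first_phase": "bm25(title)"}],
)
print(schema_text)
```

Reading the rendered text next to the Python definition shows how each `Field` and `RankProfile` argument maps onto a schema setting.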

We can then deploy app_package to a Docker container (or directly to Vespa Cloud):

from vespa.package import VespaDocker

vespa_docker = VespaDocker(
    disk_folder="/Users/username/sample_app", # choose your own absolute folder
    container_memory="8G",
    port=8080
)
app = vespa_docker.deploy(application_package=app_package)
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for application status.
Waiting for application status.
Finished deployment.

app holds an instance of the Vespa class, just like in our first example, and we can use it to feed and query the application just deployed. We can also go to the Vespa configuration files stored in disk_folder, modify them, and deploy them directly from disk using the method discussed in the next section. This is useful when we want to fine-tune the application with Vespa features that are not available through the pyvespa API.
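
Since the package above defines a bm25 rank profile that inherits from default, a natural first experiment is to send the same query under both profiles and compare what comes back. The following is a minimal sketch of that loop. It relies only on the `app.query(body=...)` call and the `.hits` attribute shown in the first example. The YQL string is an assumption modeled on that example, and `FakeApp` is a hypothetical stand-in so the sketch runs without a deployed application.

```python
from types import SimpleNamespace


def compare_rank_profiles(app, query_text, profiles, hits=3):
    """Run the same query once per rank profile and collect the hit ids."""
    results = {}
    for profile in profiles:
        response = app.query(body={
            # Assumed YQL, modeled on the query in the first example.
            'yql': 'select * from sources * where userQuery();',
            'query': query_text,
            'ranking': profile,
            'hits': hits,
        })
        results[profile] = [hit['id'] for hit in response.hits]
    return results


class FakeApp:
    """Hypothetical stand-in for a deployed app; not part of pyvespa."""

    def query(self, body):
        # Echo the requested rank profile in the id so the output is easy to read.
        hit_id = f"doc-ranked-by-{body['ranking']}"
        return SimpleNamespace(hits=[{'id': hit_id, 'relevance': 1.0}])


comparison = compare_rank_profiles(FakeApp(), 'coronavirus', ['default', 'bm25'])
print(comparison)
```

With a real deployment, passing the `app` returned by `vespa_docker.deploy` in place of `FakeApp()` would run the same comparison against the container, once documents have been fed.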

You can also export app_package to Vespa configuration files, without deploying them, through the export_application_package method:

vespa_docker.export_application_package(application_package=app_package)

Deploy from Vespa config files

The pyvespa API provides a subset of the functionality available in Vespa. The reason is that pyvespa is meant to be used as an experimentation tool for Information Retrieval (IR), not for building production-ready applications. As a result, the Python API grows as we need to replicate common use cases that require IR experimentation.

If your application requires functionality or fine-tuning not available in pyvespa, you can build it directly through Vespa configuration files, as shown in many examples in the Vespa docs. Even in this case, you can still get value out of pyvespa by deploying the application from Python based on the Vespa configuration files stored on disk. To show that, we can clone and deploy the news search app covered in this Vespa tutorial:

!git clone https://github.com/vespa-engine/sample-apps.git

The Vespa configuration files of the news search app are stored in the sample-apps/news/app-3-searching/ folder:

!tree sample-apps/news/app-3-searching/
sample-apps/news/app-3-searching/
├── hosts.xml
├── schemas
│   └── news.sd
└── services.xml

1 directory, 3 files

We can then deploy to a Docker container from disk:

from vespa.package import VespaDocker

vespa_docker_news = VespaDocker(
    disk_folder="/Users/username/sample-apps/news/app-3-searching/", # Docker requires absolute path
    container_memory="8G", 
    port=8081
)
app = vespa_docker_news.deploy_from_disk(application_name="news")
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for application status.
Waiting for application status.
Finished deployment.
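
The comment on disk_folder notes that Docker requires an absolute path, while the clone above creates the folder relative to the current working directory. One way to avoid hard-coding a user-specific path is to resolve the relative folder first. This is a minimal sketch using only the standard library, and the helper name `absolute_disk_folder` is hypothetical.

```python
from pathlib import Path


def absolute_disk_folder(folder):
    """Expand ~ and resolve a possibly relative folder to an absolute path string."""
    return str(Path(folder).expanduser().resolve())


# Relative path as produced by the git clone above.
disk_folder = absolute_disk_folder("sample-apps/news/app-3-searching")
print(disk_folder)
```

The resulting string could then be passed as the disk_folder argument instead of a literal such as "/Users/username/...".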

Again, app holds an instance of the Vespa class, just like in our first example, and we can use it to feed and query the application just deployed.

Final thoughts

We covered three ways to connect to a Vespa application from Python using the pyvespa library. Together they give you a flexible workflow: you can get started with pyvespa experiments quickly and still edit the Vespa config files to add features not available in the pyvespa API, without losing the ability to experiment with those features.