Quick Start with Logstash: from data to Vespa schema
If you want to get started with Vespa, check out our getting started guides. They are based on the sample apps, which provide good inspiration for your own use-cases.
But what if you already have some data that you want to write to Vespa?
This is where Logstash comes in. Its Output plugin for Vespa now has a detect_schema mode that can generate a Vespa application package from your data. The application package contains all the configuration required for Vespa to run: from the number of nodes and machine learning models to the schema.
In this tutorial, we’ll go through the fastest way to get your data into Vespa, whether you’re running Vespa locally (e.g., with Docker or Podman) or using Vespa Cloud. Either way, the high-level steps are the same:
- Download Logstash.
- Install the Vespa Output plugin.
- Configure Logstash to use the detect_schema mode.
- Upload the generated application package to Vespa.
- Disable detect_schema and re-run Logstash to write your data.
Let’s get into the specifics.
Logstash to local Vespa
The easiest way to get started is to download a zip/tgz archive from the Logstash download page. You can also install Logstash using your package manager or run it as a container.
Once it’s unpacked, install the Vespa Output plugin by running:
bin/logstash-plugin install logstash-output-vespa_feed
The config file will depend on your data. Have a look at this 5-recipe blog post for some inspiration. For now, let’s just read JSON documents from standard input, as an example.
# read JSON documents from standard input
input {
  stdin {
    codec => json
  }
}
# remove fields that are not part of our JSON documents
filter {
  mutate {
    remove_field => ["@timestamp", "@version", "event", "host", "log", "message"]
  }
}
output {
  # uncomment to print to stdout, for debugging
  # stdout {
  #   codec => rubydebug
  # }
  vespa_feed {
    # this will generate a Vespa application package, instead of feeding documents
    detect_schema => true
    # make Logstash deploy the application package to Vespa as well
    deploy_package => true
  }
}
Next, make sure Vespa is running locally, for example with:
podman run --detach --name vespa-container --hostname vespa-container \
--publish 8080:8080 --publish 19071:19071 \
vespaengine/vespa
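The container needs a little time before its config server accepts deployments. A minimal sketch of a wait loop follows, assuming the config server's health endpoint is reachable at http://localhost:19071/state/v1/health (the port published above). The function name wait_for_vespa and its default attempt count and delay are illustrative choices, not part of Vespa or Logstash.

```shell
# Poll a Vespa health endpoint until it answers, or give up.
# Arguments (all optional): URL, max attempts, seconds between attempts.
# wait_for_vespa is a hypothetical helper name; the defaults are assumptions.
wait_for_vespa() {
  url="${1:-http://localhost:19071/state/v1/health}"
  attempts="${2:-60}"
  delay="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors, not only on connection errors
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_vespa && echo "config server is up"
```

The function returns non-zero if the endpoint never answers, so it can gate whatever command you run next.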
With Vespa up, run Logstash and send a sample document to it:
echo '{"id": "1", "title": "Hello, world!"}' | bin/logstash -f config.conf
This will generate a Vespa application package and deploy it to your local container. At this point, you can disable detect_schema and re-run Logstash in exactly the same way to write your data to Vespa:
echo '{"id": "1", "title": "Hello, world!"}' | bin/logstash -f config.conf
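If you would rather not edit the config by hand between the two runs, the switch can be scripted. This is a rough sketch, assuming the option is written as detect_schema => true in config.conf, as in the example above. The helper name disable_detect_schema is made up for illustration.

```shell
# Switch detect_schema from true to false in a Logstash config file.
# A backup of the original is kept next to it with a .bak suffix.
# disable_detect_schema is a hypothetical helper, not part of Logstash.
disable_detect_schema() {
  sed -i.bak -E 's/(detect_schema[[:space:]]*=>[[:space:]]*)true/\1false/' "$1"
}

# config.conf is the file name used in the commands above
if [ -f config.conf ]; then
  disable_detect_schema config.conf
fi
```

Only the detect_schema line is touched. Other options, such as deploy_package, are left as they are.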
Now you’re ready to profit (i.e., query):
curl -XPOST -H "Content-Type: application/json" -d \
  '{ "yql": "select * from sources * where true"}' \
  'http://localhost:8080/search/' | jq .
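To confirm that your documents landed without reading the whole response, you can print just the hit count. The sketch below assumes Vespa's default JSON result format, where the number of matches sits at root.fields.totalCount. The function name total_count is illustrative.

```shell
# Print only the number of matching documents from a Vespa search response
# read on standard input.
# Assumes the default JSON result format (count at root.fields.totalCount).
# total_count is a hypothetical helper name.
total_count() {
  jq '.root.fields.totalCount'
}

# Usage, with the same query as above:
#   curl -s -XPOST -H "Content-Type: application/json" -d \
#     '{ "yql": "select * from sources * where true"}' \
#     'http://localhost:8080/search/' | total_count
```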
Once you’ve satisfied the initial thirst, you can go back to the deployed application package and iterate on it. The schema documentation and our IDE plugins should help you along the way.
To deploy a new iteration of the application package, you’ll need the Vespa CLI. With it, you can do:
# The --wait flag shows the deployment progress. Otherwise, you'll have to look in the logs.
vespa deploy --wait 900
Logstash prints the path to the generated application package when it deploys it. If you lost that output, Vespa CLI to the rescue:
vespa fetch /download/path
Speaking of the Vespa CLI, you’ll need it for Vespa Cloud as well.
Logstash to Vespa Cloud
With a Vespa Cloud account created, you’ll need to create a tenant and an application. Then, in your Logstash config, under the output section, add those details:
# the `input` and `filter` sections are the same as for local Vespa
output {
  vespa_feed {
    # Vespa Cloud details
    vespa_cloud_tenant => "your-tenant"
    vespa_cloud_application => "your-application"
    ### same options as for local Vespa
    # this will generate a Vespa application package, instead of feeding documents
    detect_schema => true
    # make Logstash deploy the application package to Vespa as well
    deploy_package => true
  }
}
When you run Logstash (with the same bin/logstash -f config.conf command as before), there are two differences. First, Logstash will, by default, generate mTLS certificates and copy them to .vespa under your home directory. You can do this manually, too, by running vespa auth cert.
Second, the application package won’t be automatically deployed. Instead, you’ll see four commands to copy-paste:
- Point Vespa CLI to Vespa Cloud: vespa config set target cloud
- Point it to your tenant and application: vespa config set application YOUR_TENANT.YOUR_APPLICATION.default. Here, “default” is the default instance name, which you can change when you create the application. Adjust vespa_cloud_instance in the Logstash config if that’s the case.
- Authenticate your Vespa CLI to your Vespa Cloud account: vespa auth login
- Deploy the application package: vespa deploy --wait 900
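The four commands can also be chained in a small script. This is only a sketch: the tenant and application names are placeholders, the function names vespa_app_id and deploy_to_cloud are invented for illustration, and it assumes you call deploy_to_cloud from the directory of the generated application package.

```shell
# Placeholders: replace with your own tenant and application names.
TENANT="your-tenant"
APPLICATION="your-application"
INSTANCE="default"   # the default instance name

# Build the tenant.application.instance ID used by `vespa config set application`.
# vespa_app_id is a hypothetical helper name.
vespa_app_id() {
  printf '%s.%s.%s\n' "$1" "$2" "${3:-default}"
}

# Run the four steps in order, stopping at the first failure.
# Assumption: the current directory is the generated application package.
deploy_to_cloud() {
  vespa config set target cloud &&
  vespa config set application "$(vespa_app_id "$TENANT" "$APPLICATION" "$INSTANCE")" &&
  vespa auth login &&
  vespa deploy --wait 900
}

# Usage:
#   deploy_to_cloud
```

Note that vespa auth login is interactive, so this is meant for a terminal session rather than an unattended job.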
Once you’ve deployed the application package, you can disable detect_schema and re-run Logstash in exactly the same way as for local Vespa. Logstash will automatically set the mTLS certificates to those of the generated application package. If you need to change them, modify the client_cert and client_key options in the Vespa output of your Logstash config. Check out the full list of options in the Logstash Output plugin for Vespa README.
Happy hacking! Oh, and feel free to reach out on LinkedIn, X or Slack if you have any questions!