Hands-On RAG guide for personal data with Vespa and LLamaIndex
Photo by Avi Richards on Unsplash
This blog post is a hands-on RAG tutorial demonstrating how to use Vespa streaming mode for cost-efficient retrieval of personal data. You can read more about Vespa streaming search in these two blog posts:
- Announcing vector streaming search: AI assistants at scale without breaking the bank
- Yahoo Mail turns to Vespa to do RAG at scale
This blog post is also available as a runnable notebook where you can have this app up and running on Vespa Cloud in minutes ( )
The blog post covers:
- Configuring Vespa and using Vespa streaming mode with PyVespa.
- Using Vespa native built-in embedders in combination with streaming mode.
- Ranking in Vespa, including hybrid retrieval and ranking methods, freshness (recency) features, and Vespa Rank Fusion.
- Query federation and blending retrieved results from multiple sources/schemas.
- Connecting LLamaIndex retrievers with a Vespa app to build generative AI pipelines.
TLDR; Vespa streaming mode
Vespa’s streaming search solution lets you make the user a part of the document ID so that Vespa can use it to co-locate the data of each user on a small set of nodes and the same chunk of disk. Streaming mode allows searching over a user’s data with low latency without keeping any user’s data in memory or paying the cost of managing indexes.
- There is no accuracy drop for vector search as it uses exact vector search
- Several orders of magnitude higher write throughput (No expensive index builds to support approximate search)
- Documents (including vector data) are 100% disk-based, significantly reducing deployment cost
- Queries are restricted to content by the user ID/(groupname)
Storage cost is the primary cost driver of Vespa streaming mode; no data is in memory. Avoiding memory usage lowers deployment costs significantly. For example, Vespa Cloud allows storing streaming mode data at below 0.30$ per GB/month. Yes, that is per month.
Getting started with LLamaIndex and PyVespa
The focus is on using the streaming mode feature in combination with multiple Vespa schemas; in our case, we imagine building RAG over personal mail and calendar data, allowing effortless query federation and blending of the results from multiple data sources for a given user.
First, we must install dependencies:
! pip3 install pyvespa llama-index
Synthetic Mail & Calendar Data
There are few public email datasets because people care about their privacy, so this notebook uses synthetic data to examine how to use Vespa streaming mode.
We create two generator functions that return Python dict
s with synthetic mail and calendar data.
Notice that the dict has three keys:
id
groupname
fields
This is the expected feed format for PyVespa feed operations and
where PyVespa will use these to build a Vespa document v1 API request(s).
The groupname
key is only relevant with streaming mode.
from typing import List
def synthetic_mail_data_generator() -> List[dict]:
synthetic_mails = [
{
"id": 1,
"groupname": "bergum@vespa.ai",
"fields": {
"subject": "LlamaIndex news, 2023-11-14",
"to": "bergum@vespa.ai",
"body": """Hello Llama Friends 🦙 LlamaIndex is 1 year old this week! 🎉 To celebrate, we're taking a stroll down memory
lane on our blog with twelve milestones from our first year. Be sure to check it out.""",
"from": "news@llamaindex.ai",
"display_date": "2023-11-15T09:00:00Z"
}
},
{
"id": 2,
"groupname": "bergum@vespa.ai",
"fields": {
"subject": "Dentist Appointment Reminder",
"to": "bergum@vespa.ai",
"body": "Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist",
"from": "dentist@dentist.no",
"display_date": "2023-11-15T15:30:00Z"
}
},
{
"id": 1,
"groupname": "giraffe@wildlife.ai",
"fields": {
"subject": "Wildlife Update: Giraffe Edition",
"to": "giraffe@wildlife.ai",
"body": "Dear Wildlife Enthusiasts 🦒, We're thrilled to share the latest insights into giraffe behavior in the wild. Join us on an adventure as we explore their natural habitat and learn more about these majestic creatures.",
"from": "updates@wildlife.ai",
"display_date": "2023-11-12T14:30:00Z"
}
},
{
"id": 1,
"groupname": "penguin@antarctica.ai",
"fields": {
"subject": "Antarctica Expedition: Penguin Chronicles",
"to": "penguin@antarctica.ai",
"body": "Greetings Explorers 🐧, Our team is embarking on an exciting expedition to Antarctica to study penguin colonies. Stay tuned for live updates and behind-the-scenes footage as we dive into the world of these fascinating birds.",
"from": "expedition@antarctica.ai",
"display_date": "2023-11-11T11:45:00Z"
}
},
{
"id": 1,
"groupname": "space@exploration.ai",
"fields": {
"subject": "Space Exploration News: November Edition",
"to": "space@exploration.ai",
"body": "Hello Space Enthusiasts 🚀, Join us as we highlight the latest discoveries and breakthroughs in space exploration. From distant galaxies to new technologies, there's a lot to explore!",
"from": "news@exploration.ai",
"display_date": "2023-11-01T16:20:00Z"
}
},
{
"id": 1,
"groupname": "ocean@discovery.ai",
"fields": {
"subject": "Ocean Discovery: Hidden Treasures Unveiled",
"to": "ocean@discovery.ai",
"body": "Dear Ocean Explorers 🌊, Dive deep into the secrets of the ocean with our latest discoveries. From undiscovered species to underwater landscapes, our team is uncovering the wonders of the deep blue.",
"from": "discovery@ocean.ai",
"display_date": "2023-10-01T10:15:00Z"
}
}
]
for mail in synthetic_mails:
yield mail
calendar
Similarily, for calendar data
from typing import List
def synthetic_calendar_data_generator() -> List[dict]:
calendar_data = [
{
"id": 1,
"groupname": "bergum@vespa.ai",
"fields": {
"subject": "Dentist Appointment",
"to": "bergum@vespa.ai",
"body": "Dentist appointment at 2023-12-04 at 09:30 - 1 hour duration",
"from": "dentist@dentist.no",
"display_date": "2023-11-15T15:30:00Z",
"duration": 60,
}
},
{
"id": 2,
"groupname": "bergum@vespa.ai",
"fields": {
"subject": "Public Cloud Platform Events",
"to": "bergum@vespa.ai",
"body": "The cloud team continues to push new features and improvements to the platform. Join us for a live demo of the latest updates",
"from": "public-cloud-platform-events",
"display_date": "2023-11-21T09:30:00Z",
"duration": 60,
}
}
]
for event in calendar_data:
yield event
Definining a Vespa application
PyVespa helps us build the Vespa application package. A Vespa application package comprises configuration files, code (plugins), and models.
We define two Vespa schemas for our mail and calendar data. PyVespa
offers a programmatic API for creating the schema. Ultimately, the programmatic representation is serialized to files (<schema-name>.sd
).
In the following we define the fields and their type. Note that we set mode
to streaming
,
which enables Vespa streaming mode for this schema.
Other valid modes are indexed
and store-only
.
mail schema
from vespa.package import Schema, Document, Field, FieldSet, HNSW
mail_schema = Schema(
name="mail",
mode="streaming",
document=Document(
fields=[
Field(name="id", type="string", indexing=["summary", "index"]),
Field(name="subject", type="string", indexing=["summary", "index"]),
Field(name="to", type="string", indexing=["summary", "index"]),
Field(name="from", type="string", indexing=["summary", "index"]),
Field(name="body", type="string", indexing=["summary", "index"]),
Field(name="display_date", type="string", indexing=["summary"]),
Field(name="timestamp", type="long", indexing=["input display_date", "to_epoch_second", "summary", "attribute"], is_document_field=False),
Field(name="embedding", type="tensor<bfloat16>(x[384])",
indexing=["\"passage: \" . input subject .\" \". input body", "embed e5", "attribute", "index"],
ann=HNSW(distance_metric="angular"),
is_document_field=False
)
],
),
fieldsets=[
FieldSet(name = "default", fields = ["subject", "body", "to", "from"])
]
)
In the mail
schema, we have six document fields; these are provided by us when we feed documents of type mail
to this app.
The fieldset defines
which fields are matched against when we do not mention explicit field names when querying. We can add as many fieldsets as we like without duplicating content.
In addition to the fields within the document
, there are two synthetic fields in the schema, timestamp
, and embedding
,
using Vespa indexing expressions
taking inputs from the document and performing conversions.
- the
timestamp
field takes the inputdisplay_date
and uses the to_epoch_second converter converter to convert the display date into an epoch timestamp. This is useful because we can calculate the document’s age and use thefreshness(timestamp)
rank feature during ranking phases. - the
embedding
tensor field takes the subject and body as input. It feeds that into an embed function that uses an embedding model to map the string input into an embedding vector representation using 384-dimensions withbfloat16
precision. Vectors in Vespa are represented as Tensors.
calendar schema
from vespa.package import Schema, Document, Field, FieldSet, HNSW
calendar_schema = Schema(
name="calendar",
inherits="mail",
mode="streaming",
document=Document(inherits="mail",
fields=[
Field(name="duration", type="int", indexing=["summary", "index"]),
Field(name="guests", type="array<string>", indexing=["summary", "index"]),
Field(name="location", type="string", indexing=["summary", "index"]),
Field(name="url", type="string", indexing=["summary", "index"]),
Field(name="address", type="string", indexing=["summary", "index"])
]
)
)
The calendar
schema inherits
from the mail
schema, meaning we don’t have to define the embedding
field for the
calendar
schema.
Configuring embedders
The observant reader might have noticed the e5
argument to the embed
expression in the above mail
schema embedding
field.
The e5
argument references a component of the type hugging-face-embedder. In this
example, we use the e5-small-v2 text embedding model that maps text to 384-dimensional vectors.
from vespa.package import ApplicationPackage, Component, Parameter
vespa_app_name = "assistant"
vespa_application_package = ApplicationPackage(
name=vespa_app_name,
schema=[mail_schema, calendar_schema],
components=[Component(id="e5", type="hugging-face-embedder",
parameters=[
Parameter("transformer-model", {"url": "https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx"}),
Parameter("tokenizer-model", {"url": "https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json"})
]
)]
)
We share and reuse the same embedding model for both schemas. Note that embedding inference is resource-intensive.
Ranking
In the last step of configuring the Vespa app, we add ranking profiles by adding rank-profile
’s to the schemas. Vespa supports phased ranking and has a rich set of built-in rank-features.
One can also define custom functions with ranking expressions.
from vespa.package import RankProfile, Function, GlobalPhaseRanking, FirstPhaseRanking
keywords_and_freshness = RankProfile(
name="default",
functions=[Function(
name="my_function", expression="nativeRank(subject) + nativeRank(body) + freshness(timestamp)"
)],
first_phase=FirstPhaseRanking(
expression="my_function",
rank_score_drop_limit=0.02
),
match_features=["nativeRank(subject)", "nativeRank(body)", "my_function", "freshness(timestamp)"],
)
semantic = RankProfile(
name="semantic",
functions=[Function(
name="cosine", expression="max(0,cos(distance(field, embedding)))"
)],
inputs=[("query(q)", "tensor<float>(x[384])"), ("query(threshold)","", "0.75")],
first_phase=FirstPhaseRanking(
expression="if(cosine > query(threshold), cosine, -1)",
rank_score_drop_limit=0.1
),
match_features=["cosine", "freshness(timestamp)", "distance(field, embedding)", "query(threshold)"],
)
fusion = RankProfile(
name="fusion",
inherits="semantic",
functions=[
Function(
name="keywords_and_freshness", expression=" nativeRank(subject) + nativeRank(body) + freshness(timestamp)"
),
Function(
name="semantic", expression="cos(distance(field,embedding))"
)
],
inputs=[("query(q)", "tensor<float>(x[384])"), ("query(threshold)", "", "0.75")],
first_phase=FirstPhaseRanking(
expression="if(cosine > query(threshold), cosine, -1)",
rank_score_drop_limit=0.1
),
match_features=["nativeRank(subject)", "keywords_and_freshness", "freshness(timestamp)", "cosine", "query(threshold)"],
global_phase=GlobalPhaseRanking(
rerank_count=1000,
expression="reciprocal_rank_fusion(semantic, keywords_and_freshness)"
)
)
The default
rank profile defines a custom function my_function
that computes a linear combination of three different features:
nativeRank(subject)
Is a text matching feature , scoped to thesubject
field.nativeRank(body)
Same, but scoped to thebody
field.freshness(timestamp)
This is a built-in rank-feature that returns a number close to 1 if the timestamp is recent compared to the current query time.
The semantic
profile defines the query tensor user with nearestNeighbor search and a custom expression in combination
with rank-score-drop-limit that allows for a query time threshold.
The fusion
profile is more involved and uses phased ranking,
where the first-phase uses semantic similarity (cosine), and the best results from
that phase are re-ranked using a global phase expression that performs reciprocal rank fusion. Read more about Vespa RRF and cross-hit normalization.
Serializing from PyVespa object representation to application files
We can serialize the representation to application package files. This is practical when we want to start working with production deployments and when we want to manage the application schema files with version control and safe deployments with CI/CD in Vespa Cloud.
application_directory="my-assistant-vespa-app"
vespa_application_package.to_files(application_directory)
import os
def print_files_in_directory(directory):
for root, _, files in os.walk(directory):
for file in files:
print(os.path.join(root, file))
print_files_in_directory(application_directory)
my-assistant-vespa-app/services.xml
my-assistant-vespa-app/schemas/mail.sd
my-assistant-vespa-app/schemas/calendar.sd
my-assistant-vespa-app/search/query-profiles/default.xml
my-assistant-vespa-app/search/query-profiles/types/root.xml
Deploy the application to Vespa Cloud
With the configured application, we can deploy it to Vespa Cloud. It is also possible to deploy the app using docker; see the Hybrid Search - Quickstart guide for an example of deploying a Vespa app using the vespaengine/vespa container image.
See for complete details on onboarding Vespa Cloud and deployment details.
Feeding data to Vespa
With the app up and running in Vespa Cloud, we can feed and query our data.
We use the feed_iterable API of pyvespa
with a custom callback
that prints the URL and an error if the operation fails. We pass the defined synthetic generators and call feed_iterable
with the specific schema
and namespace
.
from vespa.io import VespaResponse
def callback(response:VespaResponse, id:str):
if not response.is_successful():
print(f"Error {response.url} : {response.get_json()}")
else:
print(f"Success {response.url}")
app.feed_iterable(synthetic_mail_data_generator(), schema="mail", namespace="assistant", callback=callback)
app.feed_iterable(synthetic_calendar_data_generator(), schema="calendar", namespace="assistant", callback=callback)
Success https://cb923ffc.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/bergum@vespa.ai/1
Success https://cb923ffc.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/bergum@vespa.ai/2
Success https://cb923ffc.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/giraffe@wildlife.ai/1
Success https://cb923ffc.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/penguin@antarctica.ai/1
Success https://cb923ffc.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/space@exploration.ai/1
Success https://cb923ffc.cae25ac9.z.vespa-app.cloud//document/v1/assistant/mail/group/ocean@discovery.ai/1
Success https://cb923ffc.cae25ac9.z.vespa-app.cloud//document/v1/assistant/calendar/group/bergum@vespa.ai/1
Success https://cb923ffc.cae25ac9.z.vespa-app.cloud//document/v1/assistant/calendar/group/bergum@vespa.ai/2
Querying data
Now, we can also query our data. With streaming mode,
we must pass the groupname
parameter, or the request will fail with an error. The query request uses the Vespa Query API and the Vespa.query()
function
supports passing any of the Vespa query API parameters.
Sample query request for when is my dentist appointment
for the user bergum@vespa.ai
:
from vespa.io import VespaQueryResponse
import json
response:VespaQueryResponse = app.query(
yql="select subject, display_date, to from sources mail where userQuery()",
query="when is my dentist appointment",
groupname="bergum@vespa.ai",
ranking="default"
)
assert(response.is_successful())
print(json.dumps(response.hits[0], indent=2))
{
"id": "id:assistant:mail:g=bergum@vespa.ai:2",
"relevance": 1.134783932836458,
"source": "assistant_content.mail",
"fields": {
"matchfeatures": {
"freshness(timestamp)": 0.9232458847736625,
"nativeRank(body)": 0.09246780326887034,
"nativeRank(subject)": 0.11907024479392506,
"my_function": 1.134783932836458
},
"subject": "Dentist Appointment Reminder",
"to": "bergum@vespa.ai",
"display_date": "2023-11-15T15:30:00Z"
}
}
For the above query request, Vespa searched the default
fieldset we defined in the schema to match against several fields, including the body and the subject. The default
rank-profile calculated the relevance score as the sum of three rank-features nativeRank(body) + nativeRank(subject) + freshness(timestamp)
. The result of this computation is the relevance
score of the hit.
In addition, we also asked Vespa to return matchfeatures
, that are handy for debugging the final relevance
score
or for feature logging.
Now, we can try the semantic
ranking profile, using Vespa’s support for nearestNeighbor search. This example also demonstrates using the configured e5
embedder to embed the user query
into an embedding representation.
See embedding a query text for more usage examples of using Vespa native embedders.
from vespa.io import VespaQueryResponse
import json
response:VespaQueryResponse = app.query(
yql="select subject, display_date from mail where {targetHits:10}nearestNeighbor(embedding,q)",
groupname="bergum@vespa.ai",
ranking="semantic",
body={
"input.query(q)": "embed(e5, \"when is my dentist appointment\")",
}
)
assert(response.is_successful())
print(json.dumps(response.hits[0], indent=2))
{
"id": "id:assistant:mail:g=bergum@vespa.ai:2",
"relevance": 0.9079386507883569,
"source": "assistant_content.mail",
"fields": {
"matchfeatures": {
"distance(field,embedding)": 0.4324572498488368,
"freshness(timestamp)": 0.9232457561728395,
"query(threshold)": 0.75,
"cosine": 0.9079386507883569
},
"subject": "Dentist Appointment Reminder",
"display_date": "2023-11-15T15:30:00Z"
}
}
LlamaIndex Retrievers Introduction
Now, we have a basic Vespa app using streaming mode. We likely want to use an LLM framework like LangChain or LLamaIndex to build an end-to-end assistant. The LlamaIndex retriever abstraction allows developers to add custom retrievers that retrieve information in Retrieval Augmented Generation (RAG) pipelines. For an excellent introduction to LLamaIndex and its concepts, see LLamaIndex Concepts.
To create a custom LlamaIndex retriever, we implement a class that inherits from llama_index.retrievers.BaseRetriever.BaseRetriever
and
which implements _retrieve(query)
. A simple PersonalAssistantVespaRetriever
could look like the following:
from llama_index.core import BaseRetriever
from llama_index.schema import NodeWithScore, QueryBundle, TextNode
from llama_index.callbacks.base import CallbackManager
from vespa.application import Vespa
from vespa.io import VespaQueryResponse
from typing import List, Union, Optional
class PersonalAssistantVespaRetriever(BaseRetriever):
def __init__(
self,
app: Vespa,
user: str,
hits: int = 5,
vespa_rank_profile: str = "default",
vespa_score_cutoff: float = 0.70,
sources: List[str] = ["mail"],
fields: List[str] = ["subject", "body"],
callback_manager: Optional[CallbackManager] = None
) -> None:
"""Sample Retriever for a personal assistant application.
Args:
param: app: Vespa application object
param: user: user id to retrieve documents for (used for Vespa streaming groupname)
param: hits: number of hits to retrieve from Vespa app
param: vespa_rank_profile: Vespa rank profile to use
param: vespa_score_cutoff: Vespa score cutoff to use during first-phase ranking
param: sources: sources to retrieve documents from
param: fields: fields to retrieve
"""
self.app = app
self.hits = hits
self.user = user
self.vespa_rank_profile = vespa_rank_profile
self.vespa_score_cutoff = vespa_score_cutoff
self.fields = fields
self.summary_fields = ",".join(fields)
self.sources = ",".join(sources)
super().__init__(callback_manager)
def _retrieve(self, query:Union[str,QueryBundle]) -> List[NodeWithScore]:
"""Retrieve documents from Vespa application.
"""
if isinstance(query, QueryBundle):
query = query.query_str
if self.vespa_rank_profile == 'default':
yql:str = f"select {self.summary_fields} from mail where userQuery()"
else:
yql = f"select {self.summary_fields} from sources {self.sources} where {targetHits:10}nearestNeighbor(embedding,q) or userQuery()"
vespa_body_request = {
"yql" : yql,
"query": query,
"hits": self.hits,
"ranking.profile": self.vespa_rank_profile,
"timeout": "1s",
"input.query(threshold)": self.vespa_score_cutoff,
}
if self.vespa_rank_profile != "default":
vespa_body_request["input.query(q)"] = f"embed(e5, \"{query}\")"
with self.app.syncio(connections=1) as session:
response:VespaQueryResponse = session.query(body=vespa_body_request, groupname=self.user)
if not response.is_successful():
raise ValueError(f"Query request failed: {response.status_code}, response payload: {response.get_json()}")
nodes: List[NodeWithScore] = []
for hit in response.hits:
response_fields:dict = hit.get('fields', {})
text: str = ""
for field in response_fields.keys():
if isinstance(response_fields[field], str) and field in self.fields:
text += response_fields[field] + " "
id = hit['id']
#
doc = TextNode(id_=id, text=text,
metadata=response_fields,
)
nodes.append(NodeWithScore(node=doc, score=hit['relevance']))
return nodes
The above defines a PersonalAssistantVespaRetriever
that takes a pyvespa
Vespa
application instance as an argument, plus some.
The YQL
request specifies a hybrid retrieval query that retrieves both using embedding-based retrieval (vector search)
using Vespa’s nearest neighbor search operator in combination with traditional keyword matching.
Running queries with the PersonalAssistantVespaRetriever
We initialize the PersonalAssistantVespaRetriever
for the user bergum@vespa.ai
with the app
defined earlier.
The user
argument maps to the Vespa streaming mode groupname
parameter,
efficiently limiting the search to only a specific user.
retriever = PersonalAssistantVespaRetriever(
app=app,
user="bergum@vespa.ai",
vespa_rank_profile="default"
)
retriever.retrieve("When is my dentist appointment?")
[NodeWithScore(node=TextNode(id_='id:assistant:mail:g=bergum@vespa.ai:2', embedding=None, metadata={'matchfeatures': {'freshness(timestamp)': 0.9232454989711935, 'nativeRank(body)': 0.09246780326887034, 'nativeRank(subject)': 0.11907024479392506, 'my_function': 1.1347835470339889}, 'subject': 'Dentist Appointment Reminder', 'body': 'Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='269fe208f8d43a967dc683e1c9b832b18ddfb0b2efd801ab7e428620c8163021', text='Dentist Appointment Reminder Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=1.1347835470339889),
NodeWithScore(node=TextNode(id_='id:assistant:mail:g=bergum@vespa.ai:1', embedding=None, metadata={'matchfeatures': {'freshness(timestamp)': 0.9202362397119341, 'nativeRank(body)': 0.02919821398130037, 'nativeRank(subject)': 1.3512214436142505e-38, 'my_function': 0.9494344536932345}, 'subject': 'LlamaIndex news, 2023-11-14', 'body': "Hello Llama Friends 🦙 LlamaIndex is 1 year old this week! 🎉 To celebrate, we're taking a stroll down memory \n lane on our blog with twelve milestones from our first year. Be sure to check it out."}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='5e975eaece761d46956c9d301138f29b5c067d3da32fd013bb79c6ee9c033d3d', text="LlamaIndex news, 2023-11-14 Hello Llama Friends 🦙 LlamaIndex is 1 year old this week! 🎉 To celebrate, we're taking a stroll down memory \n lane on our blog with twelve milestones from our first year. Be sure to check it out. ", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9494344536932345)]
We can also try the semantic
profile, which has rank-score-drop functionality, allowing us to have a per-query time score threshold. This will then
also invoke the native Vespa embedder model inside Vespa.
retriever = PersonalAssistantVespaRetriever(
app=app,
user="bergum@vespa.ai",
vespa_rank_profile="semantic",
vespa_score_cutoff=0.6,
hits=20
)
retriever.retrieve("When is my dentist appointment?")
[NodeWithScore(node=TextNode(id_='id:assistant:mail:g=bergum@vespa.ai:2', embedding=None, metadata={'matchfeatures': {'distance(field,embedding)': 0.43945494361938975, 'freshness(timestamp)': 0.9232453703703704, 'query(threshold)': 0.6, 'cosine': 0.9049836898369259}, 'subject': 'Dentist Appointment Reminder', 'body': 'Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='e89f669e6c9cf64ab6a856d9857915481396e2aa84154951327cd889c23f7c4f', text='Dentist Appointment Reminder Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9049836898369259),
NodeWithScore(node=TextNode(id_='id:assistant:mail:g=bergum@vespa.ai:1', embedding=None, metadata={'matchfeatures': {'distance(field,embedding)': 0.69930099954744, 'freshness(timestamp)': 0.9202361111111111, 'query(threshold)': 0.6, 'cosine': 0.7652923088511814}, 'subject': 'LlamaIndex news, 2023-11-14', 'body': "Hello Llama Friends 🦙 LlamaIndex is 1 year old this week! 🎉 To celebrate, we're taking a stroll down memory \n lane on our blog with twelve milestones from our first year. Be sure to check it out."}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='cb9b588e5b53dbdd0fbe6f7aadfa689d84a5bea23239293bd299347ee9ecd853', text="LlamaIndex news, 2023-11-14 Hello Llama Friends 🦙 LlamaIndex is 1 year old this week! 🎉 To celebrate, we're taking a stroll down memory \n lane on our blog with twelve milestones from our first year. Be sure to check it out. ", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.7652923088511814)]
Both profiles return the fields defined with summary
, and the “extra” matchfeatures
that can be used for debugging or feature logging (feedback data used to train ranking models).
Federating and blending from multiple sources
Create a new retriever with sources (mail and calendar data), and rerun the query (The default source was mail
):
retriever = PersonalAssistantVespaRetriever(
app=app,
user="bergum@vespa.ai",
vespa_rank_profile="fusion",
sources=["calendar", "mail"],
vespa_score_cutoff=0.80
)
retriever.retrieve("When is my dentist appointment?")
[NodeWithScore(node=TextNode(id_='id:assistant:calendar:g=bergum@vespa.ai:1', embedding=None, metadata={'matchfeatures': {'freshness(timestamp)': 0.9232447273662552, 'nativeRank(subject)': 0.11907024479392506, 'query(threshold)': 0.8, 'cosine': 0.8872983644178517, 'keywords_and_freshness': 1.1606592237923947}, 'subject': 'Dentist Appointment', 'body': 'Dentist appointment at 2023-12-04 at 09:30 - 1 hour duration'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='b30948011cbe9bbf29135384efbc72f85a6eb65113be0eb9762315a022f11ba1', text='Dentist Appointment Dentist appointment at 2023-12-04 at 09:30 - 1 hour duration ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.03278688524590164),
NodeWithScore(node=TextNode(id_='id:assistant:mail:g=bergum@vespa.ai:2', embedding=None, metadata={'matchfeatures': {'freshness(timestamp)': 0.9232447273662552, 'nativeRank(subject)': 0.11907024479392506, 'query(threshold)': 0.8, 'cosine': 0.9049836898369259, 'keywords_and_freshness': 1.1347827754290507}, 'subject': 'Dentist Appointment Reminder', 'body': 'Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='21c501ccdc6e4b33d388eefa244c5039a0e1ed4b81e4f038916765e22be24705', text='Dentist Appointment Reminder Dear Jo Kristian ,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.03278688524590164)]
The above query retrieved results from both sources and blended the results from the two sources using the per schema profile scores. At runtime, we can adjust scores per source, depending on, e.g., query context categorization. Another possibility is using generative LLMs to predict which sources to include.
Summary
This tutorial leveraged Vespa’s streaming mode to store and retrieve personal data. Vespa streaming mode is a unique capability, allowing for building highly cost-efficient RAG applications for personal data. Our focus extended to the practical application of custom LLamaIndex retrievers, connecting LLamaIndex seamlessly with a Vespa app to build advanced generative AI pipelines.
The tutorial also demonstrated the seamless blending and federation of query results from multiple data sources (multi-index RAG). We can easily envision adding more sources or schemas, for example, to track chat message history (long-term memory) in the context of a single user, offering a simple and industry-leading cost-efficient way to store and search personal context.
For those eager to learn more about Vespa, join the Vespa community on Slack to exchange ideas, seek assistance, or just stay updated on the latest Vespa developments.