Develop
Weighted Vector Search
Motivation
Consider a table products
with columns name
and description
. You may want to prioritize products where the search query matches the name
column over the description
column. In this case, you can perform a weighted vector search, with a greater weight for the name
column.
Overview
Weighted vector search enables searching over multiple vector columns in a single query. It calculates a joint distance metric based on the weighted distance between the vectors in the table and the provided vectors. It allows simultaneously searching over up to 3 columns and ordering rows according to weighted vector distance of the query to each of the three columns.
The advantage of the function is that it allows using the vector index on each of the columns, while regular weighting via an SQL query would not allow that.
API
The API of the function is below:
lantern.weighted_vector_search(
relation_type anyelement, -- Type of the specific table containing all the vectors
w1 numeric, -- Weight of the first vector
col1 text, -- Column name of the first vector
vec1 vector, -- Query vector constant for the first column vector
w2 numeric = 0, -- Weight of the second vector
col2 text = NULL, -- Column name of the second vector (optional)
vec2 vector = NULL, -- Query vector constant for the second column vector (optional)
w3 numeric = 0, -- Weight of the third vector
col3 text = NULL, -- Column name of the third vector (optional)
vec3 vector = NULL, -- Query vector constant for the third column vector (optional)
ef integer = 100, -- Number of trees to search in the index
max_dist numeric = NULL, -- Maximum weighted distance to keep
id_col text = 'id', -- Column name of the id column
exact boolean = false, -- If true, use exact search instead of approximate
debug_output boolean = false, -- If true, print debug output
analyze_output boolean = false -- If true, print analyze output
) RETURNS TABLE
The function returns a table of the same type as the input relation, containing only the relevant rows. It calculates the weighted distance between the query vectors and the vectors in the table, and filters the results based on the max_dist
parameter if specified. If max_dist
is not specified, it returns the total number of rows if the total number of rows is less than ef
. Otherwise, it returns between ef
and ef * number of columns
results.
Example Usage
Let's start by creating an example table with text and vector columns to demonstrate the usage of the lantern.weighted_vector_search
function.
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
description TEXT,
category TEXT,
embedding VECTOR(1536)
);
INSERT INTO products (name, description, category, embedding)
VALUES
('Wrench Set', 'A set of wrenches for tightening and loosening nuts and bolts.', 'Tools', /* vector */),
('Shower Curtain', 'A waterproof curtain for the shower or bathtub.', 'Bathroom', /* vector */),
('Lawn Mower', 'A gas-powered mower for cutting grass.', 'Outdoor', /* vector */),
('Screwdriver Set', 'A set of screwdrivers with different head types.', 'Tools', /* vector */);
In this example, we have a products table with columns for name, description, category, and an embedding column of type VECTOR(1536)
to store vector representations of the product data.
To perform a weighted vector search on this table, we can use the lantern.weighted_vector_search
function. Let's say we want to find products related to cars, with a focus on the product description and category.
SELECT name, description, category
FROM lantern.weighted_vector_search(
CAST(NULL AS products),
max_dist => 5.0,
w1 => 2.0, col1 => 'description', vec1 => /* vector describing car environment s*/,
w2 => 1.0, col2 => 'category', vec2 => /* vector describing car environments */
);
In this example, we search for products with a maximum weighted distance of 5.0 from the query vector. We're giving a higher weight (2.0) to the description column and a lower weight (1.0) to the category column. The vec1 and vec2 parameters should be replaced with the actual vector representations of the word "plumbing" for the respective columns.
This query will return the name, description, and category of products that are most relevant to the query vector, with a higher emphasis on the product description.
We can control which column is more important for our query by adjusting the weights.
To query the table without a max_dist
and return at most 5 results, we can use the following query:
SELECT name, description, category
FROM lantern.weighted_vector_search(
CAST(NULL AS products),
w1 => 2.0, col1 => 'description', vec1 => /* vector describing car environment s*/,
w2 => 1.0, col2 => 'category', vec2 => /* vector describing car environments */
)
LIMIT 5;