Develop
Troubleshooting
Check if the index exists
You can check if the index exists by querying the pg_indexes
table.
For example, if you created an index on the books
table, you might expect to see an entry in the pg_indexes
table similar to below
sql
SELECT indexdef FROM pg_indexes WHERE indexdef LIKE '%USING lantern_hnsw%';
indexdef
---------------------------------------------------------------------------------------------------------------------------------
CREATE INDEX books_embedding_idx ON public.books hnsw (embedding dist_cos_ops) WITH (ef_construction='128', ef='64', m='10', dim='1024')
Check if you are using the index
You can use EXPLAIN
to validate if you are using the index.
For example, this query is using a sequential scan. This means that an index is not being used to make search faster, and the query performs an exact search over all rows.
sql
EXPLAIN (COSTS FALSE) SELECT 1 FROM books WHERE v <-> '{0,0,0}';
QUERY PLAN
-----------------------------------
Limit
-> Sort
Sort Key: (v <-> '{1,0,0}'::real[])
-> Seq Scan on books
In contrast, this query uses an index.
sql
EXPLAIN (COSTS FALSE) SELECT 1 FROM books ORDER BY v <-> '{1,0,0}' LIMIT 1;
QUERY PLAN
---------------------------------------------------------
Limit
-> Index Scan using books_book_embedding_idx on books
Order By: (v <-> '{1,0,0}'::real[])
Why is my query not using an index?
Postgres estimated that a table scan would be faster
When determining how to perform a query, Postgres makes an estimate of various execution plans. Especially for small tables, it is possible that Postgres estimated that a sequential scan would be more efficient than an index scan. This estimate is not 100% accurate, and in general we expect index scans to perform faster than sequential scans, even for smaller tables.
To avoid this, you can instruct Postgres to avoid sequential scans whenever possible by setting the enable_seqscan
parameter to FALSE
. See below for an example, and see notes on Postgres configuration for more details on how to set this parameter.
sql
SET enable_seqscan=FALSE;
You can validate that this uses an index
sql
EXPLAIN (COSTS FALSE) SELECT 1 FROM books order by v <-> '{1,0,0}' LIMIT 1;
QUERY PLAN
---------------------------------------------------------
Limit
-> Index Scan using books_book_embedding_idx on books
Order By: (v <-> '{1,0,0}'::real[])
Your query was written incorrectly
If after disabling sequential scans, you still find that your query does not use the index, it may be the case that the operator was used incorrectly.
Here are examples of SQL queries that will use the index
sql
-- Directly orders by the result of the distance calculation
SELECT v <-> ARRAY[0,0,0] FROM books ORDER BY v <-> ARRAY[0,0,0];
-- Orders by the first column (result of the distance calculation)
SELECT v <-> ARRAY[0,0,0] FROM books ORDER BY 1;
-- Orders by the first column (result of the distance calculation)
SELECT v <-> ARRAY[0,0,0] AS distance FROM books ORDER BY 1;
-- Orders by an alias of the distance calculation
SELECT v <-> ARRAY[0,0,0] AS distance FROM books ORDER BY distance;
Here are examples of SQL queries that will not use the index.
sql
-- Using an alias in ORDER BY with a mathematical expression
SELECT v <-> ARRAY[0,0,0] AS distance FROM books ORDER BY distance * 2;
-- Using an alias in ORDER BY with a function
SELECT v <-> ARRAY[0,0,0] AS distance FROM books ORDER BY sqrt(distance);
-- Using a complex expression in ORDER BY
SELECT v <-> ARRAY[0,0,0] FROM books ORDER BY v <-> ARRAY[1,1,1] + 2;
-- Using a non-trivial expression involving multiple columns
SELECT (v <-> ARRAY[0,0,0]) * (x <-> ARRAY[0,0,0]) FROM books ORDER BY 1;
-- Using a CASE statement in ORDER BY
SELECT v <-> ARRAY[0,0,0] AS distance FROM books ORDER BY CASE WHEN distance > 5 THEN 1 ELSE 0 END;
-- Using a subquery in ORDER BY
SELECT v <-> ARRAY[0,0,0] AS distance FROM books ORDER BY (SELECT MAX(distance) FROM books);
In the second group of examples, complexity is introduced by mathematical expressions and functions that hinder the query planner's ability to recognize the correlation between the index and the distance calculation.
If you are having this issue, try simplifying your query. Feel free to contact us at support@lantern.dev for additional help.