Lantern-cli
Create Index
With the Lantern CLI's create-index routine, you can build an HNSW index outside Postgres, without consuming database resources, and then import it into the database.
You can read more about how we improved index creation times by 30x over pgvector in this blog post.
Prerequisites
- Lantern CLI
- Postgres database with the Lantern extension installed
Set Up Data
Note: You can skip this step if you already have vector data in your database.
sql
CREATE TABLE embeddings (id SERIAL PRIMARY KEY, v REAL[]);
INSERT INTO embeddings (v) VALUES ('{0,0,0}'), ('{0,1,0}'), ('{1,0,0}');
Run Index Creation
bash
lantern-cli create-index \
--uri 'postgresql://[username]:[password]@localhost:5432/[db]' \
--table "embeddings" \
--column "v" \
-m 10 \
--efc 128 \
--ef 64 \
--metric-kind l2sq \
--out /tmp/index.usearch \
--remote-database \
--import
After this command completes, the index is created and imported into your database, even if the database runs on a remote server.
If you run the command on the same server as your database, you can omit the --remote-database option, and the import will use the local file given by the --out option.
Make sure the database URI connects as a superuser.
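As a rough sketch of the same-server case described above, the function below assembles the earlier command without --remote-database. The PGURI variable, its default value, and the DRY_RUN switch are assumptions made for this illustration and are not part of the Lantern CLI. By default it only prints the command.

```shell
#!/bin/sh
# PGURI is a hypothetical variable name; point it at a superuser connection string.
PGURI="${PGURI:-postgresql://postgres@localhost:5432/postgres}"

create_index_local() {
    # Same flags as the command above, minus --remote-database,
    # so the file written by --out is read from the local disk.
    set -- create-index \
        --uri "$PGURI" \
        --table embeddings \
        --column v \
        -m 10 --efc 128 --ef 64 \
        --metric-kind l2sq \
        --out /tmp/index.usearch \
        --import
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default here): print the command instead of running it.
        echo "lantern-cli $*"
    else
        lantern-cli "$@"
    fi
}

create_index_local
```

Set DRY_RUN=0 to actually invoke lantern-cli once the printed command looks right.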
sql
-- verify that the index was created properly
SET enable_seqscan=false;
SET lantern.pgvector_compat=false;
-- the query plan should show an index scan
EXPLAIN SELECT * FROM embeddings ORDER BY v <?> ARRAY[1,1,1] LIMIT 1;
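If you want to script this check, one possible approach is to search the EXPLAIN output for the text "Index Scan". The helper name plan_uses_index and the sample plan text below are made up for illustration. The sample is not real output from Lantern.

```shell
#!/bin/sh
# Succeeds when the plan text on stdin mentions an index scan.
plan_uses_index() {
    grep -q "Index Scan"
}

# Illustrative plan text with a hypothetical index name (not real output).
sample_plan='Limit
  ->  Index Scan using embeddings_v_idx on embeddings'

if printf '%s\n' "$sample_plan" | plan_uses_index; then
    echo "index scan found"
else
    echo "no index scan in plan"
fi
```

In practice you would pipe the output of the EXPLAIN statement, for example from psql, into plan_uses_index instead of the sample text.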
Note: To reindex an external index, run SELECT lantern_reindex_external_index('<index_name>') if you have the lantern_extras extension installed, or rerun the same CLI command with the --index-name parameter specified.
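The SQL route from the note above could be scripted roughly as follows. The index name embeddings_v_idx and the INDEX_NAME variable are hypothetical; substitute the name of your own index. The sketch only prints the statement, so nothing is executed against a database.

```shell
#!/bin/sh
# Hypothetical index name; replace with the name of your external index.
INDEX_NAME="${INDEX_NAME:-embeddings_v_idx}"

# Print the reindex statement for the given index name.
# Requires the lantern_extras extension when actually executed.
reindex_sql() {
    printf "SELECT lantern_reindex_external_index('%s');\n" "$1"
}

reindex_sql "$INDEX_NAME"
```

Assuming psql is installed and PGURI holds your connection string, piping the statement with `reindex_sql "$INDEX_NAME" | psql "$PGURI"` would run it.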
CLI parameters
Run lantern-cli create-index --help to list the available CLI parameters:
bash
Create external index
Usage: lantern-cli create-index [OPTIONS] --uri <URI> --table <TABLE> --column <COLUMN>
Options:
-u, --uri <URI>
Fully associated database connection string including db name
-s, --schema <SCHEMA>
Schema name [default: public]
-t, --table <TABLE>
Table name
-c, --column <COLUMN>
Column name
-p, --pq
Use already created codebook to create product-quantized binary index
-m <M>
Number of neighbours for each vector [default: 16]
--efc <EFC>
The size of the dynamic list for the nearest neighbors in construction [default: 128]
--ef <EF>
The size of the dynamic list for the nearest neighbors in search [default: 64]
-d <DIMS>
Dimensions of vector [default: 0]
--metric-kind <METRIC_KIND>
Distance algorithm [default: l2sq] [possible values: l2sq, cos, hamming]
-o, --out <OUT>
Index output file [default: index.usearch]
-i, --import
Import index to database (should be run as db superuser to have access)
-r, --remote-database <REMOTE_DATABASE>
If database is not on the same server where the job is running [default: true] [possible values: true, false]
--index-name <INDEX_NAME>
Index name to use when importing index to database
-h, --help
Print help
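Since --metric-kind accepts only the values listed in the help output, a wrapper script might check the value before starting a long index build. This is a minimal sketch. The is_valid_metric helper and the METRIC variable are illustrative names, not part of the CLI.

```shell
#!/bin/sh
# Accept only the --metric-kind values listed in the help output.
is_valid_metric() {
    case "$1" in
        l2sq|cos|hamming) return 0 ;;
        *) return 1 ;;
    esac
}

# METRIC is a hypothetical variable; l2sq is the documented default.
METRIC="${METRIC:-l2sq}"

if is_valid_metric "$METRIC"; then
    echo "metric ok: $METRIC"
else
    echo "unsupported metric: $METRIC" >&2
fi
```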