Generate Embeddings

Lantern supports generating text and image embeddings inside the database. Try it out on Lantern Cloud.

Note that generating embeddings is compute-intensive. For large-scale embedding generation, such as embedding all of your data, Lantern provides a separate process.

OpenAI Text Embeddings

Before using OpenAI text embeddings, you need an OpenAI API key. You can get one by signing up at OpenAI. Once you have an API key, set it as a parameter in Postgres.

sql

ALTER ROLE [YOUR_USERNAME] SET lantern_extras.openai_token='[YOUR_API_KEY]';
SELECT pg_reload_conf();
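
Settings applied with ALTER ROLE ... SET take effect in sessions started after the change, so it can help to reconnect and confirm that the token is visible before calling the embedding functions. A minimal check, using only the standard Postgres current_setting function (its second argument makes it return NULL instead of raising an error when the parameter is unknown):

```sql
-- Run this in a NEW session opened after the ALTER ROLE statement.
-- Returns true when a non-empty token is visible, without printing the token itself.
SELECT coalesce(current_setting('lantern_extras.openai_token', true), '') <> ''
       AS openai_token_is_set;
```

If this returns false, reconnect as the role named in the ALTER ROLE statement and try again.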

Use the openai_embedding function to generate text embeddings with the OpenAI embedding models. The function takes a model name and the input text as arguments. For the text-embedding-3-small and text-embedding-3-large models, it also accepts an optional third argument that sets the output dimension.

sql

SELECT openai_embedding('openai/text-embedding-ada-002', 'My text input');
SELECT openai_embedding('openai/text-embedding-3-large', 'My text input');
SELECT openai_embedding('openai/text-embedding-3-large', 'My text input', 256);
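
In practice, embeddings are usually stored next to the text they describe. The following is a minimal sketch rather than a prescribed schema: the articles table and its column names are hypothetical, and it assumes that openai_embedding returns a one-dimensional array that can be stored in a REAL[] column.

```sql
-- Hypothetical table; all names here are illustrative only.
CREATE TABLE IF NOT EXISTS articles (
    id             SERIAL PRIMARY KEY,
    body           TEXT NOT NULL,
    body_embedding REAL[]  -- assumption: openai_embedding returns REAL[]
);

INSERT INTO articles (body) VALUES ('My text input');

-- Request a 256-dimensional embedding from text-embedding-3-large
-- for every row that does not have one yet.
UPDATE articles
SET body_embedding = openai_embedding('openai/text-embedding-3-large', body, 256)
WHERE body_embedding IS NULL;

-- Sanity check: the stored array length should match the requested dimension.
SELECT id, array_length(body_embedding, 1) AS dims
FROM articles;
```

Each row triggers one call to the OpenAI API, so for large tables the separate bulk process mentioned at the top of this page is likely a better fit.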

The following embedding models are supported:

| Model Name | Dimensions | Max Tokens |
| --- | --- | --- |
| openai/text-embedding-ada-002 | 1536 | 8192 |
| openai/text-embedding-3-small | 512 - 1536 | 8192 |
| openai/text-embedding-3-large | 256 - 3072 | 8192 |

Cohere Text Embeddings

Before using Cohere text embeddings, you need a Cohere API key. You can get one by signing up at Cohere. Once you have an API key, set it as a parameter in Postgres.

sql

ALTER ROLE [YOUR_USERNAME] SET lantern_extras.cohere_token='[YOUR_API_KEY]';
SELECT pg_reload_conf();

Use the cohere_embedding function to generate text embeddings with the Cohere embedding models. For example, to generate an embedding for the text My text input using the model cohere/embed-english-v3.0, run:

sql

SELECT cohere_embedding('cohere/embed-english-v3.0', 'My text input');
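
Because each call goes to an external API, it can be useful to backfill an existing table in small batches instead of in one long statement. The sketch below shows one way to do that in plain SQL. The reviews table, its columns, and the batch size of 100 are hypothetical choices, and the REAL[] column type assumes that cohere_embedding returns a one-dimensional array of that type.

```sql
-- Hypothetical table; all names here are illustrative only.
CREATE TABLE IF NOT EXISTS reviews (
    id                SERIAL PRIMARY KEY,
    comment           TEXT NOT NULL,
    comment_embedding REAL[]  -- assumption: cohere_embedding returns REAL[]
);

-- Embed at most 100 rows that have no embedding yet.
-- Rerun this statement until it reports "UPDATE 0".
WITH batch AS (
    SELECT id
    FROM reviews
    WHERE comment_embedding IS NULL
    ORDER BY id
    LIMIT 100
)
UPDATE reviews AS r
SET comment_embedding = cohere_embedding('cohere/embed-english-v3.0', r.comment)
FROM batch
WHERE r.id = batch.id;
```

Keeping each batch in its own statement means a failed API call only rolls back that batch, not the work already committed.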

The following embedding models are supported:

| Model Name | Dimensions | Max Tokens |
| --- | --- | --- |
| cohere/embed-english-v3.0 | 1024 | 512 |
| cohere/embed-multilingual-v3.0 | 1024 | 512 |
| cohere/embed-english-v2.0 | 4096 | 512 |
| cohere/embed-english-light-v2.0 | 1024 | 512 |
| cohere/embed-multilingual-v2.0 | 768 | 512 |
| cohere/embed-english-light-v3.0 | 384 | 512 |
| cohere/embed-multilingual-light-v3.0 | 384 | 512 |

Open-Source Text Embeddings

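Use the text_embedding function to generate text embeddings with open-source models. The function takes a model name and the input text as arguments. For example, to generate an embedding for the text My text input using the open-source embedding model BAAI/bge-small-en, run: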

sql

SELECT text_embedding('BAAI/bge-small-en', 'My text input');
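
A quick way to see that the embeddings behave as expected is to compare two of them. The sketch below computes cosine similarity using only standard Postgres array functions, so it does not rely on any distance helpers beyond text_embedding itself. It assumes that text_embedding returns a one-dimensional numeric array, and the second input string is an arbitrary example.

```sql
-- Embed two strings with the same model, then compute their cosine similarity.
WITH pair AS (
    SELECT text_embedding('BAAI/bge-small-en', 'My text input')      AS a,
           text_embedding('BAAI/bge-small-en', 'Another text input') AS b
)
SELECT sum(x::float8 * y::float8)
       / (sqrt(sum(x::float8 * x::float8)) * sqrt(sum(y::float8 * y::float8)))
       AS cosine_similarity
FROM pair,
     -- Multi-argument unnest walks both arrays in parallel, one element pair per row.
     unnest(pair.a, pair.b) AS t(x, y);
```

Values closer to 1 indicate more similar texts. Both embeddings must come from the same model, since different models produce vectors of different dimensions and meanings.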

The following embedding models are supported:

| Model Name | Dimensions | Max Tokens |
| --- | --- | --- |
| clip/ViT-B-32-textual | 512 | 77 |
| microsoft/all-mpnet-base-v2 | 768 | 128 |
| microsoft/all-MiniLM-L12-v2 | 384 | 128 |
| transformers/multi-qa-mpnet-base-dot-v1 | 768 | 250 |
| thenlper/gte-base | 768 | 128 |
| thenlper/gte-large | 1024 | 128 |
| llmrails/ember-v1 | 1024 | 512 |
| intfloat/e5-base-v2 | 768 | 512 |
| intfloat/e5-large-v2 | 1024 | 512 |
| BAAI/bge-small-en | 384 | 512 |
| BAAI/bge-base-en | 768 | 512 |
| BAAI/bge-large-en | 1024 | 512 |
| BAAI/bge-m3 | 1024 | 8192 |
| jinaai/jina-embeddings-v2-small-en | 512 | 8192 |
| jinaai/jina-embeddings-v2-base-en | 768 | 8192 |

Image Embeddings

To generate image embeddings, use the image_embedding function. This function accepts a model name and image URL as arguments.

For example, to generate an embedding for the image https://lantern.dev/images/home/footer.png using the embedding model clip/ViT-B-32-visual, run:

sql

SELECT image_embedding('clip/ViT-B-32-visual', 'https://lantern.dev/images/home/footer.png');
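
CLIP's text and image encoders are trained to share one embedding space, and this page lists both clip/ViT-B-32-textual and clip/ViT-B-32-visual with 512 dimensions, so a text query can be embedded with the textual model and compared against stored image embeddings. The following sketch only sets up the two sides and checks that their dimensions agree. The product_images table and the query text are hypothetical, and the REAL[] column type assumes that image_embedding returns a one-dimensional array of that type.

```sql
-- Hypothetical table; all names here are illustrative only.
CREATE TABLE IF NOT EXISTS product_images (
    id        SERIAL PRIMARY KEY,
    url       TEXT NOT NULL,
    embedding REAL[]  -- assumption: image_embedding returns REAL[]
);

INSERT INTO product_images (url)
VALUES ('https://lantern.dev/images/home/footer.png');

-- Embed every image that does not have an embedding yet.
UPDATE product_images
SET embedding = image_embedding('clip/ViT-B-32-visual', url)
WHERE embedding IS NULL;

-- Embed a text query with the matching CLIP text model and
-- confirm that both sides have the same number of dimensions.
SELECT array_length(text_embedding('clip/ViT-B-32-textual', 'a website footer'), 1) AS text_dims,
       array_length(embedding, 1) AS image_dims
FROM product_images;
```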

The following embedding models are supported:

| Model Name | Dimensions | Input Image Size (px) |
| --- | --- | --- |
| clip/ViT-B-32-visual | 512 | 224 |

Self-Hosting

If you are self-hosting Lantern, generating embeddings requires the Lantern Extras extension. Installation steps are found here.

Once the extension is installed, the above functions are available.
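
After installation, the extension typically still has to be enabled in each database where you want to call these functions. A minimal sketch, assuming the extension is registered under the name lantern_extras (inferred from the lantern_extras. prefix of the parameters above; check the installation guide for the exact name):

```sql
-- Assumption: the extension name is lantern_extras.
CREATE EXTENSION IF NOT EXISTS lantern_extras;

-- Confirm that the extension is enabled in the current database and see its version.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'lantern_extras';
```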