Generate Embeddings

The Lantern Extras Postgres extension enables generating embeddings using SQL with the functions text_embedding and image_embedding.

Note that generating embeddings is a CPU-intensive task, so large-scale embedding generation can compete with your database for resources. For large-scale embedding generation, the Lantern CLI runs the work as a separate process.

Run Embedding Generation

To generate one-off text embeddings, use the text_embedding function. For example, to generate an embedding for the text "My text input" using the embedding model BAAI/bge-small-en, run:

sql

SELECT text_embedding('BAAI/bge-small-en', 'My text input');
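The same function can also fill an embedding column for existing rows. The following is a minimal sketch rather than a prescribed workflow: the books table and its name and name_embedding columns are hypothetical, and it assumes that the output of text_embedding can be stored in a REAL[] column.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE books (
    id             SERIAL PRIMARY KEY,
    name           TEXT NOT NULL,
    name_embedding REAL[]  -- assumption: text_embedding output fits REAL[]
);

INSERT INTO books (name) VALUES ('The Hobbit'), ('Dune');

-- Embed every row that does not have an embedding yet.
UPDATE books
SET name_embedding = text_embedding('BAAI/bge-small-en', name)
WHERE name_embedding IS NULL;
```

Each call runs the model inside Postgres, so an update like this is best suited to small tables.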

To generate image embeddings, use the image_embedding function. For example, to generate an embedding for the image at https://lantern.dev/images/home/footer.png using the embedding model clip/ViT-B-32-visual, run:

sql

SELECT image_embedding('clip/ViT-B-32-visual', 'https://lantern.dev/images/home/footer.png');

The functions above run local models using the ort runtime.

To generate embeddings with the OpenAI or Cohere APIs instead, use the following functions:

sql

SET lantern_extras.openai_token='xxxxxxxxxxxxx';
SET lantern_extras.openai_azure_api_token='xxxxxxxxxxxxx'; -- For Azure deployment with API Key authentication
SET lantern_extras.openai_azure_entra_token='xxxxxxxxxxxxx'; -- For Azure deployment with Microsoft Entra ID authentication
SET lantern_extras.openai_deployment_url='https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2023-05-15'; -- You can set this GUC or pass the URL as an argument

SELECT openai_embedding('openai/text-embedding-ada-002', 'My text input');
SELECT openai_embedding('openai/text-embedding-ada-002', 'My text input', 'https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2023-05-15'); -- Use an Azure deployment
SELECT openai_embedding('openai/text-embedding-3-small', 'My text input', '', 768); -- Provide dimensions for the newer models
SELECT openai_embedding('openai/text-embedding-3-large', 'My text input', '', 3072); -- Provide dimensions for the newer models

For more information about the openai_azure_api_token and openai_azure_entra_token settings, refer to the Azure Docs.
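If the token should not stay set for the rest of the session, standard Postgres SET LOCAL scopes a setting to the current transaction. This is a sketch under assumptions: it presumes the lantern_extras settings behave like ordinary session-settable parameters, and the articles table with its body and body_embedding columns is hypothetical.

```sql
BEGIN;

-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
SET LOCAL lantern_extras.openai_token = 'xxxxxxxxxxxxx';

-- Hypothetical table: articles(id, body TEXT, body_embedding REAL[]).
-- Assumption: openai_embedding output fits a REAL[] column.
UPDATE articles
SET body_embedding = openai_embedding('openai/text-embedding-ada-002', body)
WHERE body_embedding IS NULL;

COMMIT;
```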

Cohere embeddings

sql

SET lantern_extras.cohere_token='xxxxxxxxxxxxx';
SELECT cohere_embedding('cohere/embed-multilingual-v3.0', 'My text input');
SELECT cohere_embedding('cohere/embed-multilingual-v3.0', 'My text input', 'search_query'); -- This is the default embedding type. Use it when embedding queries
SELECT cohere_embedding('cohere/embed-multilingual-v3.0', 'My text input', 'search_document'); -- Use this type for embeddings you want to store in the database

For more information about embedding types, refer to the Cohere Docs.
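The two embedding types are meant to be used together: stored rows are embedded as search_document, and the incoming query is embedded as search_query. A rough sketch of that pairing follows. The docs table and its columns are hypothetical, and cos_dist is assumed to be available from the core Lantern extension rather than from Lantern Extras.

```sql
-- Hypothetical table: docs(id, content TEXT, content_embedding REAL[]).
-- Store document embeddings with the 'search_document' type.
UPDATE docs
SET content_embedding = cohere_embedding('cohere/embed-multilingual-v3.0', content, 'search_document')
WHERE content_embedding IS NULL;

-- Embed the query once with 'search_query', then rank documents by distance.
-- Assumption: cos_dist is provided by the core Lantern extension.
WITH q AS (
    SELECT cohere_embedding('cohere/embed-multilingual-v3.0', 'My text input', 'search_query') AS v
)
SELECT d.id, d.content, cos_dist(d.content_embedding, q.v) AS distance
FROM docs d, q
ORDER BY distance
LIMIT 5;
```

Computing the query embedding in a CTE keeps the API call to one per query instead of one per row.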

Supported Models

The following embedding models are currently supported:

| Model Name | Dimensions | Max Tokens | Data Type | Runtime |
| --- | --- | --- | --- | --- |
| clip/ViT-B-32-visual | 512 | 224 | Image | ort |
| clip/ViT-B-32-textual | 512 | 77 | Text | ort |
| microsoft/all-mpnet-base-v2 | 768 | 128 | Text | ort |
| microsoft/all-MiniLM-L12-v2 | 384 | 128 | Text | ort |
| transformers/multi-qa-mpnet-base-dot-v1 | 768 | 250 | Text | ort |
| thenlper/gte-base | 768 | 128 | Text | ort |
| thenlper/gte-large | 1024 | 128 | Text | ort |
| llmrails/ember-v1 | 1024 | 512 | Text | ort |
| intfloat/e5-base-v2 | 768 | 512 | Text | ort |
| intfloat/e5-large-v2 | 1024 | 512 | Text | ort |
| BAAI/bge-small-en | 384 | 512 | Text | ort |
| BAAI/bge-base-en | 768 | 512 | Text | ort |
| BAAI/bge-large-en | 1024 | 512 | Text | ort |
| BAAI/bge-m3 | 1024 | 8192 | Text | ort |
| jinaai/jina-embeddings-v2-small-en | 512 | 8192 | Text | ort |
| jinaai/jina-embeddings-v2-base-en | 768 | 8192 | Text | ort |
| openai/text-embedding-ada-002 | 1536 | 8192 | Text | openai |
| openai/text-embedding-3-small | 512 - 1536 | 8192 | Text | openai |
| openai/text-embedding-3-large | 256 - 3072 | 8192 | Text | openai |
| cohere/embed-english-v3.0 | 1024 | 512 | Text | cohere |
| cohere/embed-multilingual-v3.0 | 1024 | 512 | Text | cohere |
| cohere/embed-english-v2.0 | 4096 | 512 | Text | cohere |
| cohere/embed-english-light-v2.0 | 1024 | 512 | Text | cohere |
| cohere/embed-multilingual-v2.0 | 768 | 512 | Text | cohere |
| cohere/embed-english-light-v3.0 | 384 | 512 | Text | cohere |
| cohere/embed-multilingual-light-v3.0 | 384 | 512 | Text | cohere |
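
Because an embedding column and anything built on it depend on the model's dimensions, a quick check against the list above can catch a mismatched model name early. This is a sketch that assumes the embedding functions return a one-dimensional array.

```sql
-- BAAI/bge-small-en is listed above with 384 dimensions;
-- the reported length should match that entry.
SELECT array_length(text_embedding('BAAI/bge-small-en', 'My text input'), 1) AS dimensions;
```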