Generate Embeddings

The Lantern Extras Postgres extension enables generating embeddings using SQL with the functions text_embedding and image_embedding.

Note that generating embeddings is a CPU-intensive task, so large-scale embedding generation inside the database can compete with regular queries for CPU on the database server. For large-scale embedding generation, the Lantern CLI provides a separate process.

Run Embedding Generation

To generate a one-off text embedding, use the text_embedding function. For example, to generate an embedding for the text 'My text input' using the embedding model BAAI/bge-small-en, run:

```sql
SELECT text_embedding('BAAI/bge-small-en', 'My text input');
```
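In practice, text_embedding is usually applied to rows already stored in a table. The sketch below uses a hypothetical articles table and assumes that text_embedding returns a REAL[] array. It embeds rows in batches of 100 so that each UPDATE stays short, which is one way to limit the CPU load noted above when the Lantern CLI is not used.

```sql
-- Hypothetical table used only for illustration
CREATE TABLE articles (
  id serial PRIMARY KEY,
  content text NOT NULL,
  embedding REAL[]  -- assumes text_embedding returns REAL[]
);

INSERT INTO articles (content)
VALUES ('Lantern adds vector search to Postgres'),
       ('Embeddings map text to vectors');

-- Embed up to 100 rows that do not have an embedding yet.
-- Re-run this statement until it reports UPDATE 0.
UPDATE articles
SET embedding = text_embedding('BAAI/bge-small-en', content)
WHERE id IN (
  SELECT id
  FROM articles
  WHERE embedding IS NULL
  ORDER BY id
  LIMIT 100
);

-- BAAI/bge-small-en produces 384-dimensional embeddings
SELECT id, array_length(embedding, 1) AS dims
FROM articles
ORDER BY id;
```

The batch size of 100 is arbitrary; a smaller batch shortens each transaction at the cost of more round trips.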

To generate image embeddings, use the image_embedding function. For example, to generate an embedding for the image https://lantern.dev/images/home/footer.png using the embedding model clip/ViT-B-32-visual, run:

```sql
SELECT image_embedding('clip/ViT-B-32-visual', 'https://lantern.dev/images/home/footer.png');
```
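image_embedding can be paired with the textual CLIP model for text-to-image search. This sketch assumes that clip/ViT-B-32-visual and clip/ViT-B-32-textual are the image and text halves of the same CLIP model, so their 512-dimensional outputs are directly comparable, and that both functions return REAL[]. The images table is hypothetical, and cosine similarity is computed in plain SQL so the example does not rely on any distance operator.

```sql
-- Hypothetical table of image URLs
CREATE TABLE images (
  id serial PRIMARY KEY,
  url text NOT NULL,
  embedding REAL[]
);

INSERT INTO images (url)
VALUES ('https://lantern.dev/images/home/footer.png');

-- Embed each image with the visual CLIP model (512 dimensions)
UPDATE images
SET embedding = image_embedding('clip/ViT-B-32-visual', url);

-- Embed a text query with the textual CLIP model and rank images
-- by cosine similarity, computed element-wise with a multi-argument unnest
WITH q AS (
  SELECT text_embedding('clip/ViT-B-32-textual', 'a website footer') AS v
)
SELECT i.id,
       i.url,
       (SELECT sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b)))
          FROM unnest(i.embedding, q.v) AS t(a, b)) AS cosine_similarity
FROM images AS i
CROSS JOIN q
ORDER BY cosine_similarity DESC
LIMIT 5;
```

This query scans every row; for larger tables, a vector index would normally replace the per-row subquery.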

The text_embedding and image_embedding functions run the models locally, inside the database server, using the ort (ONNX Runtime) runtime.

To generate embeddings through the OpenAI or Cohere APIs instead, set the corresponding API token and call openai_embedding or cohere_embedding:

```sql
-- Replace xxxxxxxxxxxxx with your OpenAI API key
SET lantern_extras.openai_token='xxxxxxxxxxxxx';
SELECT openai_embedding('openai/text-embedding-ada-002', 'My text input');

-- Replace xxxxxxxxxxxxx with your Cohere API key
SET lantern_extras.cohere_token='xxxxxxxxxxxxx';
SELECT cohere_embedding('cohere/embed-multilingual-v3.0', 'My text input');
```
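A plain SET keeps an API token for the rest of the session. As a sketch, standard PostgreSQL SET LOCAL can limit the tokens to a single transaction instead; the xxxxxxxxxxxxx values are placeholders for real keys.

```sql
BEGIN;
-- SET LOCAL lasts only until COMMIT or ROLLBACK,
-- so the keys do not linger in the session afterwards
SET LOCAL lantern_extras.openai_token = 'xxxxxxxxxxxxx';
SET LOCAL lantern_extras.cohere_token = 'xxxxxxxxxxxxx';

SELECT openai_embedding('openai/text-embedding-ada-002', 'My text input');
SELECT cohere_embedding('cohere/embed-english-v3.0', 'My text input');
COMMIT;
```

With either form, SET statements are written to the server log when log_statement is set to 'all', so the key can end up in log files on servers configured that way.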

Supported Models

The following embedding models are currently supported:

| Model Name | Dimensions | Max Tokens | Data Type | Runtime |
|---|---|---|---|---|
| clip/ViT-B-32-visual | 512 | 224 | Image | ort |
| clip/ViT-B-32-textual | 512 | 77 | Text | ort |
| microsoft/all-mpnet-base-v2 | 768 | 128 | Text | ort |
| microsoft/all-MiniLM-L12-v2 | 384 | 128 | Text | ort |
| transformers/multi-qa-mpnet-base-dot-v1 | 768 | 250 | Text | ort |
| thenlper/gte-base | 768 | 128 | Text | ort |
| thenlper/gte-large | 1024 | 128 | Text | ort |
| llmrails/ember-v1 | 1024 | 512 | Text | ort |
| intfloat/e5-base-v2 | 768 | 512 | Text | ort |
| intfloat/e5-large-v2 | 1024 | 512 | Text | ort |
| BAAI/bge-small-en | 384 | 512 | Text | ort |
| BAAI/bge-base-en | 768 | 512 | Text | ort |
| BAAI/bge-large-en | 1024 | 512 | Text | ort |
| jinaai/jina-embeddings-v2-small-en | 512 | 8192 | Text | ort |
| jinaai/jina-embeddings-v2-base-en | 768 | 8192 | Text | ort |
| openai/text-embedding-ada-002 | 1536 | 8192 | Text | openai |
| openai/text-embedding-3-small | 512 - 1536 | 8192 | Text | openai |
| openai/text-embedding-3-large | 256 - 3072 | 8192 | Text | openai |
| cohere/embed-english-v3.0 | 1024 | 512 | Text | cohere |
| cohere/embed-multilingual-v3.0 | 1024 | 512 | Text | cohere |
| cohere/embed-english-v2.0 | 4096 | 512 | Text | cohere |
| cohere/embed-english-light-v2.0 | 1024 | 512 | Text | cohere |
| cohere/embed-multilingual-v2.0 | 768 | 512 | Text | cohere |
| cohere/embed-english-light-v3.0 | 384 | 512 | Text | cohere |
| cohere/embed-multilingual-light-v3.0 | 384 | 512 | Text | cohere |
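The Dimensions column gives the length of the array each model produces, so it can be enforced on the storage column. A minimal sketch with a hypothetical notes table, assuming text_embedding returns REAL[]; the constraint uses 768 because BAAI/bge-base-en is listed with 768 dimensions.

```sql
-- Hypothetical table; the CHECK pins the array length to the model's dimension
CREATE TABLE notes (
  id serial PRIMARY KEY,
  body text NOT NULL,
  embedding REAL[] NOT NULL CHECK (array_length(embedding, 1) = 768)
);

-- BAAI/bge-base-en produces 768-dimensional embeddings, so these rows satisfy the CHECK
INSERT INTO notes (body, embedding)
SELECT body, text_embedding('BAAI/bge-base-en', body)
FROM (VALUES ('first note'), ('second note')) AS v(body);
```

With this constraint, writing embeddings from a model of another size, such as the 384-dimensional BAAI/bge-small-en, fails with a check-constraint violation instead of silently mixing vectors that cannot be compared.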