Lantern Extras
Generate Embeddings
The Lantern Extras Postgres extension enables generating embeddings using SQL with the functions text_embedding and image_embedding.
Note that generating embeddings is a CPU-intensive task. For large-scale embedding generation, the Lantern CLI provides a separate process.
Run Embedding Generation
To generate one-off text embeddings, use the text_embedding function. For example, to generate an embedding for the text My text input using the embedding model BAAI/bge-small-en, run:
SELECT text_embedding('BAAI/bge-small-en', 'My text input');
To generate image embeddings, use the image_embedding function. For example, to generate an embedding for the image https://lantern.dev/images/home/footer.png using the embedding model clip/ViT-B-32-visual, run:
SELECT image_embedding('clip/ViT-B-32-visual', 'https://lantern.dev/images/home/footer.png');
The functions above run local models using the ort runtime.
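A common next step is to store embeddings next to the rows they describe. The following is a minimal sketch, not the documented workflow: the articles table and its columns are hypothetical, and the REAL[] column type is an assumption about what text_embedding returns. For large tables, the CPU cost noted above still applies.

```sql
-- Hypothetical table for illustration; REAL[] is an assumed return type.
CREATE TABLE IF NOT EXISTS articles (
    id        SERIAL PRIMARY KEY,
    body      TEXT NOT NULL,
    embedding REAL[]
);

INSERT INTO articles (body)
VALUES ('My text input'), ('Another text input');

-- Embed only the rows that do not have an embedding yet,
-- so the statement can be re-run safely after new inserts.
UPDATE articles
SET embedding = text_embedding('BAAI/bge-small-en', body)
WHERE embedding IS NULL;

-- Sanity check: compare the vector length with the model's
-- entry in the Supported Models table.
SELECT id, array_length(embedding, 1) AS dims
FROM articles;
```

Filtering on embedding IS NULL keeps the update incremental, which matters because each call runs model inference inside the database process.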
To generate embeddings using the OpenAI or Cohere APIs, use the following functions:
SET lantern_extras.openai_token='xxxxxxxxxxxxx';
SET lantern_extras.openai_azure_api_token='xxxxxxxxxxxxx'; -- For Azure deployment with API Key authentication
SET lantern_extras.openai_azure_entra_token='xxxxxxxxxxxxx'; -- For Azure deployment with Microsoft Entra ID authentication
SET lantern_extras.openai_deployment_url='https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2023-05-15'; -- You can set this GUC or pass the URL as an argument
SELECT openai_embedding('openai/text-embedding-ada-002', 'My text input');
SELECT openai_embedding('openai/text-embedding-ada-002', 'My text input', 'https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2023-05-15'); -- Use an Azure deployment
SELECT openai_embedding('openai/text-embedding-v3-small', 'My text input', '', 768); -- Provide dimensions for new models
SELECT openai_embedding('openai/text-embedding-v3-large', 'My text input', '', 3072); -- Provide dimensions for new models
For more information about the azure_api_token and azure_entra_token variables, refer to the Azure Docs.
Cohere embeddings
SET lantern_extras.cohere_token='xxxxxxxxxxxxx';
SELECT cohere_embedding('cohere/embed-multilingual-v3.0', 'My text input');
SELECT cohere_embedding('cohere/embed-multilingual-v3.0', 'My text input', 'search_query'); -- Default embedding type; use it when embedding queries
SELECT cohere_embedding('cohere/embed-multilingual-v3.0', 'My text input', 'search_document'); -- Use this type for embeddings you store in the database
For more information about embedding types, refer to the Cohere Docs.
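The two embedding types are meant to be used together: stored rows are embedded as search_document, and the search text is embedded as search_query at query time. The sketch below illustrates that split under stated assumptions: the notes table is hypothetical, and the REAL[] column type is an assumption about what cohere_embedding returns.

```sql
SET lantern_extras.cohere_token='xxxxxxxxxxxxx';

-- Hypothetical table for illustration; REAL[] is an assumed return type.
CREATE TABLE IF NOT EXISTS notes (
    id        SERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding REAL[]
);

INSERT INTO notes (content) VALUES ('My text input');

-- Stored content is embedded with the 'search_document' type.
UPDATE notes
SET embedding = cohere_embedding(
        'cohere/embed-multilingual-v3.0', content, 'search_document')
WHERE embedding IS NULL;

-- The search text is embedded with the 'search_query' type.
SELECT cohere_embedding(
        'cohere/embed-multilingual-v3.0', 'My text input', 'search_query')
    AS query_embedding;
```

Each call sends a request to the Cohere API, so the update above makes one request per row that lacks an embedding.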
Supported Models
The following embedding models are currently supported:
| Model Name | Dimensions | Max Tokens | Data Type | Runtime |
|---|---|---|---|---|
| | 512 | 224 | Image | ort |
| | 512 | 77 | Text | ort |
| | 768 | 128 | Text | ort |
| | 384 | 128 | Text | ort |
| | 768 | 250 | Text | ort |
| | 768 | 128 | Text | ort |
| | 1024 | 128 | Text | ort |
| | 1024 | 512 | Text | ort |
| | 768 | 512 | Text | ort |
| | 1024 | 512 | Text | ort |
| | 384 | 512 | Text | ort |
| | 768 | 512 | Text | ort |
| | 1024 | 512 | Text | ort |
| | 1024 | 8192 | Text | ort |
| | 512 | 8192 | Text | ort |
| | 768 | 8192 | Text | ort |
| | 1536 | 8192 | Text | openai |
| | 512 - 1536 | 8192 | Text | openai |
| | 256 - 3072 | 8192 | Text | openai |
| | 1024 | 512 | Text | cohere |
| | 1024 | 512 | Text | cohere |
| | 4096 | 512 | Text | cohere |
| | 1024 | 512 | Text | cohere |
| | 768 | 512 | Text | cohere |
| | 384 | 512 | Text | cohere |
| | 384 | 512 | Text | cohere |