LLM Completions in Postgres

With v0.5.0, we’re releasing llm_completion and add_completion_job, which enable LLM calls inside Postgres.

November 20, 2024 · 7 min read

Di Qi

Cofounder

We're excited to release llm_completion and add_completion_job, which enable LLM calls inside Postgres. This enables ad hoc LLM completion calls, as well as bulk completion generation over entire columns.

It's as simple as calling:

sql
SELECT llm_completion('Summarize the text', 'def hello_world(): print("Hello world")');
SELECT add_completion_job('articles', 'full_text', 'summary', 'Summarize the text');

We support OpenAI models, including GPT-4o, GPT-4o mini, o1-preview, and o1-mini.

We also support any custom LLM deployment. Previously, we showed how to build a code chatbot in 15 minutes with Ubicloud and open-source embedding and LLM models.

Why LLM completions in SQL?

There are many use cases for LLMs in Postgres. You can use them to fill in data, summarize existing data, or extract structured data from unstructured text, all within your existing database.

Summarize the contents of a column:

sql
SELECT id, llm_completion('Summarize the following text: ' || full_text) AS summary FROM documents;

Fill in a new column using existing data:

sql
UPDATE products SET description = llm_completion('Generate a product description for: ' || name) WHERE description IS NULL;

Extract structured data from an existing, unstructured text column:

sql
SELECT llm_completion('Extract the key entities and return them as a JSON text array from: ' || notes) AS entities FROM meeting_notes;
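If the model reliably emits well-formed JSON, the result can be cast for downstream querying. A sketch; note that the cast raises an error whenever the model wraps the JSON in extra prose, so a strict prompt is assumed:

sql
SELECT (llm_completion('Extract the key entities and return them as a JSON text array from: ' || notes))::jsonb AS entities
FROM meeting_notes;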

The API for ad hoc LLM completion is simple:

sql
SELECT llm_completion(
    user_prompt => 'User input', -- User prompt to LLM model
    model => 'gpt-4o', -- Model for runtime to use (default: 'gpt-4o')
    system_prompt => 'Provide short summary for the given text', -- System prompt for LLM (default: '')
    base_url => 'https://api.openai.com', -- For a custom LLM deployment, provide the server URL (default: OpenAI API URL)
    api_token => '<llm_api_token>', -- API token for the LLM server (default: inferred from the lantern_extras.llm_token GUC)
    azure_entra_token => '', -- For Azure deployments, an Entra token can be used for authentication instead
    runtime => 'openai' -- Runtime to use. (default: 'openai'). Use `SELECT get_available_runtimes()` for list
);
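Rather than passing api_token on every call, you can set the lantern_extras.llm_token GUC once for the session. A minimal sketch, reusing the example prompt from above (substitute your own key):

sql
SET lantern_extras.llm_token = '<llm_api_token>';
SELECT llm_completion(
    user_prompt => 'def hello_world(): print("Hello world")',
    system_prompt => 'Provide short summary for the given text'
);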

Bulk LLM completions with add_completion_job

While llm_completion is perfect for ad hoc LLM calls or processing a few rows, handling LLM calls over large tables benefits from a more efficient approach. LLM calls are slow, often taking seconds each, so running them sequentially over thousands of rows is impractical: even 1,000 rows can take upwards of an hour.

For these cases, you can use add_completion_job to parallelize your LLM calls over table rows. OpenAI supports up to 30,000 calls per minute for its LLM models, allowing you to process large volumes of data quickly in parallel.

sql
SELECT add_completion_job(
    table_name => 'articles', -- Name of the table
    src_column => 'content', -- Source column for completions
    dst_column => 'content_summary', -- Destination column for llm response (will be created automatically)
    system_prompt => 'Provide short summary for the given text', -- System prompt for LLM (default: '')
    column_type => 'TEXT', -- Destination column type
    model => 'gpt-4o', -- Model for runtime to use (default: 'gpt-4o')
    pk => 'id', -- Primary key column; the table must have a primary key (default: 'id')
    schema => 'public', -- Schema where the table is located (default: 'public')
    base_url => 'https://api.openai.com', -- For a custom LLM deployment, provide the server URL (default: OpenAI API URL)
    batch_size => 10, -- Batch size for requests to the LLM server; choose based on your API tier (default: determined by model and runtime)
    api_token => '<llm_api_token>', -- API token for the LLM server (default: inferred from the lantern_extras.llm_token GUC)
    azure_entra_token => '', -- For Azure deployments, an Entra token can be used for authentication instead
    runtime => 'openai' -- Runtime to use. (default: 'openai'). Use `SELECT get_available_runtimes()` for list
);

For example, to classify customer feedback in bulk:

sql
SELECT add_completion_job('user_feedback_responses', 'user_response', 'sentiment_score', 'You will receive user feedback. Return 1 if the response is positive, -1 if negative, or 0 if neutral.');

This schedules a background job to process each row without blocking your main operations.

Keep track of your completion jobs with:

sql
SELECT * FROM get_completion_jobs();
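Once the job finishes, the destination column is populated in place, so the results are a plain query away. For the feedback example above:

sql
SELECT user_response, sentiment_score
FROM user_feedback_responses
LIMIT 5;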

Embedding Generation

LLM completion pairs nicely with embedding generation for retrieval-augmented generation (RAG) use cases. Embeddings convert text into vectors, which enable efficient similarity search over unstructured data. Lantern's embedding generation API is analogous to its API for LLM completions.

For one-off embedding generation, use llm_embedding:

sql
SELECT llm_embedding(
    input => 'User input', -- Input text to embed
    model => 'text-embedding-3-small', -- Model for runtime to use (default: 'text-embedding-3-small')
    base_url => 'https://api.openai.com', -- For a custom LLM deployment, provide the server URL (default: OpenAI API URL)
    api_token => '<llm_api_token>', -- API token for the LLM server (default: inferred from the lantern_extras.llm_token GUC)
    azure_entra_token => '', -- For Azure deployments, an Entra token can be used for authentication instead
    dimensions => 1536, -- For newer OpenAI models, the dimensionality of the returned embeddings (default: 1536)
    input_type => 'search_query', -- Used only by the cohere runtime to mark the input as a query or a document (default: 'search_query'; can also be 'search_document')
    runtime => 'openai' -- Runtime to use. (default: 'openai'). Use `SELECT get_available_runtimes()` for list
);
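For example, to embed an ad hoc string with the defaults (assuming the token GUC is already set):

sql
SELECT llm_embedding(input => 'similarity search in Postgres', model => 'text-embedding-3-small');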

For bulk embedding generation, use add_embedding_job, which generates the embeddings in parallel.

sql
SELECT add_embedding_job(
    table => 'articles', -- Name of the table
    src_column => 'content', -- Source column for embeddings
    dst_column => 'content_embedding', -- Destination column for embeddings (will be created automatically)
    model => 'text-embedding-3-small', -- Model for runtime to use (default: 'text-embedding-3-small')
    pk => 'id', -- Primary key column; the table must have a primary key (default: 'id')
    schema => 'public', -- Schema where the table is located (default: 'public')
    base_url => 'https://api.openai.com', -- For a custom LLM deployment, provide the server URL (default: OpenAI API URL)
    batch_size => 500, -- Batch size for requests to the LLM server; choose based on your API tier (default: determined by model and runtime)
    dimensions => 1536, -- For newer OpenAI models, the dimensionality of the returned embeddings (default: 1536)
    api_token => '<llm_api_token>', -- API token for the LLM server (default: inferred from the lantern_extras.llm_token GUC)
    azure_entra_token => '', -- For Azure deployments, an Entra token can be used for authentication instead
    runtime => 'openai' -- Runtime to use. (default: 'openai'). Use `SELECT get_available_runtimes()` for list
);

To monitor your embedding jobs, you can query:

sql
SELECT * FROM get_embedding_jobs();
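Once the job has populated content_embedding, that column can back similarity search directly. A sketch using the same <-> distance pattern as the RAG example below (the query string here is arbitrary):

sql
SELECT id, content
FROM articles
ORDER BY content_embedding <-> llm_embedding(input => 'postgres performance tips', model => 'text-embedding-3-small')
LIMIT 5;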

Pair LLM completion with embedding generation for RAG applications

Combining embedding generation and LLM completion lets you run the entire RAG workflow directly within Postgres. For example, to find the most relevant user bios and generate a coherent answer:

sql
WITH contexts AS (
    SELECT
        name,
        experience
    FROM
        company_employees
    ORDER BY
        vector <-> llm_embedding(input => 'LLMs', model => 'text-embedding-3-small')
    LIMIT 10
)
SELECT
    llm_completion(
        user_prompt => STRING_AGG(name || ' (' || experience || ')', ', '),
        system_prompt => 'Who should I go to for advice on LLMs? Here are some options.'
    )
FROM contexts;

This query retrieves the 10 most relevant users and uses the LLM to generate a consolidated answer.
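To make this pattern reusable, the query could be wrapped in a SQL function. A sketch, assuming the same company_employees table; answer_from_bios is a hypothetical helper, not part of Lantern:

sql
-- Hypothetical helper: retrieve the 10 closest bios, then ask the LLM to answer.
CREATE FUNCTION answer_from_bios(question TEXT) RETURNS TEXT AS $$
    WITH contexts AS (
        SELECT name, experience
        FROM company_employees
        ORDER BY vector <-> llm_embedding(input => question, model => 'text-embedding-3-small')
        LIMIT 10
    )
    SELECT llm_completion(
        user_prompt => STRING_AGG(name || ' (' || experience || ')', ', '),
        system_prompt => question || ' Here are some options.'
    )
    FROM contexts;
$$ LANGUAGE SQL;

SELECT answer_from_bios('Who should I go to for advice on LLMs?');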

Try it out for free in Lantern Cloud

LLM completions and embedding generation are both available in Lantern Cloud. Getting started is easy.

  • Create a free account on Lantern Cloud.

  • Connect to your database and enable LLM completions:

    sql
    ALTER SYSTEM SET lantern_extras.enable_daemon=true;
    SELECT pg_reload_conf();
    ALTER DATABASE postgres SET lantern_extras.llm_token='YOUR OPENAI API KEY';
  • Start using LLM and embedding functions
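A quick smoke test confirms everything is wired up (the prompt is arbitrary):

sql
SELECT llm_completion(user_prompt => 'Reply with the word: ready');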

We'd love to hear what you think! Feel free to send any feedback to support@lantern.dev.
