A code chatbot in 15 minutes with Postgres

Learn how to build a powerful, scalable code chatbot using Lantern on Ubicloud in under 15 minutes. All open-source tooling, from embedding models to cloud providers.

October 24, 2024 · 7 min read

Di Qi

Cofounder

At PGConf EU, we gave a talk with Ubicloud on how to build a code chatbot using Lantern on Ubicloud in under 15 minutes.

We used it to answer questions about the Ubicloud codebase. We were able to get more accurate answers than what ChatGPT-4o returned, using only open-source tools.

In this post, we will share more about how we built it, and how you can benefit from Lantern and Ubicloud.

What is Lantern?

Lantern is building Postgres for AI applications.

There are many components to AI applications, including embedding generation, vector / text search, and LLM completion.

With Lantern, you can simplify your AI tech stack while improving performance. We bring these components inside Postgres so that you don't need to set up separate infrastructure for each of them. We help you scale by offering serverless indexing, and we handle parallelizing calls and API rate-limiting on your behalf.

What is Ubicloud?

Ubicloud is building the open-source alternative to AWS.

Ubicloud provides cloud services such as virtual machines, block storage, and managed Postgres at rates 3-10x lower than big cloud providers like AWS, GCP, and Azure. The founders previously built distributed Postgres at Citus, which is now a core part of Azure's Postgres offering.

Today, we're excited to launch an offering of Lantern on Ubicloud! You can create a managed Lantern database in Ubicloud, starting at $97.50 / month for a 2 vCPU, 8 GB RAM, 128 GB storage machine in Germany, vs. $169 / month on AWS in Germany.

To get started, visit Ubicloud's documentation, and provision a Lantern Postgres database directly from the Ubicloud console.

The limitations of ChatGPT

LLMs like ChatGPT can often provide great answers about code. However, they have some notable limitations.

  • LLMs don't come with access to live or private knowledge.
  • LLMs have a weaker understanding of public content that isn't widely discussed.
  • Related to the above, LLMs often hallucinate, especially on topics they know less about.

That's why it's helpful to build an agent that has access to the live data. This is where retrieval augmented generation, or RAG, comes into play.

With RAG, you index the information you want the LLM to have access to, and pass the relevant context to the model during the conversation so it can answer accurately.

As an illustration of these problems, we asked ChatGPT-4o:

Example of OpenAI Hallucinating on New Codebase

Building a demo over codebases

At PGConf EU, we showed how to use Lantern on Ubicloud to build a RAG-powered code chatbot in under 15 minutes. We demonstrated it on the Ubicloud codebase, using only open-source tooling hosted on Ubicloud:

  • Postgres extensions from Lantern
  • e5-mistral-7b-instruct embedding model, based on Mistral 7B
  • llama-3-2-3b-it LLM from Meta

We'll give a high level overview below. The code for the demo is available here.

A demo that pulls in folder summaries and commit summaries and compares the answers of LLMs with and without context, and context generated with Llama vs. OpenAI, is available here.

How to process the codebase

We ingested the Ubicloud codebase, which contains about 600 files, inserting the file name and code into a files table.

CREATE TABLE files (id serial primary key, name text, code text);
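
Loading the table is a plain INSERT per file. The path and contents below are placeholders, shown only to illustrate the shape of the data; in practice a small script walks a checkout of the repository and inserts each file:

-- Hypothetical example row; a script would loop over every file in the repository.
INSERT INTO files (name, code)
VALUES ('path/to/example_file.rb', '... file contents ...');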

We generated summaries of the files, and embeddings over these summaries. The summaries condense the code and make it easier to retrieve and use as context. The embeddings over the summaries enable efficient search.

To generate the summaries and embeddings, we used Lantern's managed LLM column and managed embedding column features. This was as simple as calling the add_completion_job and add_embedding_job functions.

Generating LLM summaries and embeddings for 600 files of code can take a long time - up to 10 hours sequentially!

But the managed jobs ensure that these calls are parallelized, while following rate limits and handling errors.

SELECT add_completion_job('files', 'code', 'description', 'Summarize this code', 'llama-3-2-3b-it', 'ubicloud');
SELECT add_embedding_job('files', 'description', 'vector', 'e5-mistral-7b-it', 'ubicloud');
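
Once both jobs finish, each row in files has a generated summary in the description column and an embedding of that summary in the vector column. A quick way to spot-check the output:

-- Spot-check a few generated summaries and their embeddings.
SELECT name, description, vector
FROM files
LIMIT 5;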

How to query the codebase

Now that our data is indexed, whenever we want to ask a question, we can simply use SQL to find the most relevant files and retrieve their descriptions, using them as context.

WITH contexts AS (
    SELECT
        ('File: ' || name || E'\n'
            || 'Description: ' || description) AS context
    FROM
        files
    ORDER BY
        vector <-> text_embedding('Which embedding models does Ubicloud support?', 'e5-mistral-7b-it', 'ubicloud')
    LIMIT 10
)
SELECT
    llm_completion(
        'Question: Which embedding models does Ubicloud support?'
        || E'\nContext: ' || STRING_AGG(context, E'\n'),
        'llama-3-2-3b-it',
        'ubicloud'
    )
FROM
    contexts;

We tried it out with the question from earlier.

Screenshot of App Demonstrating Improved Results with RAG using Lantern + Ubicloud

Indexing for scale

The Ubicloud codebase only has about 600 files, so vector queries over the file summaries are very fast!

However, you can imagine that as you improve this chatbot, this number could become quite large. For example, you could store finer-grained information such as individual functions, non-code information like documentation and commits, or related information like other codebases that Ubicloud imports.

Once you pass about 10k vectors, it is helpful to create an HNSW index to make your vector queries faster.

CREATE INDEX ON files USING hnsw (vector vector_l2_ops);
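
If you need to trade recall against latency, pgvector's HNSW index also exposes a query-time knob; the value below is its documented default, shown only as a starting point:

-- hnsw.ef_search sets the candidate list size at query time (default 40);
-- larger values improve recall at the cost of query speed.
SET hnsw.ef_search = 40;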

For larger datasets, parallel index builds, which come out of the box with pgvector, can speed up index creation.
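
As a sketch, assuming pgvector 0.6 or later (where parallel HNSW builds are available), the degree of parallelism is controlled by Postgres's standard maintenance settings:

-- Allow up to 7 parallel workers (plus the leader) for the index build,
-- and give the build plenty of memory to work with.
SET max_parallel_maintenance_workers = 7;
SET maintenance_work_mem = '4GB';

CREATE INDEX ON files USING hnsw (vector vector_l2_ops);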

However, this can slow down your database performance, and you may want to temporarily add compute resources during the index creation process. For this, we've implemented serverless indexing.

CREATE INDEX ON files USING hnsw (vector vector_l2_ops) WITH (external=True);

With serverless indexing, even with millions of vectors, you can create a vector index without affecting your database performance by using external resources. You can have a 1 CPU database, temporarily spin up 100 CPUs to create an index faster in parallel, and spin them down when done, importing the index structure back into Postgres.

Scaling cost-effectively with NVMe

As you scale to larger and larger datasets, your database can also become expensive.

Ubicloud costs 3-10x less than other clouds for equivalent services. On top of this, their Postgres instances run on NVMe machines, which means you can often choose a smaller, less expensive database and still get good performance.

If your database instance has enough memory for all of your data to fit, queries are quite fast. If your data doesn't fit in memory, Postgres reads from disk. Disk is cheap, but slow. Because of this tradeoff, people generally opt for a large instance so the data fits in memory.

However, with NVMe, retrieving from disk is much faster. As a result, you can let some of your data stay on disk and choose a smaller machine, which saves money. NVMe machines are not the default on the major cloud providers, but they are on Ubicloud.
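
One rough way to check how much of your workload is served from Postgres's buffer cache versus read from disk (this ignores the OS page cache, so treat it as an approximation):

-- Buffer-cache hit ratio for the files table: values near 1.0 mean most reads
-- come from memory; lower values mean more disk I/O.
SELECT heap_blks_hit::float / NULLIF(heap_blks_hit + heap_blks_read, 0) AS cache_hit_ratio
FROM pg_statio_user_tables
WHERE relname = 'files';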

See the chart below for some benchmarks.

Benchmarks show how NVMe on Lantern + Ubicloud improves price-performance relative to GCP

All in all, Ubicloud does well on price, and even better on price-performance, compared to major cloud providers like AWS, Azure, and GCP.

Conclusion

In sum: we were able to achieve all of this using only Postgres on Ubicloud. There's no need for a separate vector database or search engine to support RAG. There's no need to build separate infrastructure to handle efficient API calls. And we did all of this using only open-source tools.

Just use Postgres.
