In our first tutorial, we built a ReAct agent that could answer our questions using a web search tool. But what if we have specific information we would like to give the agent? Say, a customer service application with set answers and procedures that should be returned. We might not always want this information in the context / system prompt, so our agent must be able to retrieve it when required.
This is where we turn to Retrieval Augmented Generation (RAG). For this we need to set up a database as well as a retriever to fetch relevant documents from it.
Now at this point, we really enter a minefield of options and packages. The received wisdom is to use a vector store for the database so that documents can be found via semantic search - a technique that ranks database content by its similarity to the query in vector space. This saves compute for larger applications, but might seem like overkill for the example below. Nevertheless, it is a good place to start learning. Pinecone gives a brief introduction to the rise of vector data here
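To make "similarity in vector space" concrete, here is a minimal sketch of semantic search using cosine similarity. The documents and their three-dimensional vectors are made up for illustration - a real application would get the vectors from an embedding model.

```python
import math

# Toy "embeddings": hand-made 3-dimensional vectors, just to show the idea.
documents = {
    "Refunds are processed within 5 business days.": [0.9, 0.1, 0.0],
    "Our office is closed on public holidays.":      [0.1, 0.9, 0.0],
    "Shipping takes 3-7 days depending on region.":  [0.7, 0.2, 0.1],
}

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def semantic_search(query_vector, docs, k=1):
    # Rank every document by similarity to the query vector, return the top k.
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vector, docs[d]),
                    reverse=True)
    return ranked[:k]

# A query about refunds embeds close to the refund document.
query_vector = [0.85, 0.15, 0.05]
print(semantic_search(query_vector, documents))
```

A vector database does exactly this ranking, but with approximate nearest-neighbour indexes so it stays fast over millions of documents.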
The following gives an overview of the vector store options - feel free to skip down to the tutorial below.
There are three key things to consider when handling vector data:
For full-scale applications a persistent vector database will be necessary. Pinecone is a popular choice, but open source options like Chroma or Qdrant may be preferable.
For simple use cases, the embeddings can be stored in the application's memory.
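An in-memory store can be as simple as a list of (text, vector) pairs with a search method. The sketch below is purely illustrative - the class name and the character-count "embedding" are made up, and a real application would plug in a proper embedding model and library here.

```python
import math

class InMemoryStore:
    """A toy in-memory vector store: fine for demos, lost on restart."""

    def __init__(self, embed):
        self.embed = embed      # callable: str -> list[float]
        self.entries = []       # list of (text, vector) pairs

    def add(self, texts):
        for text in texts:
            self.entries.append((text, self.embed(text)))

    def search(self, query, k=2):
        # Embed the query, then rank stored entries by cosine similarity.
        qv = self.embed(query)
        scored = sorted(self.entries, key=lambda e: self._cosine(qv, e[1]),
                        reverse=True)
        return [text for text, _ in scored[:k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / (norms or 1.0)

# Stand-in embedding: letter-frequency counts. A real application would
# use OpenAI, sentence-transformers, or fastembed instead.
def toy_embed(text):
    text = text.lower()
    return [text.count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

store = InMemoryStore(toy_embed)
store.add(["password reset instructions",
           "billing and invoices",
           "office opening hours"])
print(store.search("how do I reset my password?", k=1))
```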
There are countless embedding options, whose merits one could study for a lifetime. Pick a common, fast option and try to stick with one embedding model per project. If multiple embeddings are used, be careful to segment the data by embedding model, since vectors produced by different models are not comparable.
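One simple way to enforce that segmentation is to namespace every collection by the model that produced its vectors, so a query embedded with one model only ever searches vectors from the same model. The collection-naming scheme and sample data below are made up for illustration.

```python
# Vectors from different embedding models live in different spaces, so
# similarity scores across them are meaningless. Namespace collections
# by embedding model to keep the spaces apart.
collections = {}

def collection_for(model_name, dataset):
    # e.g. "faq__text-embedding-3-small" vs "faq__all-MiniLM-L6-v2"
    key = f"{dataset}__{model_name}"
    return collections.setdefault(key, [])

# The same document stored twice, once per embedding space.
collection_for("text-embedding-3-small", "faq").append(("doc1", [0.1, 0.2]))
collection_for("all-MiniLM-L6-v2", "faq").append(("doc1", [0.5, 0.4, 0.3]))

print(sorted(collections))
```

Most vector databases support this directly via named collections or namespaces, so the bookkeeping does not need to live in application code.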
Technically there are two parts: a library and a model:
The industry standard is OpenAI’s text-embedding-3-small/large, accessed through their library.
For a lightweight offline option, sentence-transformers with an open source model such as all-MiniLM-L6-v2.
Qdrant’s fastembed provides another lightweight option with access to more powerful open source models. We will use BAAI/bge-base-en-v1.5 from the Beijing Academy of Artificial Intelligence.