Building a RAG Application with LangChain and OpenAI: Step-by-Step Tutorial

Retrieval-Augmented Generation (RAG) combines the power of large language models with your own data. This tutorial shows you how to build a production-ready RAG application.

What is RAG?

RAG enhances LLM responses by retrieving relevant context from a knowledge base before generating answers. This reduces hallucinations and provides up-to-date information.

Architecture Overview

  • Document Loading: Ingest PDFs, web pages, databases
  • Chunking: Split documents into manageable pieces
  • Embedding: Convert chunks to vector representations
  • Vector Store: Store and search embeddings efficiently
  • Retrieval: Find relevant chunks for a query
  • Generation: Use LLM with retrieved context
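Conceptually, the retrieval step is a nearest-neighbor search over embedding vectors: the query is embedded, and the chunks whose vectors point in the most similar direction are returned. A toy illustration with made-up 3-dimensional vectors standing in for real embeddings (a production embedding model produces vectors with hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": in practice these come from an embedding model
chunks = {
    "auth guide":  [0.9, 0.1, 0.0],
    "billing faq": [0.1, 0.8, 0.2],
    "changelog":   [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # embedding of e.g. "how do I log in?"

best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
print(best)  # "auth guide" -- the chunk most aligned with the query
```

A vector store like Chroma does exactly this lookup, just with an index that avoids comparing the query against every chunk.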

Setup

pip install langchain langchain-openai langchain-community chromadb tiktoken beautifulsoup4

Step 1: Load Documents

from langchain_community.document_loaders import (
    PyPDFLoader, WebBaseLoader, DirectoryLoader
)

# Load PDFs
loader = PyPDFLoader("documentation.pdf")
docs = loader.load()

# Load web pages
web_loader = WebBaseLoader(["https://docs.example.com/guide"])
web_docs = web_loader.load()

Step 2: Chunk Documents

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " "]
)
chunks = splitter.split_documents(docs)
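To make chunk_size and chunk_overlap concrete: each chunk starts chunk_size − chunk_overlap characters after the previous one, so consecutive chunks share a window of text and a sentence split at a boundary still appears whole in one of them. A toy fixed-size splitter shows the mechanics (RecursiveCharacterTextSplitter is smarter: it prefers to break on the separators listed above rather than mid-word):

```python
def naive_split(text, chunk_size=10, chunk_overlap=4):
    # Advance by (chunk_size - chunk_overlap) so adjacent chunks share text
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

pieces = naive_split("abcdefghijklmnopqrst")
print(pieces)
# The tail of each chunk repeats at the head of the next:
print(pieces[0][-4:] == pieces[1][:4])  # True
```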

Step 3: Create Vector Store

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

Step 4: Build the RAG Chain

from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

template = """Use the following context to answer the question.
If you do not know the answer, say so. Do not make up information.

Context: {context}
Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])

llm = ChatOpenAI(model="gpt-4o", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True
)

Step 5: Query

result = qa_chain.invoke({"query": "How do I configure authentication?"})
print(result["result"])
for doc in result["source_documents"]:
    print(f"Source: {doc.metadata['source']}, Page: {doc.metadata.get('page', 'N/A')}")

Production Tips

  • Use hybrid search (vector + keyword) for better retrieval
  • Implement re-ranking with Cohere or cross-encoders
  • Cache frequent queries to reduce API costs
  • Monitor retrieval quality with evaluation frameworks
  • Use metadata filtering for scoped searches
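The caching tip can start as simply as memoizing the chain call on a normalized query string. A minimal sketch, where `fake_chain` is a stand-in for `qa_chain.invoke` (in production you would also want an eviction/TTL policy, and possibly semantic caching that matches paraphrased queries):

```python
from functools import lru_cache

call_count = 0  # tracks how often the "LLM" is actually hit

def fake_chain(query: str) -> str:
    # Stand-in for qa_chain.invoke -- each call would cost an API request
    global call_count
    call_count += 1
    return f"answer to: {query}"

@lru_cache(maxsize=1024)
def _cached_ask(normalized: str) -> str:
    return fake_chain(normalized)

def ask(query: str) -> str:
    # Normalize before caching so trivial variants hit the same entry
    return _cached_ask(query.strip().lower())

ask("How do I configure authentication?")
ask("how do i configure authentication?  ")  # cache hit: no new call
ask("How do I configure authentication?")    # cache hit: no new call
print(call_count)  # 1
```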

Conclusion

RAG is one of the most practical ways to build AI applications on top of your own knowledge base. Start with this architecture and iterate based on your specific needs.


© 7Tech – Programming and Tech Tutorials