Retrieval-Augmented Generation (RAG) combines the power of large language models with your own data. This tutorial shows you how to build a production-ready RAG application.
What is RAG?
RAG enhances LLM responses by retrieving relevant context from a knowledge base before generating answers. This reduces hallucinations and provides up-to-date information.
Architecture Overview
- Document Loading: Ingest PDFs, web pages, databases
- Chunking: Split documents into manageable pieces
- Embedding: Convert chunks to vector representations
- Vector Store: Store and search embeddings efficiently
- Retrieval: Find relevant chunks for a query
- Generation: Use LLM with retrieved context (a minimal sketch of the full loop follows this list)
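Before wiring up any libraries, here is the whole loop in miniature. This is a toy: word overlap stands in for embedding similarity, and the assembled prompt stands in for the generation call. Steps 1-5 below replace each piece with real components.
# Toy retrieve-then-generate loop. Word overlap is a stand-in for embeddings.
docs = [
    "Authentication is configured in settings.yaml under the auth key.",
    "The API rate limit defaults to 100 requests per minute.",
]

def score(query: str, doc: str) -> int:
    # Stand-in for embedding similarity: count shared lowercase words.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 1) -> list[str]:
    # Retrieval: rank every chunk against the query, keep the top k.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

query = "How do I configure authentication?"
context = "\n".join(retrieve(query))
prompt = f"Use the following context to answer.\nContext: {context}\nQuestion: {query}"
print(prompt)  # In a real pipeline, this prompt is what the LLM receives.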
Setup
pip install langchain langchain-community langchain-openai chromadb tiktoken
Step 1: Load Documents
from langchain_community.document_loaders import (
    PyPDFLoader, WebBaseLoader, DirectoryLoader
)
# Load PDFs
loader = PyPDFLoader("documentation.pdf")
docs = loader.load()
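DirectoryLoader is imported above but not used; it batch-loads a whole folder. A sketch for a directory of PDFs (the ./docs path is illustrative):
# Load every PDF under ./docs (path is illustrative)
dir_loader = DirectoryLoader("./docs", glob="**/*.pdf", loader_cls=PyPDFLoader)
dir_docs = dir_loader.load()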
# Load web pages
web_loader = WebBaseLoader(["https://docs.example.com/guide"])
web_docs = web_loader.load()
Step 2: Chunk Documents
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " "]
)
chunks = splitter.split_documents(docs + web_docs)  # chunk the PDF and web documents together
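A quick sanity check before embedding is worth the two lines; the chunk count and a sample chunk are the fastest signal that chunk_size and the separators behave as expected:
print(f"{len(docs + web_docs)} documents -> {len(chunks)} chunks")
print(chunks[0].page_content[:200])  # preview the first chunk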
Step 3: Create Vector Store
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
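Because the store is persisted, later runs can reopen it instead of re-embedding everything. A sketch, assuming the same embedding model:
# Reopen the persisted store on subsequent runs (skips re-embedding)
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)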
Step 4: Build the RAG Chain
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
template = """Use the following context to answer the question.
If you do not know the answer, say so. Do not make up information.
Context: {context}
Question: {question}
Answer:"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])
llm = ChatOpenAI(model="gpt-4o", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True
)
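RetrievalQA is a legacy convenience class; recent LangChain versions express the same flow with LCEL. A roughly equivalent sketch (format_docs is a small helper defined here, not a library function):
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    # Join retrieved chunks into a single context string for the prompt.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# answer = rag_chain.invoke("How do I configure authentication?")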
Step 5: Query
result = qa_chain.invoke({"query": "How do I configure authentication?"})
print(result["result"])
for doc in result["source_documents"]:
    print(f"Source: {doc.metadata['source']}, Page: {doc.metadata.get('page', 'N/A')}")
Production Tips
- Use hybrid search (vector + keyword) for better retrieval
- Implement re-ranking with Cohere or cross-encoders
- Cache frequent queries to reduce API costs
- Monitor retrieval quality with evaluation frameworks
- Use metadata filtering for scoped searches (see the sketch below)
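As one example of the last tip, Chroma accepts a metadata filter through search_kwargs; a sketch, assuming your chunks carry a source field (PyPDFLoader adds one automatically):
# Restrict retrieval to chunks from a single source document
scoped_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4, "filter": {"source": "documentation.pdf"}}
)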
Conclusion
RAG is one of the most practical ways to ground an LLM in your own knowledge. Start with this architecture and iterate based on your specific needs.
