The rapid evolution of generative AI models like OpenAI’s ChatGPT has revolutionized natural language processing, enabling these systems to generate coherent and contextually relevant responses. However, even state-of-the-art models face limitations when tackling domain-specific queries or providing highly accurate information. This often leads to challenges like hallucinations — instances where models produce inaccurate or fabricated details.
Enter Retrieval-Augmented Generation (RAG), an innovative framework designed to bridge this gap. By seamlessly integrating external data sources, RAG empowers generative models to retrieve real-time, niche information, significantly enhancing their accuracy and reliability.
In this article, we will dive into the mechanics of RAG, explore its architecture, and discuss the limitations of traditional generative models that inspired its creation. We will also highlight practical implementations, advanced techniques, and evaluation methods, showcasing how RAG is transforming the way AI interacts with specialized data.
Retrieval-Augmented Generation (RAG) is an advanced framework that enhances the capabilities of generative AI models by integrating real-time retrieval of external data. While generative models excel at producing coherent, human-like text, they can falter when asked to provide accurate, up-to-date, or domain-specific information. This is where RAG steps in, ensuring that the responses are not only creative but also grounded in reliable and relevant sources.
RAG operates by connecting a generative model with a retrieval mechanism, typically powered by vector databases or search systems. When a query is received, the retrieval component searches through vast external datasets to fetch relevant information. The generative model then synthesizes this data, producing an output that is both accurate and contextually insightful.
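As a minimal conceptual sketch (not production code), the retrieve-augment-generate loop looks roughly like this; embed, vector_index, and llm are placeholder names standing in for whatever embedding model, vector store, and generative model you plug in:

# Conceptual RAG flow; embed, vector_index, and llm are assumed placeholders
def answer_with_rag(question, vector_index, llm, embed, k=3):
    # 1. Retrieve: embed the question and fetch the k most similar chunks
    query_vector = embed(question)
    retrieved_chunks = vector_index.search(query_vector, k)

    # 2. Augment: build a prompt that grounds the model in the retrieved context
    context = "\n\n".join(retrieved_chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    # 3. Generate: let the LLM produce the final, grounded answer
    return llm(prompt)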
By addressing key challenges like hallucinations and limited domain knowledge, RAG unlocks the potential of generative models to excel in specialized fields. Its applications span diverse industries, from automating customer support with precise answers to enabling researchers to access curated knowledge on demand. RAG represents a significant step forward in making AI systems more intelligent, trustworthy, and useful in real-world scenarios.
A clear understanding of RAG architecture is essential for unlocking its full potential and benefits. At its core, the framework is built on two primary components: the Retriever and the Generator, working together in a seamless flow of information processing.
This overall process is illustrated below:
(Figure: overview of the RAG process — source: https://weaviate.io/blog/introduction-to-rag)
All the stages and essential components of the RAG process flow are illustrated in the figure below.
(Figure: stages and components of the RAG process flow — source: https://www.griddynamics.com/blog/retrieval-augmented-generation-llm)
Dividing documents into smaller chunks may seem simple, but it requires careful consideration of semantics to avoid splitting sentences inappropriately, which can affect subsequent steps like question answering. A naive fixed-size chunking approach can result in incomplete information in each chunk. Most document segmentation algorithms use chunk size and overlap, where chunk size is determined by character, word, or token count, and overlaps ensure continuity by sharing text between adjacent chunks. This strategy preserves the semantic context across chunks.
(Figure: document chunking with overlap — source: https://www.griddynamics.com/blog/retrieval-augmented-generation-llm)
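To make the idea concrete, here is a rough sketch of fixed-size chunking with overlap, measured in characters. This is only an illustration of the concept, not the splitter used later in this tutorial (we use LangChain's CharacterTextSplitter there):

def chunk_text(text, chunk_size=1000, overlap=200):
    # Slide a window of chunk_size characters, stepping forward by
    # chunk_size - overlap so adjacent chunks share text and preserve context.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks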
Some of the important vector databases used for RAG include FAISS, Chroma, Pinecone, Weaviate, Milvus, and Qdrant.
(Figure: popular vector databases — source: https://www.griddynamics.com/blog/retrieval-augmented-generation-llm)
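As a minimal sketch of what a vector store does, here is FAISS (the library we use later in this tutorial) storing and searching raw vectors; the random 8-dimensional vectors are stand-ins for real embeddings:

import faiss
import numpy as np

dim = 8                                                # embedding dimension (real models use e.g. 1536)
vectors = np.random.rand(100, dim).astype("float32")   # stand-ins for chunk embeddings

index = faiss.IndexFlatL2(dim)                         # exact L2-distance index
index.add(vectors)                                     # store the vectors

query = np.random.rand(1, dim).astype("float32")       # stand-in for a query embedding
distances, ids = index.search(query, 3)                # retrieve the 3 nearest chunks
print(ids)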
RAG (Retrieval-Augmented Generation) and fine-tuning are two key methods to extend LLM capabilities, each suited to different scenarios. Fine-tuning involves retraining LLMs on domain-specific data to perform specialized tasks, ideal for static, narrow use cases like branding or creative writing that require a specific tone or style. However, it is costly, time-consuming, and unsuitable for dynamic, frequently updated data.
On the other hand, RAG enhances LLMs by retrieving external data dynamically without modifying model weights, making it cost-effective and ideal for real-time, data-driven environments like legal, financial, or customer service applications. RAG enables LLMs to handle large, unstructured internal document corpora, offering significant advantages over traditional methods for navigating messy data repositories.
Fine-tuning excels at creating nuanced, consistent outputs, whereas RAG provides up-to-date, accurate information by leveraging external knowledge bases. In practice, RAG is often the preferred choice for applications requiring real-time, adaptable responses, especially in enterprises managing vast, unstructured data.
There are several types of Retrieval-Augmented Generation (RAG) approaches, each tailored to specific use cases and objectives. The primary types are summarized in the figure below:
(Figure: types of RAG — source: https://x.com/weaviate_io/status/1866528335884325070)
The Retrieval-Augmented Generation (RAG) framework has diverse applications across various industries due to its ability to dynamically integrate external knowledge into generative language models. Prominent applications include automating customer support with accurate, context-aware answers; giving researchers and analysts on-demand access to curated domain knowledge; and powering question answering over legal, financial, and internal enterprise document collections.
In this section, we will develop a Streamlit application capable of understanding the contents of a PDF and responding to user queries based on that content using Retrieval-Augmented Generation (RAG). The implementation leverages the LangChain framework to facilitate interactions with LLMs and vector stores. We will utilize OpenAI's LLM and its embedding models to construct a FAISS vector store for efficient information retrieval.
Create and activate a Python virtual environment:

python -m venv venv
source venv/bin/activate    # for Ubuntu/macOS
venv\Scripts\activate       # for Windows
Install the required dependencies:

pip install langchain langchain_community openai faiss-cpu PyPDF2 streamlit python-dotenv tiktoken
Create a .env file in the project root with your OpenAI API key and model names:

OPENAI_API_KEY=sk-proj-xcQxBf5LslO62At...
OPENAI_MODEL_NAME=gpt-3.5-turbo
OPENAI_EMBEDDING_MODEL_NAME=text-embedding-3-small
Load these variables in the application code:

from dotenv import load_dotenv
import os

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
OPENAI_EMBEDDING_MODEL_NAME = os.getenv("OPENAI_EMBEDDING_MODEL_NAME")
Import the essential libraries for building the app and handling PDFs, such as Streamlit, PyPDF2, and LangChain.
import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.chat_models import ChatOpenAI
from htmlTemplates import bot_template, user_template, css
Extract the raw text from the uploaded PDF files using PyPDF2.

def get_pdf_text(pdf_files):
    text = ""
    for pdf_file in pdf_files:
        reader = PdfReader(pdf_file)
        for page in reader.pages:
            text += page.extract_text()
    return text
Divide large text into smaller, manageable chunks using LangChain’s CharacterTextSplitter.
def get_chunk_text(text):
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks
Generate embeddings for text chunks and store them in a vector database using FAISS.
def get_vector_store(text_chunks):
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model=OPENAI_EMBEDDING_MODEL_NAME)
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore
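Before wiring the store into a chain, you can sanity-check retrieval on its own. This is a quick, optional check assuming vector_store is the object returned by get_vector_store above, and the question is just an example:

# Retrieve the 3 chunks most similar to a test question
docs = vector_store.similarity_search("What is this document about?", k=3)
for doc in docs:
    print(doc.page_content[:200])   # preview each retrieved chunk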
Define a chain that retrieves information from the vector store and interacts with the user via an LLM.
def get_conversation_chain(vector_store):
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name=OPENAI_MODEL_NAME, temperature=0)
    memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

    system_template = """
    Use the following pieces of context and chat history to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.

    Context: {context}

    Chat history: {chat_history}

    Question: {question}
    Helpful Answer:
    """

    prompt = PromptTemplate(
        template=system_template,
        input_variables=["context", "question", "chat_history"],
    )

    conversation_chain = ConversationalRetrievalChain.from_llm(
        verbose=True,
        llm=llm,
        retriever=vector_store.as_retriever(),
        memory=memory,
        combine_docs_chain_kwargs={"prompt": prompt}
    )
    return conversation_chain
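As a minimal standalone check (run outside Streamlit), the chain can be called directly. This assumes you already have a vector_store built from your documents; the question is just an example:

chain = get_conversation_chain(vector_store)
result = chain({"question": "Summarize the document in two sentences."})
print(result["answer"])        # the grounded answer
print(result["chat_history"])  # messages accumulated by the conversation memory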
Process user input, pass it to the conversation chain, and update the chat history.
def handle_user_input(question):
    try:
        response = st.session_state.conversation({'question': question})
        st.session_state.chat_history = response['chat_history']
    except Exception as e:
        st.error('Please select PDF and click on Process.')
Create custom HTML templates for the user and bot messages and style them with CSS. Save these in a file named htmlTemplates.py so they can be imported into the main app; the avatar image sources below are placeholders you can point at any image you like.
css = '''
<style>
.chat-message {
    padding: 1rem;
    border-radius: 0.5rem;
    margin-bottom: 1rem;
    display: flex;
}
.chat-message.user {
    background-color: #2b313e;
}
.chat-message.bot {
    background-color: #475063;
}
.chat-message .avatar {
    width: 10%;
}
.chat-message .avatar img {
    max-width: 30px;
    max-height: 30px;
    border-radius: 50%;
    object-fit: cover;
}
.chat-message .message {
    width: 90%;
    padding: 0 1rem;
    color: #fff;
}
</style>
'''

bot_template = '''
<div class="chat-message bot">
    <div class="avatar">
        <!-- placeholder: replace with your bot avatar image URL -->
        <img src="bot-avatar.png">
    </div>
    <div class="message">{{MSG}}</div>
</div>
'''

user_template = '''
<div class="chat-message user">
    <div class="avatar">
        <!-- placeholder: replace with your user avatar image URL -->
        <img src="user-avatar.png">
    </div>
    <div class="message">{{MSG}}</div>
</div>
'''

Display the chat history: show the user and AI conversation history in reverse order, using the HTML templates for formatting.

def display_chat_history():
    if st.session_state.chat_history:
        reversed_history = st.session_state.chat_history[::-1]

        formatted_history = []
        for i in range(0, len(reversed_history), 2):
            chat_pair = {
                "AIMessage": reversed_history[i].content,
                "HumanMessage": reversed_history[i + 1].content
            }
            formatted_history.append(chat_pair)

        for i, message in enumerate(formatted_history):
            st.write(user_template.replace("{{MSG}}", message['HumanMessage']), unsafe_allow_html=True)
            st.write(bot_template.replace("{{MSG}}", message['AIMessage']), unsafe_allow_html=True)
Set up the main app interface for file uploads, question input, and chat history display.
def main():
    st.set_page_config(page_title='Chat with PDFs', page_icon=':books:')
    st.write(css, unsafe_allow_html=True)

    if "conversation" not in st.session_state:
        st.session_state.conversation = None
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = None

    st.header('Chat with PDFs :books:')
    question = st.text_input("Ask anything to your PDF:")
    if question:
        handle_user_input(question)

    if st.session_state.chat_history is not None:
        display_chat_history()

    with st.sidebar:
        st.subheader("Upload your Documents Here: ")
        pdf_files = st.file_uploader("Choose your PDF Files and Press Process button", type=['pdf'], accept_multiple_files=True)

        if pdf_files and st.button("Process"):
            with st.spinner("Processing your PDFs..."):
                try:
                    # Get PDF text
                    raw_text = get_pdf_text(pdf_files)

                    # Get text chunks
                    text_chunks = get_chunk_text(raw_text)

                    # Create vector store
                    vector_store = get_vector_store(text_chunks)
                    st.success("Your PDFs have been processed successfully. You can ask questions now.")

                    # Create conversation chain
                    st.session_state.conversation = get_conversation_chain(vector_store)
                except Exception as e:
                    st.error(f"An error occurred: {e}")


if __name__ == '__main__':
    main()
The following is the complete code implementation for the PDF Chat Application. It integrates environment variable setup, text extraction, vector storage, and RAG features into a streamlined solution:
from dotenv import load_dotenv
import os

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
OPENAI_EMBEDDING_MODEL_NAME = os.getenv("OPENAI_EMBEDDING_MODEL_NAME")

import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.chat_models import ChatOpenAI
from htmlTemplates import bot_template, user_template, css


def get_pdf_text(pdf_files):
    text = ""
    for pdf_file in pdf_files:
        reader = PdfReader(pdf_file)
        for page in reader.pages:
            text += page.extract_text()
    return text


def get_chunk_text(text):
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks


def get_vector_store(text_chunks):
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model=OPENAI_EMBEDDING_MODEL_NAME)
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore


def get_conversation_chain(vector_store):
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name=OPENAI_MODEL_NAME, temperature=0)
    memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

    system_template = """
    Use the following pieces of context and chat history to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.

    Context: {context}

    Chat history: {chat_history}

    Question: {question}
    Helpful Answer:
    """

    prompt = PromptTemplate(
        template=system_template,
        input_variables=["context", "question", "chat_history"],
    )

    conversation_chain = ConversationalRetrievalChain.from_llm(
        verbose=True,
        llm=llm,
        retriever=vector_store.as_retriever(),
        memory=memory,
        combine_docs_chain_kwargs={"prompt": prompt}
    )
    return conversation_chain


def handle_user_input(question):
    try:
        response = st.session_state.conversation({'question': question})
        st.session_state.chat_history = response['chat_history']
    except Exception as e:
        st.error('Please select PDF and click on Process.')


def display_chat_history():
    if st.session_state.chat_history:
        reversed_history = st.session_state.chat_history[::-1]

        formatted_history = []
        for i in range(0, len(reversed_history), 2):
            chat_pair = {
                "AIMessage": reversed_history[i].content,
                "HumanMessage": reversed_history[i + 1].content
            }
            formatted_history.append(chat_pair)

        for i, message in enumerate(formatted_history):
            st.write(user_template.replace("{{MSG}}", message['HumanMessage']), unsafe_allow_html=True)
            st.write(bot_template.replace("{{MSG}}", message['AIMessage']), unsafe_allow_html=True)


def main():
    st.set_page_config(page_title='Chat with PDFs', page_icon=':books:')
    st.write(css, unsafe_allow_html=True)

    if "conversation" not in st.session_state:
        st.session_state.conversation = None
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = None

    st.header('Chat with PDFs :books:')
    question = st.text_input("Ask anything to your PDF:")
    if question:
        handle_user_input(question)

    if st.session_state.chat_history is not None:
        display_chat_history()

    with st.sidebar:
        st.subheader("Upload your Documents Here: ")
        pdf_files = st.file_uploader("Choose your PDF Files and Press Process button", type=['pdf'], accept_multiple_files=True)

        if pdf_files and st.button("Process"):
            with st.spinner("Processing your PDFs..."):
                try:
                    # Get PDF text
                    raw_text = get_pdf_text(pdf_files)

                    # Get text chunks
                    text_chunks = get_chunk_text(raw_text)

                    # Create vector store
                    vector_store = get_vector_store(text_chunks)
                    st.success("Your PDFs have been processed successfully. You can ask questions now.")

                    # Create conversation chain
                    st.session_state.conversation = get_conversation_chain(vector_store)
                except Exception as e:
                    st.error(f"An error occurred: {e}")


if __name__ == '__main__':
    main()
Execute the app with Streamlit using the following command.
streamlit run app.py
The app will open in your browser, where you can upload your PDFs from the sidebar and start asking questions about their content.
Thanks for reading this article!
Thanks Gowri M Bhatt for reviewing the content.
If you enjoyed this article, please click on the heart button ♥ and share to help others find it!
The full source code for this tutorial can be found here:
codemaker2015/pdf-chat-using-RAG | github.com