The rapid evolution of generative AI models like OpenAI’s ChatGPT has revolutionized natural language processing, enabling these systems to generate coherent and contextually relevant responses. However, even state-of-the-art models face limitations when tackling domain-specific queries or providing highly accurate information. This often leads to challenges like hallucinations — instances where models produce inaccurate or fabricated details.
Enter Retrieval-Augmented Generation (RAG), an innovative framework designed to bridge this gap. By seamlessly integrating external data sources, RAG empowers generative models to retrieve real-time, niche information, significantly enhancing their accuracy and reliability.
In this article, we will dive into the mechanics of RAG, explore its architecture, and discuss the limitations of traditional generative models that inspired its creation. We will also highlight practical implementations, advanced techniques, and evaluation methods, showcasing how RAG is transforming the way AI interacts with specialized data.
Retrieval-Augmented Generation (RAG) is an advanced framework that enhances the capabilities of generative AI models by integrating real-time retrieval of external data. While generative models excel at producing coherent, human-like text, they can falter when asked to provide accurate, up-to-date, or domain-specific information. This is where RAG steps in, ensuring that the responses are not only creative but also grounded in reliable and relevant sources.
RAG operates by connecting a generative model with a retrieval mechanism, typically powered by vector databases or search systems. When a query is received, the retrieval component searches through vast external datasets to fetch relevant information. The generative model then synthesizes this data, producing an output that is both accurate and contextually insightful.
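As a minimal conceptual sketch (not production code), the retrieve-augment-generate loop looks roughly like this; embed, vector_index, and llm are placeholder names standing in for whatever embedding model, vector store, and generative model you plug in:

# Conceptual RAG flow; embed, vector_index, and llm are assumed placeholders
def answer_with_rag(question, vector_index, llm, embed, k=3):
    # 1. Retrieve: embed the question and fetch the k most similar chunks
    query_vector = embed(question)
    retrieved_chunks = vector_index.search(query_vector, k)

    # 2. Augment: build a prompt that grounds the model in the retrieved context
    context = "\n\n".join(retrieved_chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    # 3. Generate: let the LLM produce the final, grounded answer
    return llm(prompt)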
By addressing key challenges like hallucinations and limited domain knowledge, RAG unlocks the potential of generative models to excel in specialized fields. Its applications span diverse industries, from automating customer support with precise answers to enabling researchers to access curated knowledge on demand. RAG represents a significant step forward in making AI systems more intelligent, trustworthy, and useful in real-world scenarios.
A clear understanding of RAG architecture is essential for unlocking its full potential and benefits. At its core, the framework is built on two primary components: the Retriever and the Generator, working together in a seamless flow of information processing.
This overall process is illustrated below:
(Figure: overview of the RAG process — source: https://weaviate.io/blog/introduction-to-rag)
All the stages and essential components of the RAG process flow are illustrated in the figure below.
(Figure: stages and components of the RAG process flow — source: https://www.griddynamics.com/blog/retrieval-augmented-generation-llm)
Dividing documents into smaller chunks may seem simple, but it requires careful consideration of semantics to avoid splitting sentences inappropriately, which can affect subsequent steps like question answering. A naive fixed-size chunking approach can result in incomplete information in each chunk. Most document segmentation algorithms use chunk size and overlap, where chunk size is determined by character, word, or token count, and overlaps ensure continuity by sharing text between adjacent chunks. This strategy preserves the semantic context across chunks.
(Figure: document chunking with overlap — source: https://www.griddynamics.com/blog/retrieval-augmented-generation-llm)
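To make the idea concrete, here is a rough sketch of fixed-size chunking with overlap, measured in characters. This is only an illustration of the concept, not the splitter used later in this tutorial (we use LangChain's CharacterTextSplitter there):

def chunk_text(text, chunk_size=1000, overlap=200):
    # Slide a window of chunk_size characters, stepping forward by
    # chunk_size - overlap so adjacent chunks share text and preserve context.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks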
Some of the important vector databases used for RAG include FAISS, Chroma, Pinecone, Weaviate, Milvus, and Qdrant.
(Figure: popular vector databases — source: https://www.griddynamics.com/blog/retrieval-augmented-generation-llm)
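As a minimal sketch of what a vector store does, here is FAISS (the library we use later in this tutorial) storing and searching raw vectors; the random 8-dimensional vectors are stand-ins for real embeddings:

import faiss
import numpy as np

dim = 8                                                # embedding dimension (real models use e.g. 1536)
vectors = np.random.rand(100, dim).astype("float32")   # stand-ins for chunk embeddings

index = faiss.IndexFlatL2(dim)                         # exact L2-distance index
index.add(vectors)                                     # store the vectors

query = np.random.rand(1, dim).astype("float32")       # stand-in for a query embedding
distances, ids = index.search(query, 3)                # retrieve the 3 nearest chunks
print(ids)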
RAG (Retrieval-Augmented Generation) and fine-tuning are two key methods to extend LLM capabilities, each suited to different scenarios. Fine-tuning involves retraining LLMs on domain-specific data to perform specialized tasks, ideal for static, narrow use cases like branding or creative writing that require a specific tone or style. However, it is costly, time-consuming, and unsuitable for dynamic, frequently updated data.
On the other hand, RAG enhances LLMs by retrieving external data dynamically without modifying model weights, making it cost-effective and ideal for real-time, data-driven environments like legal, financial, or customer service applications. RAG enables LLMs to handle large, unstructured internal document corpora, offering significant advantages over traditional methods for navigating messy data repositories.
Fine-tuning excels at creating nuanced, consistent outputs, whereas RAG provides up-to-date, accurate information by leveraging external knowledge bases. In practice, RAG is often the preferred choice for applications requiring real-time, adaptable responses, especially in enterprises managing vast, unstructured data.
There are several types of Retrieval-Augmented Generation (RAG) approaches, each tailored to specific use cases and objectives. The primary types are summarized in the figure below:
(Figure: types of RAG — source: https://x.com/weaviate_io/status/1866528335884325070)
The Retrieval-Augmented Generation (RAG) framework has diverse applications across various industries due to its ability to dynamically integrate external knowledge into generative language models. Prominent applications include automating customer support with accurate, context-aware answers; giving researchers and analysts on-demand access to curated domain knowledge; and powering question answering over legal, financial, and internal enterprise document collections.
In this section, we will develop a Streamlit application capable of understanding the contents of a PDF and responding to user queries based on that content using Retrieval-Augmented Generation (RAG). The implementation leverages the LangChain framework to facilitate interactions with LLMs and vector stores. We will utilize OpenAI's LLM and its embedding models to construct a FAISS vector store for efficient information retrieval.
Create and activate a Python virtual environment:

python -m venv venv
source venv/bin/activate    # for Ubuntu/macOS
venv\Scripts\activate       # for Windows
Install the required dependencies:

pip install langchain langchain_community openai faiss-cpu PyPDF2 streamlit python-dotenv tiktoken
Create a .env file in the project root with your OpenAI API key and model names:

OPENAI_API_KEY=sk-proj-xcQxBf5LslO62At...
OPENAI_MODEL_NAME=gpt-3.5-turbo
OPENAI_EMBEDDING_MODEL_NAME=text-embedding-3-small
Load these variables in the application code:

from dotenv import load_dotenv
import os

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
OPENAI_EMBEDDING_MODEL_NAME = os.getenv("OPENAI_EMBEDDING_MODEL_NAME")
Import the essential libraries for building the app and handling PDFs, such as Streamlit, PyPDF2, and LangChain.
import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.chat_models import ChatOpenAI
from htmlTemplates import bot_template, user_template, css
Extract the raw text from the uploaded PDF files using PyPDF2.

def get_pdf_text(pdf_files):
    text = ""
    for pdf_file in pdf_files:
        reader = PdfReader(pdf_file)
        for page in reader.pages:
            text += page.extract_text()
    return text
Divide large text into smaller, manageable chunks using LangChain’s CharacterTextSplitter.
def get_chunk_text(text):
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks
Generate embeddings for text chunks and store them in a vector database using FAISS.
def get_vector_store(text_chunks):
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model=OPENAI_EMBEDDING_MODEL_NAME)
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore
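Before wiring the store into a chain, you can sanity-check retrieval on its own. This is a quick, optional check assuming vector_store is the object returned by get_vector_store above, and the question is just an example:

# Retrieve the 3 chunks most similar to a test question
docs = vector_store.similarity_search("What is this document about?", k=3)
for doc in docs:
    print(doc.page_content[:200])   # preview each retrieved chunk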
Define a chain that retrieves information from the vector store and interacts with the user via an LLM.
def get_conversation_chain(vector_store):
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name=OPENAI_MODEL_NAME, temperature=0)
    memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

    system_template = """
    Use the following pieces of context and chat history to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.

    Context: {context}

    Chat history: {chat_history}

    Question: {question}
    Helpful Answer:
    """

    prompt = PromptTemplate(
        template=system_template,
        input_variables=["context", "question", "chat_history"],
    )

    conversation_chain = ConversationalRetrievalChain.from_llm(
        verbose=True,
        llm=llm,
        retriever=vector_store.as_retriever(),
        memory=memory,
        combine_docs_chain_kwargs={"prompt": prompt}
    )
    return conversation_chain
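As a minimal standalone check (run outside Streamlit), the chain can be called directly. This assumes you already have a vector_store built from your documents; the question is just an example:

chain = get_conversation_chain(vector_store)
result = chain({"question": "Summarize the document in two sentences."})
print(result["answer"])        # the grounded answer
print(result["chat_history"])  # messages accumulated by the conversation memory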
Process user input, pass it to the conversation chain, and update the chat history.
def handle_user_input(question):
    try:
        response = st.session_state.conversation({'question': question})
        st.session_state.chat_history = response['chat_history']
    except Exception as e:
        st.error('Please select PDF and click on Process.')
Create custom HTML templates for the user and bot messages and style them with CSS. Save these in a file named htmlTemplates.py so they can be imported into the main app; the avatar image sources below are placeholders you can point at any image you like.
css = '''
<style>
.chat-message {
    padding: 1rem;
    border-radius: 0.5rem;
    margin-bottom: 1rem;
    display: flex;
}
.chat-message.user {
    background-color: #2b313e;
}
.chat-message.bot {
    background-color: #475063;
}
.chat-message .avatar {
    width: 10%;
}
.chat-message .avatar img {
    max-width: 30px;
    max-height: 30px;
    border-radius: 50%;
    object-fit: cover;
}
.chat-message .message {
    width: 90%;
    padding: 0 1rem;
    color: #fff;
}
</style>
'''

bot_template = '''
<div class="chat-message bot">
    <div class="avatar">
        <!-- placeholder: replace with your bot avatar image URL -->
        <img src="bot-avatar.png">
    </div>
    <div class="message">{{MSG}}</div>
</div>
'''

user_template = '''
<div class="chat-message user">
    <div class="avatar">
        <!-- placeholder: replace with your user avatar image URL -->
        <img src="user-avatar.png">
    </div>
    <div class="message">{{MSG}}</div>
</div>
'''

Display the chat history: show the user and AI conversation history in reverse order, using the HTML templates for formatting.

def display_chat_history():
    if st.session_state.chat_history:
        reversed_history = st.session_state.chat_history[::-1]

        formatted_history = []
        for i in range(0, len(reversed_history), 2):
            chat_pair = {
                "AIMessage": reversed_history[i].content,
                "HumanMessage": reversed_history[i + 1].content
            }
            formatted_history.append(chat_pair)

        for i, message in enumerate(formatted_history):
            st.write(user_template.replace("{{MSG}}", message['HumanMessage']), unsafe_allow_html=True)
            st.write(bot_template.replace("{{MSG}}", message['AIMessage']), unsafe_allow_html=True)
Set up the main app interface for file uploads, question input, and chat history display.
def main():
    st.set_page_config(page_title='Chat with PDFs', page_icon=':books:')
    st.write(css, unsafe_allow_html=True)

    if "conversation" not in st.session_state:
        st.session_state.conversation = None
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = None

    st.header('Chat with PDFs :books:')
    question = st.text_input("Ask anything to your PDF:")
    if question:
        handle_user_input(question)

    if st.session_state.chat_history is not None:
        display_chat_history()

    with st.sidebar:
        st.subheader("Upload your Documents Here: ")
        pdf_files = st.file_uploader("Choose your PDF Files and Press Process button", type=['pdf'], accept_multiple_files=True)

        if pdf_files and st.button("Process"):
            with st.spinner("Processing your PDFs..."):
                try:
                    # Get PDF text
                    raw_text = get_pdf_text(pdf_files)

                    # Get text chunks
                    text_chunks = get_chunk_text(raw_text)

                    # Create vector store
                    vector_store = get_vector_store(text_chunks)
                    st.success("Your PDFs have been processed successfully. You can ask questions now.")

                    # Create conversation chain
                    st.session_state.conversation = get_conversation_chain(vector_store)
                except Exception as e:
                    st.error(f"An error occurred: {e}")


if __name__ == '__main__':
    main()
The following is the complete code implementation for the PDF Chat Application. It integrates environment variable setup, text extraction, vector storage, and RAG features into a streamlined solution:
from dotenv import load_dotenv
import os

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
OPENAI_EMBEDDING_MODEL_NAME = os.getenv("OPENAI_EMBEDDING_MODEL_NAME")

import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.chat_models import ChatOpenAI
from htmlTemplates import bot_template, user_template, css


def get_pdf_text(pdf_files):
    text = ""
    for pdf_file in pdf_files:
        reader = PdfReader(pdf_file)
        for page in reader.pages:
            text += page.extract_text()
    return text


def get_chunk_text(text):
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks


def get_vector_store(text_chunks):
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model=OPENAI_EMBEDDING_MODEL_NAME)
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore


def get_conversation_chain(vector_store):
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name=OPENAI_MODEL_NAME, temperature=0)
    memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

    system_template = """
    Use the following pieces of context and chat history to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.

    Context: {context}

    Chat history: {chat_history}

    Question: {question}
    Helpful Answer:
    """

    prompt = PromptTemplate(
        template=system_template,
        input_variables=["context", "question", "chat_history"],
    )

    conversation_chain = ConversationalRetrievalChain.from_llm(
        verbose=True,
        llm=llm,
        retriever=vector_store.as_retriever(),
        memory=memory,
        combine_docs_chain_kwargs={"prompt": prompt}
    )
    return conversation_chain


def handle_user_input(question):
    try:
        response = st.session_state.conversation({'question': question})
        st.session_state.chat_history = response['chat_history']
    except Exception as e:
        st.error('Please select PDF and click on Process.')


def display_chat_history():
    if st.session_state.chat_history:
        reversed_history = st.session_state.chat_history[::-1]

        formatted_history = []
        for i in range(0, len(reversed_history), 2):
            chat_pair = {
                "AIMessage": reversed_history[i].content,
                "HumanMessage": reversed_history[i + 1].content
            }
            formatted_history.append(chat_pair)

        for i, message in enumerate(formatted_history):
            st.write(user_template.replace("{{MSG}}", message['HumanMessage']), unsafe_allow_html=True)
            st.write(bot_template.replace("{{MSG}}", message['AIMessage']), unsafe_allow_html=True)


def main():
    st.set_page_config(page_title='Chat with PDFs', page_icon=':books:')
    st.write(css, unsafe_allow_html=True)

    if "conversation" not in st.session_state:
        st.session_state.conversation = None
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = None

    st.header('Chat with PDFs :books:')
    question = st.text_input("Ask anything to your PDF:")
    if question:
        handle_user_input(question)

    if st.session_state.chat_history is not None:
        display_chat_history()

    with st.sidebar:
        st.subheader("Upload your Documents Here: ")
        pdf_files = st.file_uploader("Choose your PDF Files and Press Process button", type=['pdf'], accept_multiple_files=True)

        if pdf_files and st.button("Process"):
            with st.spinner("Processing your PDFs..."):
                try:
                    # Get PDF text
                    raw_text = get_pdf_text(pdf_files)

                    # Get text chunks
                    text_chunks = get_chunk_text(raw_text)

                    # Create vector store
                    vector_store = get_vector_store(text_chunks)
                    st.success("Your PDFs have been processed successfully. You can ask questions now.")

                    # Create conversation chain
                    st.session_state.conversation = get_conversation_chain(vector_store)
                except Exception as e:
                    st.error(f"An error occurred: {e}")


if __name__ == '__main__':
    main()
Execute the app with Streamlit using the following command.
streamlit run app.py
The app will open in your browser, where you can upload your PDFs from the sidebar and start asking questions about their content.
Thanks for reading this article!
Thanks Gowri M Bhatt for reviewing the content.
If you enjoyed this article, please click on the heart button ♥ and share to help others find it!
The full source code for this tutorial can be found here:
codemaker2015/pdf-chat-using-RAG | github.com