Enhancing RAG: Beyond Vanilla Approaches
Retrieval-Augmented Generation (RAG) significantly boosts language models by integrating external information retrieval. Standard RAG, while improving response relevance, often falters in complex retrieval situations. This article examines the shortcomings of basic RAG and presents advanced methods to improve accuracy and efficiency.
Limitations of Basic RAG
Consider a simple scenario: retrieving relevant information from several documents. Our dataset includes:
- A primary document detailing healthy, productive lifestyle practices.
- Two unrelated documents containing some overlapping keywords, but in different contexts.
<code>main_document_text = """
Morning Routine (5:30 AM - 9:00 AM)
✅ Wake Up Early - Aim for 6-8 hours of sleep to feel well-rested.
✅ Hydrate First - Drink a glass of water to rehydrate your body.
✅ Morning Stretch or Light Exercise - Do 5-10 minutes of stretching or a short workout to activate your body.
✅ Mindfulness or Meditation - Spend 5-10 minutes practicing mindfulness or deep breathing.
✅ Healthy Breakfast - Eat a balanced meal with protein, healthy fats, and fiber.
✅ Plan Your Day - Set goals, review your schedule, and prioritize tasks.
...
"""</code>
A basic RAG system, when queried with:
- How can I improve my health and productivity?
- What are the best strategies for a healthy and productive lifestyle?
may struggle to consistently retrieve the primary document due to the presence of similar words in unrelated documents.
Helper Functions: Streamlining the RAG Pipeline
To improve retrieval accuracy and simplify query processing, we introduce helper functions. These functions handle tasks such as querying the ChatGPT API, calculating document embeddings, and determining similarity scores. This creates a more efficient RAG pipeline.
Here are the helper functions:
<code># Imports
import os
import json
import openai
import numpy as np
from scipy.spatial.distance import cosine
from google.colab import userdata

# Set up the OpenAI API key and client
os.environ["OPENAI_API_KEY"] = userdata.get('AiTeam')
client = openai.OpenAI()  # used by the helper functions below</code>
<code>def query_chatgpt(prompt, model="gpt-4o", response_format=openai.NOT_GIVEN):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,  # Adjust for more or less creativity
            response_format=response_format
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error: {e}"</code>
<code>def get_embedding(text, model="text-embedding-3-large"):  # alternative: "text-embedding-ada-002"
    """Fetches the embedding for a given text using OpenAI's API."""
    response = client.embeddings.create(
        input=[text],
        model=model
    )
    return response.data[0].embedding</code>
<code>def compute_similarity_metrics(embed1, embed2):
    """Computes the cosine similarity between two embeddings."""
    cosine_sim = 1 - cosine(embed1, embed2)  # Cosine similarity
    return cosine_sim</code>
<code>def fetch_similar_docs(query, docs, threshold=0.55, top=1):
    """Returns the documents whose embeddings are most similar to the query."""
    query_em = get_embedding(query)
    data = []
    for d in docs:
        # Compute the similarity between the document and the query
        similarity_results = compute_similarity_metrics(d["embedding"], query_em)
        if similarity_results >= threshold:
            data.append({"id": d["id"], "ref_doc": d.get("ref_doc", ""), "score": similarity_results})
    # Sort by score in descending order and keep the top matches
    sorted_data = sorted(data, key=lambda x: x["score"], reverse=True)
    sorted_data = sorted_data[:min(top, len(sorted_data))]
    return sorted_data</code>
Evaluating Basic RAG
We test the basic RAG using predefined queries to assess its ability to retrieve the most relevant document based on semantic similarity. This highlights its limitations.
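The test below assumes a <code>docs</code> list in which every entry already carries a precomputed embedding. The setup code for that list is not shown in the article; a minimal sketch, assuming this structure and using placeholder text for the two unrelated documents, might look like this:
<code># Hypothetical setup for the docs collection used by fetch_similar_docs.
# The two "unrelated" texts are placeholders standing in for the real distractor documents.
unrelated_text_1 = "Team productivity tips for software projects ..."
unrelated_text_2 = "Healthy recipes and meal planning ideas ..."

raw_docs = [
    {"id": "main", "ref_doc": main_document_text},
    {"id": "unrelated_1", "ref_doc": unrelated_text_1},
    {"id": "unrelated_2", "ref_doc": unrelated_text_2},
]

# Precompute one embedding per document so retrieval only embeds the query at run time.
docs = [{**d, "embedding": get_embedding(d["ref_doc"])} for d in raw_docs]</code>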
<code>"""# **Testing Vanilla RAG**""" query = "what should I do to stay healthy and productive?" r = fetch_similar_docs(query, docs) print("query = ", query) print("documents = ", r) query = "what are the best practices to stay healthy and productive ?" r = fetch_similar_docs(query, docs) print("query = ", query) print("documents = ", r)</code>
Advanced Techniques for Enhanced RAG
To improve the retrieval process, we introduce functions that generate structured information to enhance document retrieval and query processing.
Three key enhancements are implemented:
1. Generating FAQs
Creating FAQs from the document expands the query matching possibilities. These FAQs are generated once and stored, enriching the search space without recurring costs.
<code>def generate_faq(text):
    prompt = f'''
    given the following text: """{text}"""
    Ask relevant simple atomic questions ONLY (don't answer them) to cover all subjects covered by the text.
    Return the result as a json list example [q1, q2, q3...]
    '''
    return query_chatgpt(prompt, response_format={"type": "json_object"})</code>
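The article does not show how the generated questions are folded back into the collection. One plausible sketch, assuming the model returns its questions as a JSON object wrapping a list, is to embed each question as its own entry that points back to the source document:
<code># Hypothetical enrichment step: index each generated FAQ question as its own entry.
faq_json = generate_faq(main_document_text)
faq_questions = json.loads(faq_json)

# The exact JSON shape depends on the model; handle both a bare list and a wrapped object.
if isinstance(faq_questions, dict):
    faq_questions = next(iter(faq_questions.values()))

faq_docs = [
    {"id": f"main_faq_{i}", "ref_doc": main_document_text, "embedding": get_embedding(q)}
    for i, q in enumerate(faq_questions)
]</code>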
2. Creating an Overview
A concise summary captures the document's core ideas, improving retrieval effectiveness. The overview's embedding is added to the document collection.
<code>def generate_overview(text):
    prompt = f'''
    given the following text: """{text}"""
    Generate an abstract for it that tells in maximum 3 lines what is it about and use high level terms that will capture the main points,
    Use terms and words that will be most likely used by average person.
    '''
    return query_chatgpt(prompt)</code>
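Likewise, the overview can be embedded and stored as one more entry pointing back to the full document. A minimal sketch, building on the hypothetical <code>docs</code> and <code>faq_docs</code> lists above:
<code># Hypothetical enrichment step: index the overview alongside the original documents and FAQs.
overview_text = generate_overview(main_document_text)
overview_doc = {"id": "main_overview", "ref_doc": main_document_text, "embedding": get_embedding(overview_text)}

# The enriched collection now holds the raw documents, the FAQ entries, and the overview.
enriched_docs = docs + faq_docs + [overview_doc]</code>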
3. Query Decomposition
Broad queries are broken down into smaller, more precise sub-queries. These sub-queries are compared against the enhanced document collection (original document, FAQs, and overview). Results are merged for improved relevance.
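The decomposition step itself is not shown in the article; a simple version can reuse <code>query_chatgpt</code> to ask the model for sub-queries, as in the sketch below (the function name and prompt wording are assumptions):
<code>def decompose_query(query):
    """Hypothetical helper: asks the model to split a broad query into atomic sub-queries."""
    prompt = f'''
    given the following query: """{query}"""
    Break it down into simple atomic sub-questions, each covering one aspect of the query.
    Return the result as a json object of the form {{"queries": ["q1", "q2", ...]}}
    '''
    try:
        result = query_chatgpt(prompt, response_format={"type": "json_object"})
        return json.loads(result).get("queries", [query])
    except Exception:
        # Fall back to the original query if the response cannot be parsed.
        return [query]</code>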
Evaluating the Enhanced RAG
Re-running the initial queries with these enhancements shows significant improvement. Query decomposition generates multiple sub-queries, leading to successful retrieval from both the FAQs and the original document.
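A plausible way to re-run the evaluation over the enriched collection, using the sketches above, is to decompose each query, retrieve per sub-query, then merge and deduplicate the hits by document id:
<code># Hypothetical end-to-end check of the enhanced pipeline.
for query in [
    "what should I do to stay healthy and productive?",
    "what are the best practices to stay healthy and productive ?",
]:
    sub_queries = decompose_query(query)
    merged = {}
    for sq in sub_queries:
        for hit in fetch_similar_docs(sq, enriched_docs, top=2):
            # Keep the best score seen for each underlying document.
            if hit["id"] not in merged or hit["score"] > merged[hit["id"]]["score"]:
                merged[hit["id"]] = hit
    results = sorted(merged.values(), key=lambda x: x["score"], reverse=True)
    print("query = ", query)
    print("documents = ", results)</code>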
Cost-Benefit Analysis
While preprocessing (generating FAQs, overviews, and embeddings) adds an upfront cost, it is paid only once per document. In return, it avoids the ongoing costs of a poorly optimized RAG system: frustrated users and higher query costs from retrieving irrelevant information. For high-volume systems, preprocessing is a worthwhile investment.
Conclusion
Combining document preprocessing (FAQs and overviews) with query decomposition creates a more intelligent RAG system that balances accuracy and cost-effectiveness. This enhances retrieval quality, reduces irrelevant results, and improves the user experience. Future research can explore further optimizations like dynamic thresholding and reinforcement learning for query refinement.