Retrieval-augmented generation (RAG) enhances AI models by integrating external knowledge. However, traditional RAG pipelines often fragment documents into isolated chunks, losing crucial context and reducing retrieval accuracy.
Anthropic's contextual retrieval addresses this by adding concise, context-rich explanations to each document chunk before embedding. This significantly reduces retrieval errors, leading to improved downstream task performance. This article details contextual retrieval and its implementation.
Traditional RAG methods divide documents into smaller chunks for easier retrieval, but this can eliminate essential context. For instance, a chunk might state "Its more than 3.85 million inhabitants make it the European Union's most populous city" without specifying the city. This lack of context hinders accuracy.
Contextual retrieval solves this by prepending a short, context-specific summary to each chunk before embedding. The previous example would become:
<code>contextualized_chunk = """Berlin is the capital and largest city of Germany, known for being the EU's most populous city within its limits. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. """</code>
Anthropic's internal testing across diverse datasets (codebases, scientific papers, fiction) shows that contextual retrieval, combining contextual embeddings with contextual BM25, reduces the retrieval failure rate by up to 49%.
This section outlines a step-by-step implementation using a sample document:
<code># Input text for the knowledge base
input_text = """Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany and is the third smallest state in the country in terms of area. Paris is the capital and most populous city of France. It is situated along the Seine River in the north-central part of the country. The city has a population of over 2.1 million residents within its administrative limits, making it one of Europe's major population centers."""</code>
Step 1: Chunk Creation
Divide the document into smaller, independent chunks (here, sentences):
<code># Splitting the input text into smaller chunks (one sentence per chunk)
test_chunks = [
    'Berlin is the capital and largest city of Germany, both by area and by population.',
    "\n\nIts more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits.",
    '\n\nThe city is also one of the states of Germany and is the third smallest state in the country in terms of area.',
    '\n\nParis is the capital and most populous city of France.',
    '\n\nIt is situated along the Seine River in the north-central part of the country.',
    "\n\nThe city has a population of over 2.1 million residents within its administrative limits, making it one of Europe's major population centers."
]</code>
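For a real document you would normally not hand-write the chunks. As a minimal sketch, LangChain's RecursiveCharacterTextSplitter can produce them instead; the chunk size, overlap, and separators below are illustrative assumptions, not values from the original article:
<code>from langchain_text_splitters import RecursiveCharacterTextSplitter

# Illustrative splitter settings -- tune chunk_size/chunk_overlap for your data
splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=0,
    separators=["\n\n", ". ", " "],
)
test_chunks = splitter.split_text(input_text)</code>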
Step 2: Prompt Template Definition
Define the prompt for context generation (Anthropic's template is used):
<code>from langchain.prompts import ChatPromptTemplate, PromptTemplate, HumanMessagePromptTemplate

# Define the prompt for generating contextual information
anthropic_contextual_retrieval_system_prompt = """<document>
{WHOLE_DOCUMENT}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{CHUNK_CONTENT}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else."""
# ... (rest of the prompt template code remains the same)</code>
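The elided portion wraps this prompt string into the chat prompt object that Step 4 pipes into the LLM. One plausible way to build <code>anthropic_contextual_retrieval_final_prompt</code> is sketched below; this construction is an assumption for illustration, not the article's original code:
<code># Sketch: wrap the prompt string into a ChatPromptTemplate (assumed construction)
anthropic_contextual_retrieval_final_prompt = ChatPromptTemplate.from_messages([
    HumanMessagePromptTemplate(
        prompt=PromptTemplate(
            input_variables=["WHOLE_DOCUMENT", "CHUNK_CONTENT"],
            template=anthropic_contextual_retrieval_system_prompt,
        )
    )
])</code>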
Step 3: LLM Initialization
Choose an LLM (OpenAI's GPT-4o is used here):
<code>import os
from langchain_openai import ChatOpenAI

# Set the OpenAI API key
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# Initialize the model instance
llm_model_instance = ChatOpenAI(
    model="gpt-4o",
)</code>
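Since the technique comes from Anthropic, a Claude model works just as well for generating the contexts. A minimal sketch using the langchain-anthropic integration is shown below; the model name is illustrative and an ANTHROPIC_API_KEY environment variable is assumed:
<code>from langchain_anthropic import ChatAnthropic

# Alternative: use a Claude model instead of GPT-4o (model name is illustrative)
# Assumes ANTHROPIC_API_KEY is set in the environment
llm_model_instance = ChatAnthropic(
    model="claude-3-5-sonnet-latest",
)</code>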
Step 4: Chain Creation
Connect the prompt and LLM:
<code>from langchain_core.output_parsers import StrOutputParser

# Chain the prompt with the model instance and parse the output to a string
contextual_chunk_creation = anthropic_contextual_retrieval_final_prompt | llm_model_instance | StrOutputParser()</code>
Step 5: Chunk Processing
Generate context for each chunk:
<code># Process each chunk and generate contextual information
for test_chunk in test_chunks:
    res = contextual_chunk_creation.invoke({
        "WHOLE_DOCUMENT": input_text,
        "CHUNK_CONTENT": test_chunk
    })
    print(res)
    print('-----')</code>
Running the loop prints a short context for each chunk, situating it within the full document.
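To actually use these contexts, each generated summary is prepended to its chunk before indexing. The sketch below is an illustration (not part of the original article): it collects the contextualized chunks and builds a simple lexical index over them with the rank_bm25 package, which is one way to approximate the "contextual BM25" component; the embedding index would be built over the same contextualized strings.
<code>from rank_bm25 import BM25Okapi

# Prepend each generated context to its chunk (illustrative helper, not from the article)
contextualized_chunks = []
for test_chunk in test_chunks:
    context = contextual_chunk_creation.invoke({
        "WHOLE_DOCUMENT": input_text,
        "CHUNK_CONTENT": test_chunk
    })
    contextualized_chunks.append(f"{context} {test_chunk.strip()}")

# Build a BM25 index over the contextualized chunks ("contextual BM25")
tokenized_corpus = [chunk.lower().split() for chunk in contextualized_chunks]
bm25_index = BM25Okapi(tokenized_corpus)

# Lexical retrieval: score a query against the contextualized chunks
query = "How many people live in the EU's most populous city?"
scores = bm25_index.get_scores(query.lower().split())</code>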
Reranking further refines retrieval by prioritizing the most relevant chunks, which improves accuracy and can reduce downstream costs, since fewer chunks need to be passed to the model. In Anthropic's tests, adding reranking on top of contextual embeddings and contextual BM25 cut the retrieval failure rate from 5.7% to 1.9%, a 67% improvement.
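The original article does not include reranking code, and Anthropic used a dedicated reranking API; as a minimal illustration only, a cross-encoder from the sentence-transformers library can rescore the top candidates against the query:
<code>from sentence_transformers import CrossEncoder

# Rescore the top candidates from hybrid retrieval with a cross-encoder (model name is illustrative)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How many people live in the EU's most populous city?"
candidates = contextualized_chunks  # in practice: the top-N chunks from embedding + BM25 retrieval
pairs = [(query, chunk) for chunk in candidates]
rerank_scores = reranker.predict(pairs)

# Keep the highest-scoring chunks for the final prompt
ranked = sorted(zip(rerank_scores, candidates), key=lambda p: p[0], reverse=True)
top_chunks = [chunk for _, chunk in ranked][:3]</code>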
For smaller knowledge bases (<200,000 tokens), including the entire knowledge base directly in the prompt might be more efficient than using retrieval systems. Also, utilizing prompt caching (available with Claude) can significantly reduce costs and improve response times.
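A minimal sketch of that approach with the Anthropic Python SDK is shown below, assuming the cache_control content-block parameter for prompt caching; the model name and question are illustrative:
<code>import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Cache the whole knowledge base in the system prompt so repeated queries can reuse it
response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": f"Answer questions using this knowledge base:\n\n{input_text}",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Which is the EU's most populous city?"}],
)
print(response.content[0].text)</code>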
Anthropic's contextual retrieval offers a straightforward yet powerful way to improve RAG systems. The combination of contextual embeddings, contextual BM25, and reranking substantially improves retrieval accuracy. Further exploration of other retrieval techniques is also worthwhile.