How a history-aware retriever works
The history-aware retriever discussed in this post is the one returned by the create_history_aware_retriever function from the LangChain package. This function is designed to receive the following inputs:
- An LLM (a language model that receives a query and returns an answer);
- A vector store retriever (a model that receives a query and returns a list of relevant documents);
- A prompt template, which combines the chat history (a list of message interactions, typically between a human and an AI) with the latest user input.
When invoked, the history-aware retriever takes a user query as input and outputs a list of relevant documents. The relevant documents are based on the query combined with the context provided by the chat history.
At the end, I summarize its workflow.
Setting it up
```python
from langchain.chains import create_history_aware_retriever
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_chroma import Chroma
from dotenv import load_dotenv
import bs4

load_dotenv()  # To get OPENAI_API_KEY


def create_vectorstore_retriever():
    """
    Returns a vector store retriever based on the text of a specific web page.
    """
    URL = r'https://lilianweng.github.io/posts/2023-06-23-agent/'
    loader = WebBaseLoader(
        web_paths=(URL,),
        bs_kwargs=dict(
            parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
        ))
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0, add_start_index=True)
    splits = text_splitter.split_documents(docs)
    vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
    return vectorstore.as_retriever()


def create_prompt():
    """
    Returns a prompt instructed to produce a rephrased question based on the
    user's last question, but referencing previous messages (chat history).
    """
    system_instruction = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_instruction),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")])
    return prompt


llm = ChatOpenAI(model='gpt-4o-mini')
vectorstore_retriever = create_vectorstore_retriever()
prompt = create_prompt()

history_aware_retriever = create_history_aware_retriever(
    llm, vectorstore_retriever, prompt
)
```
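To make the roles of the prompt pieces concrete, here is a plain-Python sketch of how the template expands at invoke time. The `format_prompt` helper is hypothetical (it is not a LangChain API): it just shows that the MessagesPlaceholder slot is filled with every message from chat_history, in order, between the system instruction and the new human input.

```python
def format_prompt(system_instruction, chat_history, user_input):
    """Hypothetical illustration of how ChatPromptTemplate.from_messages
    expands: system message, then the whole chat history, then the input."""
    messages = [("system", system_instruction)]
    messages.extend(chat_history)           # fills the MessagesPlaceholder("chat_history") slot
    messages.append(("human", user_input))  # fills the "{input}" slot
    return messages

msgs = format_prompt(
    "Formulate a standalone question.",
    [("human", "What is ReAct?"),
     ("ai", "ReAct integrates reasoning and acting...")],
    "It is a way of doing what?",
)
# msgs contains 4 messages: system, the two history turns, then the new human turn
```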
Using it
Here, a question is being asked without any chat history, so the retriever only responds with the documents relevant to the last question.
```python
chat_history = []

docs = history_aware_retriever.invoke({'input': 'what is planning?', 'chat_history': chat_history})

for i, doc in enumerate(docs):
    print(f'Chunk {i+1}:')
    print(doc.page_content)
    print()
```
```
Chunk 1:
Planning is essentially in order to optimize believability at the moment vs in time.
Prompt template: {Intro of an agent X}. Here is X's plan today in broad strokes: 1)
Relationships between agents and observations of one agent by another are all taken into consideration for planning and reacting.
Environment information is present in a tree structure.

Chunk 2:
language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.

Chunk 3:
Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural

Chunk 4:
Planning
Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.
Memory
```
Now, based on the chat history, the retriever knows that the human wants to know about task decomposition as well as planning. So it responds with chunks of text that reference both themes.
```python
chat_history = [
    ('human', 'when I ask about planning I want to know about Task Decomposition too.')]

docs = history_aware_retriever.invoke({'input': 'what is planning?', 'chat_history': chat_history})

for i, doc in enumerate(docs):
    print(f'Chunk {i+1}:')
    print(doc.page_content)
    print()
```
```
Chunk 1:
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Chunk 2:
Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#

Chunk 3:
Planning
Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.
Memory

Chunk 4:
Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
```
Now the question depends entirely on the chat history, and we can see that the retriever responds with chunks of text that reference the correct concept.
```python
chat_history = [
    ('human', 'What is ReAct?'),
    ('ai', 'ReAct integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space')]

docs = history_aware_retriever.invoke({'input': 'It is a way of doing what?', 'chat_history': chat_history})

for i, doc in enumerate(docs):
    print(f'Chunk {i+1}:')
    print(doc.page_content)
    print()
```
```
Chunk 1:
ReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.
The ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:
Thought: ...
Action: ...
Observation: ...

Chunk 2:
Fig. 2. Examples of reasoning trajectories for knowledge-intensive tasks (e.g. HotpotQA, FEVER) and decision-making tasks (e.g. AlfWorld Env, WebShop). (Image source: Yao et al. 2023).
In both experiments on knowledge-intensive tasks and decision-making tasks, ReAct works better than the Act-only baseline where Thought: … step is removed.

Chunk 3:
The LLM is provided with a list of tool names, descriptions of their utility, and details about the expected input/output.
It is then instructed to answer a user-given prompt using the tools provided when necessary. The instruction suggests the model to follow the ReAct format - Thought, Action, Action Input, Observation.

Chunk 4:
Case Studies#
Scientific Discovery Agent#
ChemCrow (Bran et al. 2023) is a domain-specific example in which LLM is augmented with 13 expert-designed tools to accomplish tasks across organic synthesis, drug discovery, and materials design. The workflow, implemented in LangChain, reflects what was previously described in the ReAct and MRKLs and combines CoT reasoning with tools relevant to the tasks:
```
Conclusion
In conclusion, the history-aware retriever works as follows when .invoke({'input': '...', 'chat_history': '...'}) is called:
- It replaces the input and chat_history placeholders in the prompt with specified values, creating a new ready-to-use prompt that essentially says "Take this chat history and this last input, and rephrase the last input in a way that anyone can understand it without seeing the chat history".
- It sends the new prompt to the LLM and receives a rephrased input.
- It then sends the rephrased input to the vector store retriever and receives a list of documents relevant to this rephrased input.
- Finally, it returns this list of relevant documents.
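The steps above can be sketched in plain Python. This is an illustrative sketch, not the actual LangChain source: `fake_llm` and `fake_retriever` are stand-ins for the real LLM and vector store retriever, and the sketch also short-circuits straight to the retriever when the chat history is empty, which matches the behavior seen in the first example above.

```python
def fake_llm(prompt: str) -> str:
    # Stand-in for the LLM: pretend it returns a standalone rephrasing.
    return "What is planning and task decomposition?"

def fake_retriever(query: str) -> list:
    # Stand-in for the vector store retriever.
    return [f"document relevant to: {query}"]

def history_aware_retrieve(user_input: str, chat_history: list) -> list:
    if not chat_history:
        # No history to resolve references against: query the retriever directly.
        return fake_retriever(user_input)
    # 1) Fill the prompt placeholders with the chat history and the last input.
    prompt = (
        "Given a chat history and the latest user question, formulate a "
        f"standalone question.\n{chat_history}\nHuman: {user_input}"
    )
    # 2) Send the prompt to the LLM and receive a rephrased, standalone input.
    standalone = fake_llm(prompt)
    # 3) Retrieve documents relevant to the rephrased input and 4) return them.
    return fake_retriever(standalone)

history = [("human", "when I ask about planning I want to know about Task Decomposition too.")]
docs = history_aware_retrieve("what is planning?", history)
# docs == ["document relevant to: What is planning and task decomposition?"]
```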
Note: the embedding used to transform text into vectors is the one specified when Chroma.from_documents is called — in this post, OpenAIEmbeddings. If none is specified, Chroma's default embedding function is used.