The artificial intelligence revolution has swept the field of application development and opened a new era of human-computer interaction. While enterprises leverage AI to enhance user experience, solutions based on large language models (LLM) also present challenges in maintaining content integrity, accuracy and ethical standards.
The need for responsible AI audits becomes increasingly obvious as applications expand beyond controlled environments. In these environments, ensuring a reasonable and accurate response to users is not easy, but it is crucial.
For example, in customer service interactions, misinformation or inappropriate content can lead to customer dissatisfaction and even damage the reputation of the company. But as a developer, how can we ensure that AI-based applications can provide users with reasonable and accurate responses? This is where AI audit comes into play!
This article will explore in-depth a technology that uses the GPT model to audit GPT-based applications. Build a GPT-based quality audit agent
However, this article adopts a different approach to AI audit. Our focus is on ensuring the quality of model response, that is, accuracy and meeting user needs. As far as I know, there are currently no official endpoints designed specifically for this purpose.
Nevertheless, given that we use GPT models extensively in a variety of applications, why not use them as a quality inspector for the same model instance?
We can use the GPT model to evaluate the output generated by the model itself for user requests. This testing method helps prevent ambiguity and erroneous responses and enhances the model's ability to effectively satisfy user requests.
Scope and Target
For example, if you use a GPT model to power your enterprise chatbot, you must be very interested in making sure your chatbot does not provide any information beyond your catalog items or features.
In the following chapters, we will make the last example come alive by making simple calls to the OpenAI API using the openai Python package and the famous Jupyter Notebook.The main goal is to generate a simple LLM-based application and to audit its output using an LLM-based quality inspector. In our example, we need to create our sample customer service agent, a QA agent (called a QA agent from now on), and more importantly, define the interaction between the two.
The following image shows the above workflow well:
Homemade pictures. Audit workflow diagram: 1. The user sends a request to an LLM-based application (in this case, the customer service chatbot). 2. The chatbot generates the answer, but sends it to the QA agent first. 3. After the QA agent checks whether the answer is appropriate, it sends the answer back to the user.
Let's go step by step!
Let's start by building a conversation agent for the store's customer service.
If you already have a runnable LLM-driven application or want to implement your preferred example, feel free to skip the first part! If you still want to know if your business can benefit from LLM-based applications, then you should follow an interesting podcast discussion!
Let's assume we are building a customer service agent for our store. We are interested in using models such as ChatGPT behind this client agent to leverage their natural language capabilities to understand user queries and respond to them in a natural way.
To define our customer service chatbot, we need two key elements:
Lastly, like any other LLM-based application, we need a way to call the OpenAI API from our script. In this article, I will use the following implementation, which only depends on the openai package:
<code>import openai import os # 从环境中获取 OpenAI 密钥 openai_api_key = os.environ["OPENAI_API_KEY"] # 使用过去交互记忆的简单 OpenAI API 调用 def gpt_call(prompt, message_history, model="gpt-3.5-turbo"): message_history.append({'role': 'user', 'content': prompt}) response = openai.ChatCompletion.create( model=model, messages=message_history ) response_text = response.choices[0].message["content"] message_history.append({'role': 'assistant', 'content': response_text}) return response_text</code>
The idea behind it is to initialize a separate message history (including system messages) for each model instance and use the upcoming interaction with the model to continuously update it.
If you are looking for a more optimized way to handle interactions, I highly recommend using the langchain framework, as we did in Building a Context-Aware ChatBoard: Implementing ChatGPT with the LangChain framework.
If you are not familiar with the OpenAI API, consider checking out a webinar on Getting Started with OpenAI API and ChatGPT.
Now that we have identified the required building blocks, let's put them together:
<code># 定义我们的示例产品目录 product_information = """ { "name": "UltraView QLED TV", "category": "Televisions and Home Theater Systems", "brand": "UltraView", "model_number": "UV-QLED65", "warranty": "3 years", "rating": 4.9, "features": [ "65-inch QLED display", "8K resolution", "Quantum HDR", "Dolby Vision", "Smart TV" ], "description": "Experience lifelike colors and incredible clarity with this high-end QLED TV.", "price": 2499.99 } { "name": "ViewTech Android TV", "category": "Televisions and Home Theater Systems", "brand": "ViewTech", "model_number": "VT-ATV55", "warranty": "2 years", "rating": 4.7, "features": [ "55-inch 4K display", "Android TV OS", "Voice remote", "Chromecast built-in" ], "description": "Access your favorite apps and content on this smart Android TV.", "price": 799.99 } { "name": "SlimView OLED TV", "category": "Televisions and Home Theater Systems", "brand": "SlimView", "model_number": "SL-OLED75", "warranty": "2 years", "rating": 4.8, "features": [ "75-inch OLED display", "4K resolution", "HDR10+", "Dolby Atmos", "Smart TV" ], "description": "Immerse yourself in a theater-like experience with this ultra-thin OLED TV.", "price": 3499.99 } { "name": "TechGen X Pro", "category": "Smartphones and Accessories", "brand": "TechGen", "model_number": "TG-XP20", "warranty": "1 year", "rating": 4.5, "features": [ "6.4-inch AMOLED display", "128GB storage", "48MP triple camera", "5G", "Fast charging" ], "description": "A feature-packed smartphone designed for power users and mobile enthusiasts.", "price": 899.99 } { "name": "GigaPhone 12X", "category": "Smartphones and Accessories", "brand": "GigaPhone", "model_number": "GP-12X", "warranty": "2 years", "rating": 4.6, "features": [ "6.7-inch IPS display", "256GB storage", "108MP quad camera", "5G", "Wireless charging" ], "description": "Unleash the power of 5G and high-resolution photography with the GigaPhone 12X.", "price": 1199.99 } { "name": "Zephyr Z1", "category": "Smartphones and Accessories", "brand": "Zephyr", "model_number": "ZP-Z1", "warranty": "1 year", "rating": 4.4, "features": [ "6.2-inch LCD display", "64GB storage", "16MP dual camera", "4G LTE", "Long battery life" ], "description": "A budget-friendly smartphone with reliable performance for everyday use.", "price": 349.99 } { "name": "PixelMaster Pro DSLR", "category": "Cameras and Camcorders", "brand": "PixelMaster", "model_number": "PM-DSLR500", "warranty": "2 years", "rating": 4.8, "features": [ "30.4MP full-frame sensor", "4K video", "Dual Pixel AF", "3.2-inch touchscreen" ], "description": "Unleash your creativity with this professional-grade DSLR camera.", "price": 1999.99 } { "name": "ActionX Waterproof Camera", "category": "Cameras and Camcorders", "brand": "ActionX", "model_number": "AX-WPC100", "warranty": "1 year", "rating": 4.6, "features": [ "20MP sensor", "4K video", "Waterproof up to 50m", "Wi-Fi connectivity" ], "description": "Capture your adventures with this rugged and versatile action camera.", "price": 299.99 } { "name": "SonicBlast Wireless Headphones", "category": "Audio and Headphones", "brand": "SonicBlast", "model_number": "SB-WH200", "warranty": "1 year", "rating": 4.7, "features": [ "Active noise cancellation", "50mm drivers", "30-hour battery life", "Comfortable earpads" ], "description": "Immerse yourself in superior sound quality with these wireless headphones.", "price": 149.99 } """ # 为我们的用例定义一个合适的系统消息 customer_agent_sysmessage = f""" 您是一位客户服务代理,负责回答客户关于产品目录中产品的疑问。 产品目录将用三个反引号分隔,即 ```。 以友好和人性化的语气回复,并提供产品目录中可用的详细信息。 产品目录: ```{product_information}``` """ # 初始化模型的记忆 customer_agent_history = [{'role': 'system', 'content': customer_agent_sysmessage}]</code>
We can see that we have defined a sample directory (product_information) (JSONL format) and a system message (customer_agent_sysmessage) with three requirements:
Finally, we also initialize the customer agent's message history (customer_agent_history).
It is worth noting that we use feature styles when writing system messages and additional information (e.g., three backticks). This is one of the best practices for tip engineering! If you are interested in more best practices, the ChatGPT Tips Engineering Getting Started Guide webinar is for you!
At this point, we can start using our sample customer chatbot as follows:
<code>import openai import os # 从环境中获取 OpenAI 密钥 openai_api_key = os.environ["OPENAI_API_KEY"] # 使用过去交互记忆的简单 OpenAI API 调用 def gpt_call(prompt, message_history, model="gpt-3.5-turbo"): message_history.append({'role': 'user', 'content': prompt}) response = openai.ChatCompletion.create( model=model, messages=message_history ) response_text = response.choices[0].message["content"] message_history.append({'role': 'assistant', 'content': response_text}) return response_text</code>
Looks like a natural answer, right? Let's have a follow-up interaction:
<code># 定义我们的示例产品目录 product_information = """ { "name": "UltraView QLED TV", "category": "Televisions and Home Theater Systems", "brand": "UltraView", "model_number": "UV-QLED65", "warranty": "3 years", "rating": 4.9, "features": [ "65-inch QLED display", "8K resolution", "Quantum HDR", "Dolby Vision", "Smart TV" ], "description": "Experience lifelike colors and incredible clarity with this high-end QLED TV.", "price": 2499.99 } { "name": "ViewTech Android TV", "category": "Televisions and Home Theater Systems", "brand": "ViewTech", "model_number": "VT-ATV55", "warranty": "2 years", "rating": 4.7, "features": [ "55-inch 4K display", "Android TV OS", "Voice remote", "Chromecast built-in" ], "description": "Access your favorite apps and content on this smart Android TV.", "price": 799.99 } { "name": "SlimView OLED TV", "category": "Televisions and Home Theater Systems", "brand": "SlimView", "model_number": "SL-OLED75", "warranty": "2 years", "rating": 4.8, "features": [ "75-inch OLED display", "4K resolution", "HDR10+", "Dolby Atmos", "Smart TV" ], "description": "Immerse yourself in a theater-like experience with this ultra-thin OLED TV.", "price": 3499.99 } { "name": "TechGen X Pro", "category": "Smartphones and Accessories", "brand": "TechGen", "model_number": "TG-XP20", "warranty": "1 year", "rating": 4.5, "features": [ "6.4-inch AMOLED display", "128GB storage", "48MP triple camera", "5G", "Fast charging" ], "description": "A feature-packed smartphone designed for power users and mobile enthusiasts.", "price": 899.99 } { "name": "GigaPhone 12X", "category": "Smartphones and Accessories", "brand": "GigaPhone", "model_number": "GP-12X", "warranty": "2 years", "rating": 4.6, "features": [ "6.7-inch IPS display", "256GB storage", "108MP quad camera", "5G", "Wireless charging" ], "description": "Unleash the power of 5G and high-resolution photography with the GigaPhone 12X.", "price": 1199.99 } { "name": "Zephyr Z1", "category": "Smartphones and Accessories", "brand": "Zephyr", "model_number": "ZP-Z1", "warranty": "1 year", "rating": 4.4, "features": [ "6.2-inch LCD display", "64GB storage", "16MP dual camera", "4G LTE", "Long battery life" ], "description": "A budget-friendly smartphone with reliable performance for everyday use.", "price": 349.99 } { "name": "PixelMaster Pro DSLR", "category": "Cameras and Camcorders", "brand": "PixelMaster", "model_number": "PM-DSLR500", "warranty": "2 years", "rating": 4.8, "features": [ "30.4MP full-frame sensor", "4K video", "Dual Pixel AF", "3.2-inch touchscreen" ], "description": "Unleash your creativity with this professional-grade DSLR camera.", "price": 1999.99 } { "name": "ActionX Waterproof Camera", "category": "Cameras and Camcorders", "brand": "ActionX", "model_number": "AX-WPC100", "warranty": "1 year", "rating": 4.6, "features": [ "20MP sensor", "4K video", "Waterproof up to 50m", "Wi-Fi connectivity" ], "description": "Capture your adventures with this rugged and versatile action camera.", "price": 299.99 } { "name": "SonicBlast Wireless Headphones", "category": "Audio and Headphones", "brand": "SonicBlast", "model_number": "SB-WH200", "warranty": "1 year", "rating": 4.7, "features": [ "Active noise cancellation", "50mm drivers", "30-hour battery life", "Comfortable earpads" ], "description": "Immerse yourself in superior sound quality with these wireless headphones.", "price": 149.99 } """ # 为我们的用例定义一个合适的系统消息 customer_agent_sysmessage = f""" 您是一位客户服务代理,负责回答客户关于产品目录中产品的疑问。 产品目录将用三个反引号分隔,即 ```。 以友好和人性化的语气回复,并提供产品目录中可用的详细信息。 产品目录: ```{product_information}``` """ # 初始化模型的记忆 customer_agent_history = [{'role': 'system', 'content': customer_agent_sysmessage}]</code>
It's fair! The agent can recognize that there is no iPhone in our product catalog and reply to the user accordingly. It even encourages users to contact the store!
Let's try another request:
<code>user_prompt = "你们商店里最好的电视是什么?" customer_agent_response = gpt_call(user_prompt, customer_agent_history) print(customer_agent_response) # 输出:在我们商店里,UltraView QLED 电视被认为是最好的电视。它的评分为 4.9,并提供 65 英寸 QLED 显示屏和 8K 分辨率。它还具有 Quantum HDR、Dolby Vision 和智能电视功能。凭借其逼真的色彩和令人难以置信的清晰度,这款高端 QLED 电视可提供身临其境的观看体验。UltraView QLED 电视的价格为 2499.99 美元,并提供 3 年保修。</code>
Correctly , it seems that in some cases the proxy may skip the information in the directory. In these cases, the QA proxy can help filter unwanted responses.
As we have discussed, the purpose of QA agents is to check the quality of customer service agents based on user queries and product catalogs. Therefore, it is important to define a system message that sets this exact advanced behavior:
<code>user_prompt = "我想买最新的 iPhone。你能帮我吗?" customer_agent_response = gpt_call(user_prompt, customer_agent_history) print(customer_agent_response) # 输出:当然!我很乐意帮助您找到最新的 iPhone。但是,由于它似乎缺失于产品目录中,我目前无法提供有关最新 iPhone 型号的具体详细信息。我建议您查看我们的网站或直接联系我们的商店,以获取有关最新 iPhone 型号的最新信息。我们知识渊博的工作人员将能够帮助您选择最符合您的需求和偏好的 iPhone。</code>
For customer agents, user prompts are unpredictable because they depend on the user's needs and writing style. For QA agents, we are responsible for passing user requests, customer agent responses, and product catalogs to the model. Therefore, our prompt will always have the same structure, but the user query (user_prompt) and the model's response (customer_agent_response) are different:
<code>user_prompt = "你能帮我买一台三星电视吗?" customer_agent_response = gpt_call(user_prompt, customer_agent_history) print(customer_agent_response) # 输出:当然!我很乐意协助您购买三星电视。您能否提供您的一些具体要求或偏好?这样,我可以推荐最适合您需求的三星电视型号。</code>
Once the system message and QA prompt are defined, we can test the QA agent with the latest customer service response as follows:
<code>qa_sysmessage = f""" 您是一位质量助理,负责评估客户服务代理是否正确地回答了客户的问题。 您还必须验证客户服务代理是否仅提供我们商店产品目录中的信息,并温和地拒绝目录之外的任何其他产品。 客户消息、客户服务代理的回复和产品目录将用三个反引号分隔,即 ```。 请说明您的答案原因。 """</code>
To evaluate the response of the QA agent, let's break down the interaction it will analyze:
Quality agents can detect inappropriate responses from customer agents!
Now that we have let the two agents work independently, it is time to define the interaction between them.
We can use a simple chart to describe two agents and their requirements:
Homemade pictures. Three building block diagrams for each agent: system message (blue), model input (green), and model output (yellow).
What's next? Now, we need to implement the interaction between the two models!
The following is a suggestion for filtering inaccurate responses:
First, we let the customer agent generate a response based on user queries. Then, if the QA agent thinks the customer agent's response is good enough for the user query and product catalog, we just need to send the answer back to the user.
In contrast, if the QA agent determines that the answer does not meet the user's request or contains untrue information about the directory, we can ask the customer agent to improve the answer before sending it to the user.
In view of this idea, we can improve the last part of our original chart as follows:
Homemade pictures. Extended audit workflow diagram. We can use the QA proxy judgment to provide feedback to LLM-based applications.
To use the QA proxy as a filter, we need to make sure it outputs a consistent response in each iteration.
One way to achieve this is to change the QA proxy system message slightly and require it to output True only if the client proxy responds well enough, otherwise output False:
<code>import openai import os # 从环境中获取 OpenAI 密钥 openai_api_key = os.environ["OPENAI_API_KEY"] # 使用过去交互记忆的简单 OpenAI API 调用 def gpt_call(prompt, message_history, model="gpt-3.5-turbo"): message_history.append({'role': 'user', 'content': prompt}) response = openai.ChatCompletion.create( model=model, messages=message_history ) response_text = response.choices[0].message["content"] message_history.append({'role': 'assistant', 'content': response_text}) return response_text</code>
So when we evaluate the latest customer agent response again, we will only get boolean output:
<code># 定义我们的示例产品目录 product_information = """ { "name": "UltraView QLED TV", "category": "Televisions and Home Theater Systems", "brand": "UltraView", "model_number": "UV-QLED65", "warranty": "3 years", "rating": 4.9, "features": [ "65-inch QLED display", "8K resolution", "Quantum HDR", "Dolby Vision", "Smart TV" ], "description": "Experience lifelike colors and incredible clarity with this high-end QLED TV.", "price": 2499.99 } { "name": "ViewTech Android TV", "category": "Televisions and Home Theater Systems", "brand": "ViewTech", "model_number": "VT-ATV55", "warranty": "2 years", "rating": 4.7, "features": [ "55-inch 4K display", "Android TV OS", "Voice remote", "Chromecast built-in" ], "description": "Access your favorite apps and content on this smart Android TV.", "price": 799.99 } { "name": "SlimView OLED TV", "category": "Televisions and Home Theater Systems", "brand": "SlimView", "model_number": "SL-OLED75", "warranty": "2 years", "rating": 4.8, "features": [ "75-inch OLED display", "4K resolution", "HDR10+", "Dolby Atmos", "Smart TV" ], "description": "Immerse yourself in a theater-like experience with this ultra-thin OLED TV.", "price": 3499.99 } { "name": "TechGen X Pro", "category": "Smartphones and Accessories", "brand": "TechGen", "model_number": "TG-XP20", "warranty": "1 year", "rating": 4.5, "features": [ "6.4-inch AMOLED display", "128GB storage", "48MP triple camera", "5G", "Fast charging" ], "description": "A feature-packed smartphone designed for power users and mobile enthusiasts.", "price": 899.99 } { "name": "GigaPhone 12X", "category": "Smartphones and Accessories", "brand": "GigaPhone", "model_number": "GP-12X", "warranty": "2 years", "rating": 4.6, "features": [ "6.7-inch IPS display", "256GB storage", "108MP quad camera", "5G", "Wireless charging" ], "description": "Unleash the power of 5G and high-resolution photography with the GigaPhone 12X.", "price": 1199.99 } { "name": "Zephyr Z1", "category": "Smartphones and Accessories", "brand": "Zephyr", "model_number": "ZP-Z1", "warranty": "1 year", "rating": 4.4, "features": [ "6.2-inch LCD display", "64GB storage", "16MP dual camera", "4G LTE", "Long battery life" ], "description": "A budget-friendly smartphone with reliable performance for everyday use.", "price": 349.99 } { "name": "PixelMaster Pro DSLR", "category": "Cameras and Camcorders", "brand": "PixelMaster", "model_number": "PM-DSLR500", "warranty": "2 years", "rating": 4.8, "features": [ "30.4MP full-frame sensor", "4K video", "Dual Pixel AF", "3.2-inch touchscreen" ], "description": "Unleash your creativity with this professional-grade DSLR camera.", "price": 1999.99 } { "name": "ActionX Waterproof Camera", "category": "Cameras and Camcorders", "brand": "ActionX", "model_number": "AX-WPC100", "warranty": "1 year", "rating": 4.6, "features": [ "20MP sensor", "4K video", "Waterproof up to 50m", "Wi-Fi connectivity" ], "description": "Capture your adventures with this rugged and versatile action camera.", "price": 299.99 } { "name": "SonicBlast Wireless Headphones", "category": "Audio and Headphones", "brand": "SonicBlast", "model_number": "SB-WH200", "warranty": "1 year", "rating": 4.7, "features": [ "Active noise cancellation", "50mm drivers", "30-hour battery life", "Comfortable earpads" ], "description": "Immerse yourself in superior sound quality with these wireless headphones.", "price": 149.99 } """ # 为我们的用例定义一个合适的系统消息 customer_agent_sysmessage = f""" 您是一位客户服务代理,负责回答客户关于产品目录中产品的疑问。 产品目录将用三个反引号分隔,即 ```。 以友好和人性化的语气回复,并提供产品目录中可用的详细信息。 产品目录: ```{product_information}``` """ # 初始化模型的记忆 customer_agent_history = [{'role': 'system', 'content': customer_agent_sysmessage}]</code>
We can further use this boolean value to send the response to the user (if the QA proxy evaluates to True) or give the model a second chance to generate a new response (if the QA proxy evaluates to False).
Let's put everything together!
Because we have initialized two memory (with their system messages and additional information respectively), each client request can be processed as follows:
<code>user_prompt = "你们商店里最好的电视是什么?" customer_agent_response = gpt_call(user_prompt, customer_agent_history) print(customer_agent_response) # 输出:在我们商店里,UltraView QLED 电视被认为是最好的电视。它的评分为 4.9,并提供 65 英寸 QLED 显示屏和 8K 分辨率。它还具有 Quantum HDR、Dolby Vision 和智能电视功能。凭借其逼真的色彩和令人难以置信的清晰度,这款高端 QLED 电视可提供身临其境的观看体验。UltraView QLED 电视的价格为 2499.99 美元,并提供 3 年保修。</code>
As mentioned above, we have filtered the response based on its correctness. I leave you with the task of deciding how to handle inappropriate responses. We have come up with the idea of sending feedback to the client agent and asking them to try again, but how do we ask the QA agent to switch to a better response? There are many possibilities!
In this article, we explore the potential of using the GPT model as an auditor for other similar model instances. We have shown that the same powerful capabilities that lead us to use the LLM model in our applications can help our applications improve the accuracy and completeness of user interactions.
Contrary to the misunderstanding, implementing an audit level does not necessarily mean increasing the complexity of the application, and, as we have shown, sometimes it can be implemented with several lines of carefully designed code, significantly upgrading the functionality of the application.
Responsible LLM audits are imperative in today’s AI-driven world. This is not just an option, but a moral obligation. Through integrated AI audits, we ensure that our applications are not only powerful, but reliable and ethical. Let us advance development responsibly so that we can continue to benefit from AI while maintaining accuracy.
Thank you for reading! If you like the topic of AI Review, I encourage you to continue reading Promoting Responsible AI: Content Review in ChatGPT as a follow-up!
The above is the detailed content of A Comprehensive Guide to Moderating ChatGPT Responses with GPT Models. For more information, please follow other related articles on the PHP Chinese website!