开始使用OpenAI结构化输出-人工智能-PHP中文网

Getting Started With OpenAI Structured Outputs

在2024年8月，Openai宣布了其API的强大新功能 - 结构化输出。顾名思义，使用此功能，您可以确保LLM仅以指定的格式生成响应。此功能将使需要精确数据格式的应用程序变得更加容易。

在本教程中，您将学习如何从OpenAI结构化输出开始，了解其新的语法并探索其关键应用程序。

在AI应用程序中结构化输出的重要性

确定性响应，换句话说，以一致格式的响应对于许多任务，例如数据输入，信息检索，问答，多步工作流等等至关重要。您可能已经体验了LLMS如何以截然不同的格式生成输出，即使提示是相同的。

例如，考虑由GPT-4O驱动的此简单的分类函数：

# List of hotel reviews
reviews = [
   "The room was clean and the staff was friendly.",
   "The location was terrible and the service was slow.",
   "The food was amazing but the room was too small.",
]
# Classify sentiment for each review and print the results
for review in reviews:
   sentiment = classify_sentiment(review)
   print(f"Review: {review}\nSentiment: {sentiment}\n")

登录后复制

>输出：

Review: The room was clean and the staff was friendly.
Sentiment: Positive
Review: The location was terrible and the service was slow.
Sentiment: Negative
Review: The food was amazing but the room was too small.
Sentiment: The sentiment of the review is neutral.

登录后复制

即使前两个响应是相同的单字格式，最后一个是整个句子。如果其他一些下游应用程序取决于上述代码的输出，则它将崩溃，因为它会期望单词响应。

>我们可以通过一些及时的工程来解决此问题，但这是一个耗时的迭代过程。即使有了完美的提示，我们也不能100％确定响应将在以后的请求中符合我们的格式。当然，除非我们使用结构化的输出：

>输出：

def classify_sentiment_with_structured_outputs(review):
   """Sentiment classifier with Structured Outputs"""
   ...
# Classify sentiment for each review with Structured Outputs
for review in reviews:
   sentiment = classify_sentiment_with_structured_outputs(review)
   print(f"Review: {review}\nSentiment: {sentiment}\n")

登录后复制

使用新函数，classify_sentiment_with_structured_outputs，响应都以相同的格式。

Review: The room was clean and the staff was friendly.
Sentiment: {"sentiment":"positive"}
Review: The location was terrible and the service was slow.
Sentiment: {"sentiment":"negative"}
Review: The food was amazing but the room was too small.
Sentiment: {"sentiment":"neutral"}

登录后复制

>以刚性格式强迫语言模型的能力非常重要，可以为您节省无数小时的及时工程或依赖其他开源工具。>

>从OpenAI结构化输出开始

在本节中，我们将使用情感分析仪函数的示例分解结构化输出。

设置您的环境

>先决条件

在开始之前，请确保您有以下内容：>

python 3.7或以后安装在您的系统上。

>

> OpenAI API键。您可以通过在OpenAI网站上注册来获得此功能。

2。设置API密钥：您可以将API密钥设置为环境变量或直接在代码中。要将其设置为环境变量，请运行：

>

3。验证安装：创建一个简单的python脚本以验证安装：>

# List of hotel reviews
reviews = [
   "The room was clean and the staff was friendly.",
   "The location was terrible and the service was slow.",
   "The food was amazing but the room was too small.",
]
# Classify sentiment for each review and print the results
for review in reviews:
   sentiment = classify_sentiment(review)
   print(f"Review: {review}\nSentiment: {sentiment}\n")

登录后复制

>运行脚本以确保正确设置所有内容。您应该在终端中看到模型的响应。

> 除了OpenAi软件包外，您还需要Pydantic库来定义和验证JSON模式的结构化输出。使用PIP安装它：

Review: The room was clean and the staff was friendly.
Sentiment: Positive
Review: The location was terrible and the service was slow.
Sentiment: Negative
Review: The food was amazing but the room was too small.
Sentiment: The sentiment of the review is neutral.

登录后复制

>通过这些步骤，您的环境现在可以使用OpenAI的结构化输出功能。>

使用pydantic

定义输出模式

>要使用结构化输出，您需要使用Pydantic模型来定义预期的输出结构。 Pydantic是Python的数据验证和设置管理库，它允许您使用Python型注释来定义数据模型。然后可以使用这些模型来强制执行OpenAI模型生成的输出的结构。

这是一个示例pydantic模型，用于指定我们的评论情感分类器的格式：

在此示例中

>：

def classify_sentiment_with_structured_outputs(review):
   """Sentiment classifier with Structured Outputs"""
   ...
# Classify sentiment for each review with Structured Outputs
for review in reviews:
   sentiment = classify_sentiment_with_structured_outputs(review)
   print(f"Review: {review}\nSentiment: {sentiment}\n")

登录后复制

sentimentResponse是一个pydantic模型，它定义了输出的预期结构。

模型具有单一字段情绪，只能采用三个字面价值之一：“正面”，“负”或“中性”。

当我们作为OpenAI API请求的一部分传递此模型时，输出将只是我们提供的单词之一。

在OpenAI请求中强制执行我们的Pydantic模式，我们要做的就是将其传递给聊天完成API的响应_format参数。粗略地，这是它的样子：

如果您注意到，而不是使用client.chat.completions.create，我们使用的是client.beta.chat.completions.parse方法。 .parse（）是专门为结构化输出编写的聊天完成API中的一种新方法。

现在，让我们通过重写带有结构化输出的评论情感分类器来将所有内容整合在一起。首先，我们进行必要的导入，定义pydantic模型，系统提示和提示模板：

>

然后，我们编写了一个使用.parse（）助手方法的新功能：>

函数中的重要行是response_format = entimentResponse，这实际上是启用结构化输出的方法。

Review: The room was clean and the staff was friendly.
Sentiment: {"sentiment":"positive"}
Review: The location was terrible and the service was slow.
Sentiment: {"sentiment":"negative"}
Review: The food was amazing but the room was too small.
Sentiment: {"sentiment":"neutral"}

登录后复制

让我们对其中一项评论进行测试：

在这里，结果是一个消息对象：>

除了检索响应的.content属性外，它具有.parsed属性，该属性将解析的信息返回为类：

$ pip install -U openai

登录后复制

如您所见，我们有一个sentermentresponse类的实例。这意味着我们可以使用.sentiment属性以字符串而不是字典访问情感：>

# List of hotel reviews
reviews = [
   "The room was clean and the staff was friendly.",
   "The location was terrible and the service was slow.",
   "The food was amazing but the room was too small.",
]
# Classify sentiment for each review and print the results
for review in reviews:
   sentiment = classify_sentiment(review)
   print(f"Review: {review}\nSentiment: {sentiment}\n")

登录后复制

嵌套pydantic模型，用于定义复杂模式

在某些情况下，您可能需要定义涉及嵌套数据的更复杂的输出结构。 Pydantic允许您相互嵌套模型，使您能够创建可以处理各种用例的复杂模式。在处理层次数据时，或者需要为复杂输出执行特定结构时，这特别有用。

>让我们考虑一个示例，我们需要在其中提取详细的用户信息，包括其姓名，联系方式和地址列表。每个地址都应包括街道，城市，州和邮政编码的字段。这需要一个以上的pydantic模型来构建正确的模式。

>步骤1：定义Pydantic模型

首先，我们为地址和用户信息定义了pydantic模型：>

在此示例中

>：

Review: The room was clean and the staff was friendly.
Sentiment: Positive
Review: The location was terrible and the service was slow.
Sentiment: Negative
Review: The food was amazing but the room was too small.
Sentiment: The sentiment of the review is neutral.

登录后复制

地址是一个定义地址结构的pydantic模型。>

>步骤2：在API调用中使用嵌套的pydantic模型

接下来，我们使用这些嵌套的pydantic模型来在OpenAI API调用中强制执行输出结构：

示例文本完全不可读，并且在关键信息之间缺少空间。让我们看看该模型是否成功。我们将使用JSON库来使响应很好：

def classify_sentiment_with_structured_outputs(review):
   """Sentiment classifier with Structured Outputs"""
   ...
# Classify sentiment for each review with Structured Outputs
for review in reviews:
   sentiment = classify_sentiment_with_structured_outputs(review)
   print(f"Review: {review}\nSentiment: {sentiment}\n")

登录后复制

如您所见，该模型根据我们提供的架构正确捕获了单个用户的信息以及他们的两个单独的地址。

简而言之，通过嵌套pydantic模型，您可以定义处理层次数据并为复杂输出执行特定结构的复杂模式。

函数用结构化输出调用

Review: The room was clean and the staff was friendly.
Sentiment: {"sentiment":"positive"}
Review: The location was terrible and the service was slow.
Sentiment: {"sentiment":"negative"}
Review: The food was amazing but the room was too small.
Sentiment: {"sentiment":"neutral"}

登录后复制

>新语言模型的广泛特征之一是函数调用（也称为工具调用）。此功能使您可以将语言模型连接到用户定义的功能，从而有效地（模型）访问外部世界。

一些常见的示例是：

>检索实时数据（例如，天气，股价，运动得分）

执行计算或数据分析

>查询数据库或API

生成图像或其他媒体

在语言之间翻译文本
控制智能家居设备或物联网系统
>执行自定义业务逻辑或工作流程

>重要的是，使用结构化输出，使用OpenAI模型使用函数调用变得更加容易。过去，您将传递给OpenAI模型的功能将需要编写复杂的JSON模式，并用类型提示概述每个功能参数。这是一个示例：

# List of hotel reviews
reviews = [
   "The room was clean and the staff was friendly.",
   "The location was terrible and the service was slow.",
   "The food was amazing but the room was too small.",
]
# Classify sentiment for each review and print the results
for review in reviews:
   sentiment = classify_sentiment(review)
   print(f"Review: {review}\nSentiment: {sentiment}\n")

登录后复制

>即使get_current_weather函数具有两个参数，其JSON模式也变得巨大且容易出错。

>通过再次使用Pydantic模型在结构化输出中解决：>

首先，您编写功能本身及其逻辑。然后，您可以使用指定预期输入参数的pydantic模型再次定义它。

Review: The room was clean and the staff was friendly.
Sentiment: Positive
Review: The location was terrible and the service was slow.
Sentiment: Negative
Review: The food was amazing but the room was too small.
Sentiment: The sentiment of the review is neutral.

登录后复制

然后，要将pydantic模型转换为兼容的JSON模式，您可以致电Pydantic_function_tool：

这是如何将此工具作为请求的一部分使用：的一部分

def classify_sentiment_with_structured_outputs(review):
   """Sentiment classifier with Structured Outputs"""
   ...
# Classify sentiment for each review with Structured Outputs
for review in reviews:
   sentiment = classify_sentiment_with_structured_outputs(review)
   print(f"Review: {review}\nSentiment: {sentiment}\n")

登录后复制

>我们以兼容的JSON格式将Pydantic模型传递给聊天完成API的工具参数。然后，根据我们的查询，该模型决定是否调用该工具。

>由于上面的查询是“东京的天气是什么？”，我们在返回消息对象的tool_calls中看到了一个电话。

Review: The room was clean and the staff was friendly.
Sentiment: {"sentiment":"positive"}
Review: The location was terrible and the service was slow.
Sentiment: {"sentiment":"negative"}
Review: The food was amazing but the room was too small.
Sentiment: {"sentiment":"neutral"}

登录后复制

记住，该模型未调用get_weather函数，而是根据我们提供的Pydantic模式生成参数：

>由我们通过提供的参数调用该函数：>

如果您希望该模型生成该功能的参数并同时调用它，则您正在寻找AI代理。