The OpenAI Superalignment Team's Final Work: Two Large Models Play a Game, and the Output Becomes Easier to Understand


王林

Jul 19, 2024 am 01:29 AM

openai engineering

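If you could not understand the answer an AI model gave you at all, would you dare to use it?

As machine learning systems are deployed in increasingly important domains, it becomes ever more important to justify why we can trust their outputs and to make clear when we should not.

One feasible way to gain trust in the output of a complex system is to require the system to produce an explanation of its output that is legible to humans or to another trusted system, that is, so fully understandable that any possible error can be spotted. For example, to build trust in the judicial system, we require courts to provide clear, readable written opinions that explain and support their decisions.

A similar approach can be applied to large language models.

When taking this approach, however, it is very important to ensure that the language model generates text that is easy to understand, especially for complex tasks such as mathematics and coding.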

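As shown in the figure below, you ask an AI to write a quicksort algorithm; it produces one quickly, and the answer is concise. But if you cannot write code yourself, how do you judge whether the AI got it right?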

OpenAI studied this question in a recent paper.

Paper title: Prover-Verifier Games Improve Legibility of LLM Outputs
Paper link: https://cdn.openai.com/prover-verifier-games-improve-legibility-of-llm-outputs/legibility.pdf

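They found that when large language models are asked to generate answers with "getting the correct answer" as the only goal, the answers they give can be hard to understand. Human evaluators are twice as likely to make mistakes when judging such answers.

Their goal is therefore to train advanced language models to generate text that simpler models can also verify easily. They found that such text is not only friendly to simpler models but also more legible to humans.

So how is this achieved? It goes back to a 2021 paper, "Learning to Give Checkable Answers with Prover-Verifier Games". In that paper, Cem Anil et al. proposed a game-theoretic framework called "Prover-Verifier Games". The framework consists of two competing learners: a trusted verifier network that tries to choose the correct answer, and a more powerful but untrusted prover network that tries to convince the verifier to accept a particular answer, regardless of whether it is correct. The goal of the framework is for a reliable proof protocol to emerge from the game.

In this new study, OpenAI borrows this framework and lets two large models (the stronger one as the prover, the weaker one as the verifier) play against each other until they reach an equilibrium. At that point, the answers given by the prover are reasonably legible while remaining reasonably correct.

OpenAI says that improving the legibility of model outputs can make AI applications more effective and trustworthy, which benefits every field that requires precise communication.

It is worth noting that this paper comes from OpenAI's Superalignment team. Several of its authors left OpenAI earlier this year, and the team has since fallen apart. One of the authors revealed that this is the team's last paper.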


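An intuitive example

To improve the legibility of model outputs, OpenAI optimized the chain of thought (CoT) on some grade-school math problems so that it could be understood by weaker models. For example:

Question: Shawna's father is five times as old as Shawna. Shawna is currently three times as old as Aliya. If Aliya is 3 years old, how old is Shawna's father?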

The aim is to check whether the output of a model trained this way is more direct and easier to understand.

With only some optimization on the training set, this method improves the model's performance by 50%. At the same time, the answers the model outputs are easier to understand, which helps human users who need to evaluate their correctness to judge more accurately. OpenAI believes that further narrowing the performance gap between more legible models and more capable models is a key goal for future alignment research.
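For reference, the arithmetic behind the example question can be written out step by step. The sketch below is only an illustration of a fully checkable derivation, not output from any of the models in the study; the variable names are made up for this example.

```python
# Ages from the example question; variable names are illustrative only.
aliya_age = 3
shawna_age = 3 * aliya_age    # Shawna is three times as old as Aliya
father_age = 5 * shawna_age   # the father is five times as old as Shawna

print(f"Aliya: {aliya_age}, Shawna: {shawna_age}, father: {father_age}")
```

Each line states one fact from the question and one operation, which is the kind of step a weaker checker (or a human) can verify in isolation.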

How do the "prover" and the "verifier" play against each other in training?

To motivate the strong model (the prover) to generate more legible text, the researchers adopted a training scheme that requires the answers generated by the strong model to be easily verifiable by a weaker model (the verifier). Specifically, they used large and small models from the GPT-4 family, which differ by about three orders of magnitude in the amount of pre-training data. Both received a small amount of supervised training on human-written mathematical derivations, but no extensive post-training.

First, the research team asked the prover model to try to solve a series of math problems. Because the prover model itself can be wrong, some of the answers it generates are incorrect. Next, they submitted these answers to the verifier model for checking, and alternated over multiple rounds between the following two phases:

"Verifier" training: The research team trains the weaker "verifier" model to check whether the answer given by the "prover" is correct.
"Prover" training: The research team instructs the more powerful "prover" model to play either a "helpful" or a "sneaky" role.
When the "prover" is set to "helpful", the research team rewards correct answers that receive a high score from the "verifier".
When the "prover" is set to "sneaky", the research team instead rewards incorrect answers that receive a high score from the "verifier".
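The reward rule for the two prover roles can be sketched as a small function. This is a deliberately simplified illustration, not the reward used in the paper: the function name `prover_reward`, the role strings `helpful` and `sneaky`, and the choice to pay out the raw verifier score are all assumptions made for this example.

```python
def prover_reward(role: str, is_correct: bool, verifier_score: float) -> float:
    """Toy reward: the prover earns the verifier's score only when the
    correctness of its answer matches what its assigned role is paid for."""
    if role not in ("helpful", "sneaky"):
        raise ValueError(f"unknown role: {role}")
    wants_correct = role == "helpful"  # sneaky provers are paid for wrong answers
    return verifier_score if is_correct == wants_correct else 0.0


# A helpful prover is rewarded for a correct, convincing answer...
print(prover_reward("helpful", is_correct=True, verifier_score=0.9))
# ...while a sneaky prover is rewarded for an incorrect but convincing one.
print(prover_reward("sneaky", is_correct=False, verifier_score=0.8))
```

In both roles the prover is pushed to convince the verifier; the role only changes whether a correct or an incorrect answer is the one that pays.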

The research team found that, initially, a sneaky "prover" could easily generate incorrect answers that convinced the "verifier". However, these wrong answers are fed into the next round of "verifier" training, so the same exploit is no longer convincing in the following round. The "prover" is therefore forced to explore a different strategy in each round.

Training ends when the sneaky "prover" can no longer generate a wrong solution that convinces the "verifier".
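The round structure and stopping condition described above can be mimicked with a toy loop. This is only a schematic sketch under strong assumptions: the "verifier" here is just a set of exploits it has already seen, the "sneaky prover" picks from a fixed, hypothetical list of strategies, and no model is actually trained.

```python
def run_checkability_rounds(sneaky_strategies, max_rounds=10):
    """Toy version of the alternating rounds. Returns the exploits that
    fooled the toy verifier, in the order the sneaky prover used them."""
    rejected = set()   # exploits the toy verifier has learned to reject
    fooled_by = []
    for _ in range(max_rounds):
        # Prover phase: look for a wrong-answer strategy that still convinces.
        convincing = [s for s in sneaky_strategies if s not in rejected]
        if not convincing:
            break      # stopping condition: nothing fools the verifier any more
        exploit = convincing[0]
        fooled_by.append(exploit)
        # Verifier phase: the convincing wrong answer becomes training data.
        rejected.add(exploit)
    return fooled_by


# Hypothetical strategy names, chosen purely for illustration.
strategies = ["skipped step", "wrong carry", "swapped names"]
print(run_checkability_rounds(strategies))
```

Because each exploit is rejected from the next round onward, the toy prover has to move to a new strategy every round, and the loop stops once it has none left.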

What did the researchers find?

Performance vs. legibility trade-off:

A model that only undergoes supervised fine-tuning on human-written derivations is highly legible but less accurate.
When the model is optimized only for correctness, that is, only for generating correct answers, it can reach high accuracy, but the solutions it generates are poorly legible.
The checkability game method proposed by OpenAI balances the two, retaining high legibility with moderate accuracy.

Three useful models:

Robust verifier: Effectively distinguishes correct from incorrect solutions, even when the solutions are designed to mislead.
Helpful prover: Generates solutions that humans can understand, thereby reducing errors by human evaluators.
Sneaky prover: Produces subtly wrong solutions that initially confuse human evaluators, thereby highlighting areas for further improvement.

The research team used the same test question to show what the sneaky prover generates over the course of training.

Question: Shawna's father is five times as old as Shawna. Shawna is currently three times as old as Aliya. If Aliya is 3 years old, how old is Shawna's father?


Conclusion

As large language models (LLMs) are increasingly used in critical domains and complex applications, it has become crucial to know whether the content they generate is trustworthy. Requiring models to give clear, verifiable justifications for their outputs can increase trust in what they produce.

A significant advantage of this approach is that it reduces reliance on human demonstrations or legibility judgments. This autonomy is particularly important for aligning future superintelligent AI systems, where the ultimate goal is to reliably align AI systems with human values and expectations without direct human oversight.

Although this work was carried out on only one dataset and still requires ground-truth labels, the research team expects methods of this kind to play a key role in developing correct, transparent and verifiable AI systems, and to enhance their trustworthiness and safety in real-world applications.

For more details, please refer to the original paper.

Reference link: https://openai.com/index/prover-verifier-games-improve-legibility/
