April 4 news: OpenAI's latest language model, GPT-4, can not only generate human-like text but also design and execute tests to evaluate and improve its own performance. This "reflection" technique has produced significant gains on several difficult benchmarks, improving test performance by 30%.
GPT-4 is the most advanced system OpenAI has released, following GPT, GPT-2 and GPT-3, and is currently its largest multimodal model (it accepts image and text input and produces text output). It is built on deep learning, using artificial neural networks to imitate human writing.
Researchers Noah Shinn and Ashwin Gopinath write in the paper: "We have developed a novel technique that allows AI agents to simulate human self-reflection and evaluate their own performance. When completing various tests, GPT-4 adds extra steps that let it design its own tests, check its own answers, identify errors and deficiencies, and then revise its solution based on those findings."
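The paper's exact prompts and agent architecture are not reproduced in this article, but the loop the researchers describe (generate an answer, design a check for it, reflect on the result, and revise on failure) can be sketched roughly as follows. This is a minimal illustration only: the call_model placeholder, the prompt wording, and the MAX_ROUNDS limit are assumptions, not part of the published method.

```python
# Minimal sketch of a self-reflection loop as described above.
# `call_model` stands in for any chat-completion call (e.g. to GPT-4);
# the prompts and MAX_ROUNDS below are illustrative assumptions.

MAX_ROUNDS = 3

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    raise NotImplementedError

def solve_with_reflection(task: str) -> str:
    solution = call_model(f"Solve the following task:\n{task}")
    for _ in range(MAX_ROUNDS):
        # Ask the model to design its own check of the current answer.
        check = call_model(
            f"Task:\n{task}\n\nProposed solution:\n{solution}\n\n"
            "Write a check that would reveal errors or deficiencies in this solution."
        )
        # Ask the model to apply the check and reflect on what it finds.
        reflection = call_model(
            "Apply this check to the solution and report any errors or deficiencies.\n"
            f"Check:\n{check}\n\nSolution:\n{solution}"
        )
        if "no errors" in reflection.lower():
            break
        # Revise the solution based on the self-critique.
        solution = call_model(
            f"Task:\n{task}\n\nPrevious solution:\n{solution}\n\n"
            f"Reflection on its errors:\n{reflection}\n\n"
            "Produce an improved solution."
        )
    return solution
```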
In the HumanEval coding test, GPT-4 used a self-reflection loop and its accuracy rose from 67% to 88%.
GPT-4 can design and execute critiques of its own performance, and as the AlfWorld results show, this greatly improves its results.
The research team used this technique in several different performance tests of GPT-4. In the HumanEval test, GPT-4 had to solve 164 never-before-seen Python programming problems; its original accuracy was 67%, and with the reflection technique it rose to 88%. In the AlfWorld test, the AI must make decisions and solve multi-step tasks by performing a set of allowed operations in a variety of interactive environments; with reflection, GPT-4's accuracy rose from 73% to 97%, failing only 4 tasks. In the HotPotQA test, GPT-4 accessed Wikipedia to answer 100 questions that required parsing and reasoning over multiple supporting documents; its original accuracy was 34%, and with reflection it rose to 54%.
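For the HumanEval-style coding setting, one concrete way to ground the reflection step is to actually execute the model's self-generated tests against its candidate function and feed any failures back into the revision prompt. The sketch below assumes the model has already produced a candidate implementation and a list of assert statements; the candidate_code and self_generated_tests values are illustrative placeholders, not material from the paper.

```python
# Sketch: executing self-generated tests against a candidate solution,
# as one might do for HumanEval-style coding tasks. The candidate code
# and the test strings are illustrative placeholders.

candidate_code = """
def add(a, b):
    return a + b
"""

self_generated_tests = [
    "assert add(2, 3) == 5",
    "assert add(-1, 1) == 0",
]

def run_self_tests(code: str, tests: list[str]) -> list[str]:
    """Run the candidate code and its self-generated tests, returning
    the failures so they can be fed into the next reflection round."""
    namespace: dict = {}
    exec(code, namespace)          # define the candidate function
    failures = []
    for test in tests:
        try:
            exec(test, namespace)  # run one assert
        except Exception as exc:
            failures.append(f"{test} -> {type(exc).__name__}: {exc}")
    return failures

print(run_self_tests(candidate_code, self_generated_tests) or "all tests passed")
```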
This research shows that solutions to AI problems sometimes come from AI itself. IT House notes that this is somewhat like a generative adversarial network, a setup in which two AIs improve each other's skills: one tries to generate images that look real, while the other tries to tell the fakes from the genuine ones. Here, however, GPT-4 is both writer and editor, using self-reflection to improve the quality of its own output.