In 1956, at a seminar held at Dartmouth College, the concept of artificial intelligence was formally proposed.
Intelligence itself has long challenged psychologists, philosophers and computer scientists because it is so difficult to define. In 1994, a group of 52 psychologists published a joint statement trying to capture its essence.
Over time, researchers shifted their attention to AI systems for specific domains; a well-known result of that line of work is AlphaGo's 2016 victory over the Korean Go champion. As early as the late 1990s and early 2000s, however, some researchers were no longer satisfied with specialized AI, and calls for more general artificial intelligence systems grew. The term artificial general intelligence (AGI) gained popularity in the early 2000s.
More recently, large language models (LLMs) have come into the spotlight. These neural networks are based on the Transformer architecture and trained on large text datasets. In particular, GPT-4, OpenAI's latest release, demonstrates the versatility of large language models and is proficient in mathematics, writing, law, medicine and other fields.
We can't help but ask: is GPT-4 an important step towards AGI?
Microsoft's answer is yes. In a recently released paper that provides a comprehensive evaluation of GPT-4, its researchers elaborate on this view: "Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system."
Microsoft also states that the main goal of the paper is to explore GPT-4's capabilities and limitations, and that it believes GPT-4's intelligence marks a true paradigm shift in computer science and beyond.
Paper address: https://arxiv.org/pdf/2303.12712.pdf
Interestingly, this widely discussed paper was found to have had a lot of material cut, and someone dug up the unabridged version.
From the uncut version, the blogger uncovered many hidden details. For example, GPT-4's internal name was DV-3, and DV-3 had actually been listed as the paper's hidden third author before being removed. The Microsoft researchers also did not seem to know much about GPT-4's technical details. In addition, the blogger revealed that a section on toxic content was deleted before publication (to avoid negative effects on OpenAI?).
We have pasted the blogger’s Twitter thread below for those who are interested to check it out.
Twitter thread: https://twitter.com/DV2559106965076/status/1638769434763608064
Back to the paper itself.
According to the paper, an AGI system is one that shows the ability to reason, plan, solve problems, think abstractly, understand complex ideas, learn quickly, and learn from experience. Starting from these capabilities, the paper carries out a series of interesting experiments and evaluations.
The paper is divided into 10 chapters: Chapter 1 is the introduction; Chapter 2 covers multimodal and interdisciplinary composition, mainly visual content generation; Chapter 3 covers coding, both generating code from instructions and understanding existing code; Chapter 4, mathematical ability; Chapter 5, interaction with the world; Chapter 6, interaction with humans; Chapter 7, discriminative capabilities; Chapter 8, GPT-4's limitations; Chapter 9, social impact; Chapter 10, future directions and conclusions.
Let’s use specific examples to see if GPT-4 has really entered the AGI era.
Multimodal and interdisciplinary composition
To test the model's ability to combine art with programming, the study asked GPT-4 to write JavaScript code that generates random images in the style of Kandinsky. The first picture below was painted by Wassily Kandinsky; the second and third were generated by GPT-4 and ChatGPT respectively:
The paper also shows the code GPT-4 wrote for this task.
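GPT-4's JavaScript is not reproduced in this article. Purely to illustrate the kind of program such a prompt asks for, here is a minimal Python sketch. It is not GPT-4's output, and the function name `kandinsky_svg`, the color palette and the choice of shapes are all assumptions made for this example. It scatters random circles, lines and rectangles on a canvas and returns the picture as an SVG string.

```python
import random


def kandinsky_svg(width=400, height=300, n_shapes=12, seed=0):
    """Return an SVG string of randomly placed circles, lines and rectangles.

    Illustrative only: the palette and shape choices are assumptions,
    not taken from the paper or from GPT-4's code.
    """
    rng = random.Random(seed)  # seeded so the same seed gives the same picture
    palette = ["#d62828", "#003049", "#f77f00", "#fcbf49", "#2a9d8f"]
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<rect width="{width}" height="{height}" fill="#f4efe1"/>',  # background
    ]
    for _ in range(n_shapes):
        color = rng.choice(palette)
        x, y = rng.randint(0, width), rng.randint(0, height)
        kind = rng.choice(["circle", "line", "rect"])
        if kind == "circle":
            r = rng.randint(10, 60)
            parts.append(
                f'<circle cx="{x}" cy="{y}" r="{r}" fill="{color}" fill-opacity="0.7"/>'
            )
        elif kind == "line":
            x2, y2 = rng.randint(0, width), rng.randint(0, height)
            parts.append(
                f'<line x1="{x}" y1="{y}" x2="{x2}" y2="{y2}" stroke="{color}" stroke-width="3"/>'
            )
        else:
            w, h = rng.randint(20, 80), rng.randint(20, 80)
            parts.append(
                f'<rect x="{x}" y="{y}" width="{w}" height="{h}" fill="{color}" fill-opacity="0.7"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the returned string to a `.svg` file and opening it in a browser displays the image; changing `seed` gives a different composition.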
Understanding visual concepts: in this drawing task, the prompt asks the model to draw a person by combining the shapes of the letters Y, O and H. GPT-4 was never directly taught what letters look like during training; it can only have learned vaguely, from related training data, that letters are associated with particular shapes. Even so, the results GPT-4 produced are not bad:
Sketch generation: GPT-4 can also be combined with Stable Diffusion. The picture below is a screenshot of a 3D city model. The input prompt describes a river flowing from left to right, a desert with pyramids beside the river, and four buttons at the bottom of the screen colored green, blue, brown and red. The following is the generated result:
You can also ask GPT-4 to generate and modify tunes using ABC notation:
Programming ability
GPT-4 has very strong programming ability, both in writing code from instructions and in understanding existing code. The study tested these capabilities specifically.
Code writing
Figure 3.1 below is an example of asking GPT-4 to write Python functions. The study used LeetCode's online judge to determine whether the code was correct.
The study then had GPT-4 visualize the LeetCode accuracy data from Table 2 as a chart; the results are shown in Figure 3.2 below.
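The article does not describe how LeetCode checks a submission. As a rough illustration only, an online judge can be thought of as running the submitted function on a set of test cases and comparing each result with the expected answer. The sketch below assumes a hypothetical problem (`two_sum`), hypothetical test cases and a hypothetical `judge` helper; none of these come from the paper or from LeetCode's actual implementation.

```python
def two_sum(nums, target):
    """Hypothetical candidate solution: indices of two numbers that sum to target."""
    seen = {}  # maps a value to the index where it was first seen
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return []


def judge(func, cases):
    """Return True only if func reproduces the expected output on every case."""
    return all(func(*args) == expected for args, expected in cases)


# Hypothetical test cases: (arguments, expected result)
cases = [
    (([2, 7, 11, 15], 9), [0, 1]),
    (([3, 2, 4], 6), [1, 2]),
    (([3, 3], 6), [0, 1]),
]

print(judge(two_sum, cases))  # True
```

A real judge would also enforce time and memory limits and keep its test cases hidden; this sketch checks functional correctness only.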
Front-end/Game Development
As shown in Figure 3.3 below, the study asked GPT-4 to write a 3D game in HTML with JavaScript. GPT-4 produced a game that met all the requirements in a zero-shot fashion.
Deep Learning Programming
Writing code for deep learning requires knowledge of mathematics and statistics, as well as familiarity with frameworks and libraries such as PyTorch, TensorFlow and Keras. As shown in Figure 3.4 below, the researchers asked GPT-4 and ChatGPT to write a custom optimizer module, a task that is challenging even for human deep learning experts. They gave both models a natural-language description that included a series of non-trivial operations, such as applying SVD.
The study also tested GPT-4's ability to convert code into LaTeX formulas; the results are shown in Figure 3.5 below.
To test code understanding, the study had GPT-4 and ChatGPT "understand" a C/C++ program and predict its output. The two performed as follows:
The study then asked GPT-4 to interpret a piece of Python code:
It was also asked to explain a piece of pseudocode:
For a long time, the mathematical ability of large language models has not been very good. So how does GPT-4 perform in this regard? After a series of evaluations, the paper concludes that GPT-4 has made a qualitative leap in mathematics compared with previous models, but it is still far from expert level and is not capable of doing mathematical research.
In a comparison with ChatGPT, GPT-4 produced a correct solution, while ChatGPT gave the wrong answer:
On an AP problem, GPT-4 used the correct approach but a computational error led to the wrong final answer, while ChatGPT produced an incoherent argument.
The paper also tests GPT-4's ability to use mathematical thinking and techniques to solve real-world problems. The figure below shows GPT-4 successfully constructing a reasonable mathematical model for a complex system that requires extensive interdisciplinary knowledge, while ChatGPT fails to make meaningful progress.
Since the paper is 154 pages long, this article shows only a portion of the evaluation results. For more information, readers can refer to the original paper.
Finally, here is the paper's table of contents: