


After a full evaluation of GPT-4, Microsoft's widely discussed paper says the first version of AGI is coming soon
In 1956, at a seminar held at Dartmouth College, the concept of artificial intelligence was formally proposed.
Intelligence itself has long challenged psychologists, philosophers and computer scientists because it is so difficult to define. In 1994, 52 psychologists signed a joint statement that tried to capture its essence.
Over time, researchers shifted their attention to AI systems for specific domains; the best-known example is AlphaGo's decisive 2016 victory over the Korean Go champion Lee Sedol. Meanwhile, from the late 1990s into the 2000s, some researchers were no longer satisfied with specialized AI, and calls for more general artificial intelligence systems grew. Along with this, the term Artificial General Intelligence (AGI) began to gain popularity in the early 2000s.
Recently, large language models (LLMs) have come into the spotlight. These neural networks are based on the Transformer architecture and trained on large text datasets. In particular, GPT-4, OpenAI's latest release, demonstrates the versatility of large language models, showing proficiency in mathematics, writing, law, medicine and other fields.
We can’t help but ask, is GPT-4 an important step towards AGI?
The answer given by Microsoft is yes. In a recently released paper, its researchers elaborate on this view. The paper provides a comprehensive evaluation of GPT-4 and states: "Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system."
The authors also state that the paper's main goal is to explore the capabilities and limitations of GPT-4, and that they believe GPT-4's intelligence marks a true paradigm shift in computer science and beyond.
Paper address: https://arxiv.org/pdf/2303.12712.pdf
Interestingly, this popular paper was also found to contain many deletions, and a blogger tracked down the unabridged version.
Working from the uncut version, the blogger revealed a number of hidden details. For example, the internal name of GPT-4 was DV-3, which was in fact listed as a hidden third author of the paper and later removed. The Microsoft researchers also did not seem to know much about the technical details of GPT-4. In addition, the blogger pointed out that a section on toxic content was deleted before the paper was published (to avoid negative effects on OpenAI?).
We have pasted the blogger’s Twitter thread below for those who are interested to check it out.
Twitter thread: https://twitter.com/DV2559106965076/status/1638769434763608064
Back to the article itself.
According to the paper, AGI is characterized by the ability to reason, plan, solve problems, think abstractly, understand complex ideas, learn quickly, and learn from experience. Starting from these capabilities, the paper conducts a series of interesting experiments and evaluations.
The paper is divided into 10 chapters: Chapter 1 is the overview; Chapter 2 covers multimodality, mainly visual content generation; Chapter 3 covers coding, including generating code from instructions and understanding existing code; Chapter 4, mathematical ability; Chapter 5, interaction with the world; Chapter 6, interaction with humans; Chapter 7, discriminative capabilities; Chapter 8, GPT-4's limitations; Chapter 9, societal impact; Chapter 10, future directions and conclusions.
Let’s use specific examples to see if GPT-4 has really entered the AGI era.
Multimodal and interdisciplinary composition
To test the model's ability to combine art with programming, the study asked GPT-4 to write JavaScript code that generates random images in the style of Kandinsky. The first picture below is a painting by Wassily Kandinsky; the second and third were generated by GPT-4 and ChatGPT, respectively:
The figure below shows GPT-4's code implementation:
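For readers who want a feel for the task, a program of this kind can be sketched in a few lines. The following is a minimal illustration and not GPT-4's output: it is written in Python rather than JavaScript, it emits an SVG file instead of drawing in the browser, and the function name `kandinsky_svg`, the color palette and the shape sizes are all assumptions chosen for the example.

```python
import random


def kandinsky_svg(width=400, height=300, n_shapes=12, seed=None):
    """Return an SVG document with randomly placed circles, rectangles and lines."""
    rng = random.Random(seed)  # a local generator keeps results reproducible per seed
    palette = ["#d62828", "#f77f00", "#fcbf49", "#003049", "#2a9d8f", "#000000"]
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<rect width="{width}" height="{height}" fill="#f4f1de"/>',  # background
    ]
    for _ in range(n_shapes):
        color = rng.choice(palette)
        kind = rng.choice(["circle", "rect", "line"])
        x, y = rng.randint(0, width), rng.randint(0, height)
        if kind == "circle":
            r = rng.randint(5, 60)
            parts.append(
                f'<circle cx="{x}" cy="{y}" r="{r}" fill="{color}" fill-opacity="0.7"/>'
            )
        elif kind == "rect":
            w, h = rng.randint(10, 80), rng.randint(10, 80)
            angle = rng.randint(0, 90)  # rotate around the rectangle's top-left corner
            parts.append(
                f'<rect x="{x}" y="{y}" width="{w}" height="{h}" fill="{color}" '
                f'fill-opacity="0.7" transform="rotate({angle} {x} {y})"/>'
            )
        else:
            x2, y2 = rng.randint(0, width), rng.randint(0, height)
            stroke = rng.randint(1, 5)
            parts.append(
                f'<line x1="{x}" y1="{y}" x2="{x2}" y2="{y2}" '
                f'stroke="{color}" stroke-width="{stroke}"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the result of `kandinsky_svg(seed=42)` to a file with an `.svg` extension and opening it in a browser shows the composition; changing the seed gives a different image.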
Visual concept understanding: in this drawing task, the prompt asks the model to draw a person by combining the shapes of the letters Y, O, and H. GPT-4's training included no direct knowledge of what letters look like; it could only infer vaguely from related training data that letters are associated with certain shapes. Even so, the output generated by GPT-4 is not bad:
Sketch generation: GPT-4 can also be combined with Stable Diffusion. The picture below is a screenshot of a 3D city-building scene. The prompt describes a river flowing from left to right, a desert with pyramids next to the river, and 4 buttons at the bottom of the screen colored green, blue, brown and red. The following is the generated result:
You can also ask GPT-4 to generate and modify tunes using ABC notation:
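ABC notation is a plain-text format for music, which is why a text-only model can read and edit it. As a rough illustration of what generating and modifying a tune involves, the sketch below builds a tiny tune and raises it by one octave. The helper names `abc_tune` and `octave_up` and the example melody are assumptions made for this example, not taken from the paper, and the transposition handles only bare note letters with octave marks, not chord symbols or decorations.

```python
import re

# A note letter (A-G or a-g) followed by optional octave marks (, lowers; ' raises).
NOTE = re.compile(r"([A-Ga-g])([,']*)")


def abc_tune(title, body, key="C", meter="4/4", unit="1/8"):
    """Assemble a minimal ABC tune: header fields first, K: last, then the notes."""
    header = [f"X:1", f"T:{title}", f"M:{meter}", f"L:{unit}", f"K:{key}"]
    return "\n".join(header + [body])


def octave_up(body):
    """Raise every note in an ABC music line by one octave."""

    def shift(match):
        letter, marks = match.group(1), match.group(2)
        if marks.startswith(","):
            return letter + marks[1:]      # C, -> C : drop one comma
        if letter.isupper():
            return letter.lower() + marks  # C -> c : lowercase is an octave higher
        return letter + "'" + marks        # c -> c' : add an apostrophe

    return NOTE.sub(shift, body)


melody = "C2 E2 G2 c2 | B,2 D2 G2 B2 |]"
print(abc_tune("Sketch", melody))
print(abc_tune("Sketch, one octave up", octave_up(melody)))
```

Only the music line is passed to `octave_up`; header fields such as `K:C` contain letters that must not be transposed.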
GPT-4 has very strong programming ability, including writing code according to instructions and understanding existing code. The study specifically tested GPT-4's programming capabilities.
Code writing
Figure 3.1 below is an example of asking GPT-4 to write Python functions. The study used LeetCode's online judge to determine whether the code was correct.
The study then asked GPT-4 to visualize the LeetCode accuracy data from Table 2 as a chart; the results are shown in Figure 3.2 below.
Front-end/Game Development
As shown in Figure 3.3 below, the study asked GPT-4 to write a 3D game in HTML with JavaScript. GPT-4 produced a game that met all the requirements in a zero-shot fashion.
Writing code for deep learning requires knowledge of mathematics and statistics, as well as familiarity with frameworks and libraries such as PyTorch, TensorFlow, and Keras. As shown in Figure 3.4 below, the researchers asked GPT-4 and ChatGPT to write a custom optimizer module, a task that is challenging even for human deep learning experts. They gave both models a natural-language description that included a series of non-trivial operations, such as applying SVD.
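To make the SVD step concrete, here is a minimal sketch of one way an optimizer could use it: replace the gradient with its best rank-k approximation before a plain gradient-descent update. This is an assumption for illustration, written with NumPy rather than a deep learning framework; it is not the optimizer specified in the paper, and the names `truncate_rank` and `sgd_step_low_rank` are hypothetical.

```python
import numpy as np


def truncate_rank(matrix, k):
    """Return the best rank-k approximation of a 2-D array, computed via SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def sgd_step_low_rank(weight, grad, lr=0.1, k=1):
    """One gradient-descent update that uses a rank-k approximation of the gradient."""
    return weight - lr * truncate_rank(grad, k)


rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 3))
grad = rng.normal(size=(4, 3))
updated = sgd_step_low_rank(weight, grad, lr=0.1, k=2)
print(updated.shape)
```

A real optimizer would add the other pieces such a description might call for, such as momentum and weight decay, and would have to decide how to reshape parameters with more than two dimensions before applying SVD.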
In addition, the study tested GPT-4's ability to convert code into LaTeX formulas; the results are shown in Figure 3.5 below.
In terms of understanding code, the study asked GPT-4 and ChatGPT to "understand" a C/C++ program and predict its output. The two performed as follows:
The study then asked GPT-4 to interpret a piece of Python code:
It was also asked to explain a piece of pseudocode:
Mathematical ability
For a long time, the mathematical ability of large language models has not been very good. So how does GPT-4 perform in this regard? After a series of evaluations, the paper concludes that GPT-4 has made a qualitative leap in mathematics compared with previous models, but it is still far from expert level and is not capable of doing mathematical research.
In comparison with ChatGPT, GPT-4 successfully generated the solution, while ChatGPT generated the wrong answer:
GPT-4 and ChatGPT were also compared on an AP problem. GPT-4 used the correct approach, but a computational error led to the wrong final answer, while ChatGPT produced an incoherent argument.
The paper also tests GPT-4's ability to use mathematical thinking and techniques to solve real-world problems. The figure below shows GPT-4 successfully constructing a reasonable mathematical model for a complex system that requires extensive interdisciplinary knowledge, while ChatGPT fails to make meaningful progress.
Since the paper is 154 pages long, this article presents only a selection of its evaluation results. For more information, readers can refer to the original paper.
Finally, here is the paper's table of contents: