
After completing the complete evaluation of GPT-4, Microsoft's hot paper said that the first version of AGI is coming soon

May 01, 2023 am 09:25 AM
gpt-4 Evaluation

In 1956, at a seminar held at Dartmouth College, the concept of artificial intelligence was formally proposed.

The term has since challenged psychologists, philosophers and computer scientists because it is so difficult to define. In 1994, 52 psychologists published a joint paper trying to capture its essence.

As time went by, researchers shifted their attention to AI systems for specific fields; a later milestone of this approach was AlphaGo's decisive 2016 victory over the Korean Go champion. By the late 1990s and early 2000s, however, some researchers were no longer satisfied with specialized AI, and calls for more general artificial intelligence systems grew. Along with this, the term Artificial General Intelligence (AGI) began to gain popularity in the early 2000s.

More recently, large language models (LLMs) have come into the spotlight. These neural networks are based on the Transformer architecture and trained on large text datasets. In particular, OpenAI's latest release, GPT-4, demonstrates the versatility of large language models: it is proficient in mathematics, writing, law, medicine and other fields.

We can’t help but ask, is GPT-4 an important step towards AGI?

Microsoft's answer is yes. A recently released paper lays out this view and provides a comprehensive evaluation of GPT-4. The authors write: "Given the breadth and depth of GPT-4's capabilities, we believe it should reasonably be considered an early (but still incomplete) version of an artificial general intelligence (AGI) system."

Microsoft also stated, "The main goal of this article is to explore the capabilities and limitations of GPT-4. We believe that the intelligence of GPT-4 marks a true paradigm shift in computer science and other fields."


Paper address: https://arxiv.org/pdf/2303.12712.pdf

Interestingly, this popular paper turned out to have had a lot of material cut before publication, and someone tracked down the unabridged version.

Working from the uncut version, the blogger revealed a number of hidden details. For example, the internal name of GPT-4 is DV-3, and DV-3 was actually listed as the paper's hidden third author before being deleted. The Microsoft researchers also did not seem to know much about the technical details of GPT-4. In addition, the blogger pointed out that the section on toxic content was removed when the paper was published (to prevent negative effects on OpenAI?).

We have pasted the blogger’s Twitter thread below for those who are interested to check it out.


Twitter thread: https://twitter.com/DV2559106965076/status/1638769434763608064

Back to the article itself.

The paper characterizes AGI by a set of abilities: to reason, plan, solve problems, think abstractly, understand complex ideas, learn quickly, and learn from experience. Starting from these capabilities, the paper conducts a series of interesting experiments and evaluations.

The paper is divided into 10 chapters: Chapter 1 is the overview; Chapter 2 covers multimodal abilities, mainly visual content generation; Chapter 3 covers code, both generating code from instructions and understanding existing code; Chapter 4, mathematical ability; Chapter 5, interaction with the world; Chapter 6, interaction with humans; Chapter 7, discriminative capabilities; Chapter 8, GPT-4's limitations; Chapter 9, social impact; Chapter 10, future directions and conclusions.

Let’s use specific examples to see if GPT-4 has really entered the AGI era.

Multimodal and interdisciplinary composition

To test the model's ability to combine art with programming, the study asked GPT-4 to write JavaScript code that generates random images in the style of Kandinsky. The first picture below was created by Wassily Kandinsky, and the second and third pictures were generated by GPT-4 and ChatGPT respectively:


The following is the GPT-4 code implementation process:

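The paper's figure is not reproduced here. To give a rough sense of what such a task involves, the following is a minimal sketch of a random abstract-composition generator. It is not GPT-4's output, and it is written in Python rather than the JavaScript used in the paper; the function name `random_composition`, the `PALETTE` colors and the choice of SVG as output format are all assumptions made for illustration.

```python
import random

# Hypothetical palette of bold colors; not taken from the paper.
PALETTE = ["#d62828", "#003049", "#f77f00", "#fcbf49", "#2a9d8f", "#000000"]


def random_composition(width=400, height=300, n_shapes=12, seed=0):
    """Return an SVG document (as a string) made of random circles, lines and rectangles."""
    rng = random.Random(seed)  # seeded so the same seed reproduces the same image
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<rect width="{width}" height="{height}" fill="#f4efe1"/>',  # background
    ]
    for _ in range(n_shapes):
        color = rng.choice(PALETTE)
        kind = rng.choice(["circle", "line", "rect"])
        x, y = rng.randint(0, width), rng.randint(0, height)
        if kind == "circle":
            r = rng.randint(10, 60)
            parts.append(
                f'<circle cx="{x}" cy="{y}" r="{r}" fill="{color}" fill-opacity="0.8"/>'
            )
        elif kind == "line":
            x2, y2 = rng.randint(0, width), rng.randint(0, height)
            stroke = rng.randint(1, 6)
            parts.append(
                f'<line x1="{x}" y1="{y}" x2="{x2}" y2="{y2}" '
                f'stroke="{color}" stroke-width="{stroke}"/>'
            )
        else:
            w, h = rng.randint(20, 80), rng.randint(20, 80)
            angle = rng.randint(0, 90)
            parts.append(
                f'<rect x="{x}" y="{y}" width="{w}" height="{h}" fill="{color}" '
                f'transform="rotate({angle} {x} {y})"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Write one composition to disk; open the file in a browser to view it.
    with open("composition.svg", "w", encoding="utf-8") as fh:
        fh.write(random_composition(seed=42))
```

Changing `seed` yields a different composition, which is the "random image" aspect of the task; capturing an actual painter's style would take considerably more than this.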

Visual concept understanding: in this drawing task, the prompt asks the model to draw a person by combining the shapes of the letters Y, O, and H. GPT-4 was given no direct knowledge of letter shapes during training; it can only have learned vaguely, from related training data, that letters are associated with certain shapes. Even so, the results GPT-4 produced are not bad:


Sketch generation: GPT-4 can also be combined with Stable Diffusion. The target image is a screenshot of 3D city modeling. The prompt describes a river flowing from left to right, a desert with pyramids built next to the river, and four buttons at the bottom of the screen colored green, blue, brown and red. The following is the generated result:


You can also ask GPT-4 to generate and modify tunes using ABC notation:

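The paper's example is not reproduced here. For readers unfamiliar with ABC notation, it is a plain-text music format: header lines such as `K:C` set the key, uppercase letters `C` to `B` are the notes of one octave, and lowercase letters are the octave above. The sketch below shows a tiny tune and one simple "modification", raising it by an octave. The tune and the helper names `raise_octave` and `raise_octave_tune` are made up for illustration, and the code assumes the tune body contains only plain note letters, spaces and bar lines (no chords, accidentals or existing octave marks).

```python
# A made-up four-beat-per-bar tune in ABC notation (not from the paper).
TUNE = """X:1
T:Sketch tune
M:4/4
L:1/4
K:C
C D E F | G A B c |"""


def raise_octave(body: str) -> str:
    """Raise every note in a line of ABC music by one octave.

    Uppercase notes become lowercase (one octave up); lowercase notes
    get a trailing apostrophe, which marks a further octave up.
    """
    out = []
    for ch in body:
        if ch in "ABCDEFG":
            out.append(ch.lower())
        elif ch in "abcdefg":
            out.append(ch + "'")
        else:
            out.append(ch)  # spaces, bar lines, durations stay unchanged
    return "".join(out)


def raise_octave_tune(tune: str) -> str:
    """Apply raise_octave to the music lines only, leaving header fields untouched."""
    lines = []
    for line in tune.split("\n"):
        is_header = len(line) > 1 and line[0].isalpha() and line[1] == ":"
        lines.append(line if is_header else raise_octave(line))
    return "\n".join(lines)


if __name__ == "__main__":
    print(raise_octave_tune(TUNE))
```

Because the format is plain text, edits like this are simple string operations, which is part of why it suits a text-only model.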

Programming ability

GPT-4 has very strong programming ability, including writing code according to instructions and understanding existing code. The study specifically tested GPT-4's programming capabilities.

Code writing

Figure 3.1 below is an example of asking GPT-4 to write Python functions. The study used LeetCode's online judge to check whether the code is correct.


The study then asked GPT-4 to visualize the LeetCode accuracy data from Table 2 above as a chart. The results are shown in Figure 3.2 below.


Front-end/Game Development

As shown in Figure 3.3 below, the study asked GPT-4 to write a 3D game in HTML using JavaScript. GPT-4 produced a game that met all the requirements in a zero-shot setting, that is, without being given any examples.


Deep Learning Programming

Writing code for deep learning requires knowledge of mathematics and statistics, as well as familiarity with frameworks and libraries such as PyTorch, TensorFlow and Keras. As shown in Figure 3.4 below, the researchers asked GPT-4 and ChatGPT to write a custom optimizer module, a task that is challenging even for human deep learning experts. They gave both models a natural language description that includes a series of non-trivial operations, such as applying SVD.

In addition, the study also tested GPT-4's ability to convert code into LaTeX formulas, and the results are shown in Figure 3.5 below.
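To make the code-to-LaTeX task concrete, here is a minimal rule-based sketch that turns a Python arithmetic expression into a LaTeX formula using the standard `ast` module. It says nothing about how GPT-4 performs the conversion and is not taken from the paper; the function name `to_latex` is hypothetical, and only names, numbers, `+`, `-`, `*`, `/`, `**` and a single-argument `sqrt(...)` call are supported.

```python
import ast


def to_latex(source: str) -> str:
    """Convert a simple Python arithmetic expression into a LaTeX string."""
    return _emit(ast.parse(source, mode="eval").body)


def _paren(text: str) -> str:
    return rf"\left({text}\right)"


def _is_sum(node: ast.AST) -> bool:
    """True for a + b or a - b, which need parentheses under tighter operators."""
    return isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub))


def _emit(node: ast.AST) -> str:
    if isinstance(node, ast.Constant):
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "sqrt"
        and len(node.args) == 1
    ):
        return rf"\sqrt{{{_emit(node.args[0])}}}"
    if isinstance(node, ast.BinOp):
        left, right = _emit(node.left), _emit(node.right)
        if isinstance(node.op, ast.Add):
            return f"{left} + {right}"
        if isinstance(node.op, ast.Sub):
            # a - (b + c) must keep its parentheses
            return f"{left} - {_paren(right) if _is_sum(node.right) else right}"
        if isinstance(node.op, ast.Mult):
            left = _paren(left) if _is_sum(node.left) else left
            right = _paren(right) if _is_sum(node.right) else right
            return rf"{left} \cdot {right}"
        if isinstance(node.op, ast.Div):
            # \frac groups numerator and denominator, so no parentheses needed
            return rf"\frac{{{left}}}{{{right}}}"
        if isinstance(node.op, ast.Pow):
            base = _paren(left) if isinstance(node.left, ast.BinOp) else left
            return f"{base}^{{{right}}}"
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


if __name__ == "__main__":
    print(to_latex("(a + b) / (2 * c)"))
    print(to_latex("sqrt(x**2 + y**2)"))
```

A hand-written converter like this handles only the constructs it was built for, whereas the task in the paper is open-ended: the model is given arbitrary code and has to produce the corresponding formula.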

In terms of understanding code, the study had GPT-4 and ChatGPT "understand" a C/C++ program and predict its output. The performance of the two is as follows:


The study then asked GPT-4 to interpret a piece of Python code:


It was also asked to explain a piece of pseudocode:


Mathematical ability

For a long time, the mathematical ability of large language models has seemed weak. So how does GPT-4 perform in this regard? After a series of evaluations, the paper finds that GPT-4 has made a qualitative leap in mathematics compared to previous models, but it is still far from expert level and is not capable of doing mathematical research.

In comparison with ChatGPT, GPT-4 successfully generated the solution, while ChatGPT generated the wrong answer:


On an AP problem, GPT-4 and ChatGPT compare as follows: GPT-4 used the correct approach, but a computational error led to the wrong final answer, while ChatGPT produced an incoherent argument.


In addition, the paper also tests GPT-4's ability to use mathematical thinking and techniques to solve real-world problems. The figure below shows GPT-4 successfully constructing a reasonable mathematical model for a complex system that requires extensive interdisciplinary knowledge, whereas ChatGPT fails to make meaningful progress.


Since the paper is 154 pages long, this article shows only a portion of its many evaluation results. For more information, readers can refer to the original paper.

Finally, attach the table of contents:

