AI is on a problem-solving rampage! 81% accuracy on advanced math exams, and a competition-math score higher than a computer science PhD's

Apr 11, 2023 pm 11:10 PM

Failing advanced mathematics in college is a nightmare for many people.

So if you were told that AI now does better at advanced math than you do, would that be even harder to accept?

It does. OpenAI's Codex achieved 81.1% accuracy on questions from seven MIT mathematics courses, a level comparable to a solid MIT undergraduate.

The courses range from elementary calculus to differential equations, probability theory, and linear algebra. Besides calculations, the questions also include plotting.

The story recently made Weibo's trending list.

△ "Only" 81 points? Expectations for AI are just too high

Now, even bigger news comes from Google:

Not just in mathematics: our AI has achieved the top scores across science and engineering subjects!

It seems the tech giants have reached a new level in training "AI problem solvers".

Google's latest AI problem solver took four exams.

On the competition-math exam MATH, a three-time IMO gold medalist previously scored 90 points, while an ordinary computer science PhD scored only about 40.

Among other AI problem solvers, the previous best score was only 6.9 points...

This time, Google's new AI scored 50 points, higher than the computer science PhD.

The comprehensive exam MMLU-STEM covers mathematics, physics, chemistry, biology, electrical engineering, and computer science, with questions at high school or even college level.

Here, too, the full-size version of Google's AI achieved the highest score, raising the previous best by about 20 points.

On the grade-school math benchmark GSM8k, it raised the score to 78 points. By comparison, GPT-3 did not even pass (only 55 points).

Even on MIT undergraduate and graduate courses such as solid-state chemistry, astronomy, differential equations, and special relativity, Google's new AI correctly answers nearly a third of more than 200 questions.

Most importantly, unlike OpenAI, which earned its high math scores by leaning on "programming skills", Google's AI takes the route of "thinking like a human":

It is like a humanities student who only reads and memorizes and never drills practice problems, yet ends up with stronger problem-solving skills in science and engineering.

Notably, Lewkowycz, the paper's first author, also shared a highlight that is not in the paper:

Our model took this year's Polish mathematics college entrance exam and scored above the national average.

Seeing this, some parents can no longer sit still.

If I tell my daughter about this, I'm afraid she'll use AI to do her homework. But if I don't tell her, I'm not preparing her for the future!

In the eyes of industry insiders, the most amazing thing about this research is that it reaches this level with a language model alone, without hard-coding any arithmetic, logic, or algebra.

So, how is this done?

AI reads 2 million papers on arXiv

The new model, Minerva, is based on PaLM, Google's general-purpose language model built on the Pathways architecture.

It is obtained by further training the 8-billion, 62-billion, and 540-billion-parameter PaLM models.

Minerva’s approach to answering questions is completely different from Codex’s.

Codex rewrites each math problem as a programming problem and then solves it by writing code.
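To make the contrast concrete, here is a hypothetical illustration of the Codex-style route (written by hand, not actual Codex output): a word problem such as "How many positive integers below 100 are divisible by 3 or 5?" is restated as a short program, and the program's output is taken as the answer.

```python
def count_divisible_by_3_or_5(limit: int) -> int:
    """Count the positive integers n < limit with n % 3 == 0 or n % 5 == 0."""
    # The "reasoning" lives in the program: enumerate every candidate and test it.
    return sum(1 for n in range(1, limit) if n % 3 == 0 or n % 5 == 0)


if __name__ == "__main__":
    print(count_divisible_by_3_or_5(100))  # prints 46
```

The answer is only as reliable as the translation from words to code; the arithmetic itself is delegated to the interpreter.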

Minerva, by contrast, reads an enormous number of papers and learns to understand mathematical notation the same way it understands natural language.

Starting from PaLM, training continues on a new dataset with three parts:

about 2 million academic papers collected from arXiv, 60GB of web pages containing LaTeX formulas, and a small portion of the text used in PaLM's original training.

The usual NLP data-cleaning process deletes symbols and keeps only plain text, which leaves formulas incomplete. Einstein's famous mass-energy equation, for example, is reduced to "Emc2".
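A minimal sketch of the difference, assuming math is delimited by `$...$` (the function names and the exact cleaning rule are hypothetical; this is not Google's actual pipeline): an aggressive cleaner turns the formula into "Emc2", while a math-aware cleaner passes the LaTeX span through untouched.

```python
import re


def naive_clean(text: str) -> str:
    """Aggressive cleaning: keep only ASCII letters, digits, and spaces."""
    return re.sub(r"[^A-Za-z0-9 ]", "", text)


def math_preserving_clean(text: str) -> str:
    """Clean ordinary text but keep $...$ math spans verbatim."""
    # With one capturing group, re.split returns the matched math spans
    # at the odd indices; only the text between them is cleaned.
    parts = re.split(r"(\$[^$]+\$)", text)
    return "".join(p if i % 2 == 1 else naive_clean(p) for i, p in enumerate(parts))


if __name__ == "__main__":
    sample = "Einstein showed that $E=mc^2$."
    print(naive_clean(sample))            # Einstein showed that Emc2
    print(math_preserving_clean(sample))  # Einstein showed that $E=mc^2$
```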

This time, however, Google kept all the formulas and put them through the same Transformer training procedure as plain text, allowing the AI to understand symbols the way it understands language.

Compared with previous language models, this is one of the reasons why Minerva performs better on mathematical problems.

But compared with AI built specifically for math problems, Minerva's training includes no explicit underlying mathematical structure, which brings one disadvantage and one advantage.

The disadvantage is that the AI may reach the correct answer through wrong steps.

The advantage is that it adapts to different disciplines: even problems that cannot be expressed in formal mathematical language can be solved by drawing on its natural-language understanding.

At inference time, Minerva also combines several techniques Google developed recently.

The first is chain-of-thought prompting, proposed by the Google Brain team in January this year.

Specifically, the question is preceded by examples with step-by-step answers as a guide. The AI then follows a similar thought process when answering, and can correctly solve questions it would otherwise get wrong.
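A minimal sketch of how such a few-shot prompt can be assembled. The `Q:`/`A:` layout, the "The answer is" phrasing, and the helper names are assumptions for illustration, not Minerva's exact prompt format.

```python
from dataclasses import dataclass


@dataclass
class WorkedExample:
    question: str
    reasoning: str  # the step-by-step solution shown to the model
    answer: str


def build_cot_prompt(examples: list[WorkedExample], question: str) -> str:
    """Few-shot chain-of-thought prompt: worked examples first, then the new question."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in examples
    ]
    # The new question ends with an open "A:" so the model continues step by step.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = WorkedExample(
        question="Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
        "How many tennis balls does he have now?",
        reasoning="Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11.",
        answer="11",
    )
    print(build_cot_prompt([demo], "A shop sells 4 boxes of 6 eggs. How many eggs is that?"))
```

The model sees a solved example whose answer is reached in explicit steps, which nudges it to write out its own steps before the final answer.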

Then there is the Scratchpad method developed jointly by Google and MIT, which lets the AI temporarily store the intermediate results of step-by-step calculations.
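To show what "storing intermediate results" looks like, the sketch below imitates the kind of trace a model would be trained to write for long addition before giving the final sum. It is ordinary Python, not a language model, and the trace format is an assumption.

```python
def add_with_scratchpad(a: int, b: int) -> tuple[int, list[str]]:
    """Add two non-negative integers digit by digit, recording every step."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # least-significant digit first
    carry, digits, scratch = 0, [], []
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        # Each intermediate result is written down instead of kept "in the head".
        scratch.append(f"{x} + {y} + carry {carry} = {total}: write {total % 10}, carry {total // 10}")
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
        scratch.append(f"final carry: write {carry}")
    return int("".join(reversed(digits))), scratch


if __name__ == "__main__":
    result, steps = add_with_scratchpad(29, 57)
    print("\n".join(steps))
    print(result)  # 86
```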

Finally, there is majority voting, published only in March this year.

The AI answers the same question multiple times, and the answer that appears most often is chosen.
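The vote is taken over the final answers, not over the full reasoning text, since different reasoning paths can reach the same result. A minimal sketch, assuming each sampled solution ends with a phrase like "The answer is X" (a hypothetical output convention; the helper names are illustrative too):

```python
from collections import Counter
from typing import Optional


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the text after the last 'The answer is', or None if absent."""
    marker = "The answer is"
    idx = solution.rfind(marker)
    if idx == -1:
        return None
    return solution[idx + len(marker):].strip().rstrip(".").strip()


def majority_vote(solutions: list[str]) -> Optional[str]:
    """Pick the most frequent final answer among sampled solutions."""
    answers = [a for a in map(extract_final_answer, solutions) if a is not None]
    if not answers:
        return None
    # most_common keeps first-seen order among equal counts, so ties go to the earlier answer.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    samples = [
        "2 + 3 = 5. The answer is 5.",
        "2 * 3 = 6. The answer is 6.",
        "Adding the two numbers gives 5. The answer is 5.",
        "I am not sure how to proceed.",
    ]
    print(majority_vote(samples))  # 5
```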

With all these techniques combined, the 540-billion-parameter Minerva achieves state-of-the-art results on a range of test sets.

Even the 8-billion-parameter version of Minerva matches the recently updated GPT-3 davinci-002 on competition-level math problems and MIT OpenCourseWare problems.

Having said so much, what specific questions can Minerva solve?

Google has also released a set of sample outputs, so let's take a look.

An all-rounder in mathematics, physics, chemistry, and even machine learning

In mathematics, Minerva works out values step by step as a human would, rather than brute-forcing the problem.

For word problems, it can set up the equations itself and simplify them.

It can even work through a proof.

In physics, Minerva can solve university-level questions such as finding the total spin quantum number of the electrons in the ground state of neutral nitrogen (Z = 7).
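For reference, the expected answer follows from the electron configuration and Hund's rule (a standard textbook derivation added here, not Minerva's own output):

```latex
\begin{aligned}
\text{N}\ (Z = 7):&\quad 1s^2\,2s^2\,2p^3 \\
1s^2,\ 2s^2\ \text{(filled subshells)}:&\quad S_{\text{core}} = 0 \\
2p^3\ \text{(Hund's rule: one electron per orbital, spins parallel)}:&\quad S = 3 \times \tfrac{1}{2} = \tfrac{3}{2} \\
\text{multiplicity}:&\quad 2S + 1 = 4 \;\Rightarrow\; \text{ground term } {}^{4}S_{3/2}
\end{aligned}
```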

In biology and chemistry, Minerva can also answer various multiple-choice questions with its language understanding ability.

Which of the following point mutations does not have a negative effect on the protein encoded by the DNA sequence?

Which of the following is a radioactive element?

And astronomy: Why does the Earth have a strong magnetic field?

In machine learning, it explains what "out-of-distribution sample detection" means and correctly gives another name for the term.

......

However, Minerva sometimes makes silly mistakes, such as cancelling the √ on both sides of an equation.

Minerva also produces "false positives", where the reasoning is wrong but the final answer happens to be right; this occurs about 8% of the time.

On analysis, the team found that most errors were calculation errors or reasoning errors; only a small share came from misunderstanding the question, using wrong facts in a step, or other causes.

Calculation errors could easily be fixed by giving the model access to an external calculator or Python interpreter, but the other types of errors are hard to correct because the neural network is so large.
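As a rough illustration of the external-calculator idea (a hypothetical helper, not part of Minerva), one could scan a generated solution for simple claims of the form "a op b = c" and recompute each one exactly in Python. The regex only handles single binary operations on plain numbers, which is an assumption made to keep the sketch short.

```python
import operator
import re
from fractions import Fraction

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
NUM = r"-?\d+(?:\.\d+)?"
CLAIM = re.compile(rf"({NUM})\s*([+\-*/])\s*({NUM})\s*=\s*({NUM})")


def check_arithmetic(trace: str) -> list[tuple[str, bool]]:
    """Return each 'a op b = c' claim found in the trace and whether it is correct."""
    results = []
    for m in CLAIM.finditer(trace):
        a, op, b, claimed = m.groups()
        # Fraction parses decimal strings exactly, so 0.1 + 0.2 == 0.3 holds.
        x, y, z = Fraction(a), Fraction(b), Fraction(claimed)
        ok = not (op == "/" and y == 0) and OPS[op](x, y) == z
        results.append((m.group(0), ok))
    return results


if __name__ == "__main__":
    trace = "Each box holds 12 * 3 = 36 pens. Adding 36 + 9 = 44 gives the total."
    for claim, ok in check_arithmetic(trace):
        print(claim, "OK" if ok else "WRONG")
```

A checker like this only catches the calculation errors; a reasoning error with correct arithmetic passes straight through, which matches the split the team describes.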

Overall, Minerva's performance has surprised many people, and many have asked for an API in the comments (unfortunately, Google has no plans to release one yet).

Some netizens wondered whether combining it with the "coaxing" method (adding "Let's think step by step" to the prompt), which recently boosted GPT-3's problem-solving accuracy by 61%, could push its accuracy even higher.

The author replied that coaxing is a form of zero-shot learning, and however strong it is, it may not match few-shot learning with 4 examples.

Other netizens asked: since it can solve problems, can it be used the other way around, to write them?

In fact, MIT has teamed up with OpenAI to use AI to set questions for college students.

They mixed human-written and AI-generated questions in a survey given to students, who found it hard to tell which questions had been written by AI.

In short, besides AI researchers busily reading this paper, students are looking forward to the day they can use AI to do their homework.

Teachers, for their part, are looking forward to the day they can use AI to write exam papers.

Paper address: https://storage.googleapis.com/minerva-paper/minerva_paper.pdf

Demo address: https://minerva-demo.github.io/

Related papers:
Chain of Thought: https://arxiv.org/abs/2201.11903
Scratchpads: https://arxiv.org/abs/2112.00114
Majority Voting: https://arxiv.org/abs/2203.11171

Reference links:
https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html
https://twitter.com/bneyshabur/status/1542563148334596098
https://twitter.com/alewkowycz/status/1542559176483823622
