Home Hardware Tutorial Hardware Review 10 lines of code improved the mathematics of large models by 20%. The research on 'Yeluzi' was also tested by Google. The main author is all self-taught.

10 lines of code improved the mathematics of large models by 20%. The research on 'Yeluzi' was also tested by Google. The main author is all self-taught.

Aug 27, 2024 pm 03:31 PM
Google Model Research math Open source author main

With less than 10 lines of code, the mathematical capabilities of large models (GSM8k) can be improved by 20%!

Several independent scholars have proposed improvements to large model sampling, which has attracted the attention of the open source community.

Currently, this method has achieved results on Mistral-7B, and testing on Llama3-70B is also ongoing.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

This method is called min-p sampling, which aims to balance the coherence and diversity of the generated text.

Simply put, it allows the model to exert different characteristics in different situations, such as maintaining stable performance on factual issues and being creative in scenarios such as writing.

Currently, this method has achieved results on Mistral-7B, and testing on Llama-70B is about to begin.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

In the paper, the author mentioned that this method has been widely used by the open source community.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

At the same time, the author also revealed that closed source model manufacturers such as Anthropic and Google have also tested or are testing min-p.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

The news has also been confirmed by Google. Logan Kilpatrick, the developer community leader who switched from OpenAI to Google, has replied "On it".

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

Abram Jackson, a researcher at Microsoft Copilot, said after reading it that this is the first improvement he has seen regarding token sampling in the inference process, and there is still a lot of room for improvement in the future.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

It is worth mentioning that the main author of this widely watched study, Minh Nhat Nguyen, has never systematically learned CS at all, but is self-taught.

With the help of an AI security research organization called Apart Research, Minh and other members of the team completed the project.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

Dynamic adjustment of the sampling threshold

min-p is a dynamic truncation sampling method, the core of which is to scale the minimum probability threshold according to the maximum probability of the token distribution at each step.

The purpose of this is mainly to balance the coherence and diversity of the generated text, especially under higher temperature conditions.

Specifically, min-p introduces a basic probability threshold p_base, which represents the minimum probability requirement for entering the sampling pool.

When generating tokens at each step, min-p will multiply p_base with the largest token probability p_max in the current probability distribution to obtain a scaled absolute threshold p_scaled.

Only tokens with probability greater than or equal to p_scaled can enter the sampling pool.

When the model's prediction probability for a certain token is very high (that is, p_max is very large), the value of p_scaled will also be very high, causing the sampling pool to be greatly reduced, and the vast majority of low-probability tokens are filtered, leaving only a few with high confidence. The choice of ensures the consistency of the output;

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

When the model’s prediction probabilities for all tokens are relatively close (p_max is lower), the value of p_scaled will also become lower accordingly, relaxing the requirements for the sampling pool , incorporating more medium-probability tokens gives the model more space to generate more diverse content.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

After determining the sampling pool, min-p will scale the token probability distribution according to temperature.

It divides the logarithmic probability of token by a temperature parameter τ, and after normalization, the scaled probability distribution of temperature is obtained.

A τ value greater than 1 will make the probability distribution flatter, increasing the chance of low-probability tokens being selected; when

τ is less than 1, it will make the distribution sharper, strengthening the advantages of high-probability tokens.

Finally, min-p randomly selects the next token from the scaled sampling pool according to the adjusted probability distribution.

Stability and creativity, "I want it all"

What is the effect of the min-p method? The author used Mistral-7B as the basic model for testing. Let's look at the results by scenario.

In the inference task, the author uses the GPQA dataset. When temperature is 1, you can see that min-p has a slight advantage over the past top-p.

As temperature increases, the GPQA score shows a downward trend overall, but it can be observed that min-p decreases significantly slower than top-p.

The downward trend of min-p does not become obvious until temperature reaches 3, when the score of top-p is close to 0.

In other words, compared to top-p, min-p better maintains the required stability in inference tasks.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

Mathematical tasks also need to maintain stable performance. Here the author used the GSM8K data set for testing.

The result is that the score corresponding to min-p decreases with temperature faster than in GPQA, but still slower than the top-p method.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

The third type of task is creative writing. At this time, the requirements for stability are not so high, but the model needs to be more creative.

This test was done using the AlpacaEval dataset, and the experimental data was obtained from an independent evaluator in the open source community.

Experimental results show that under the settings of temperature=1.5 and min-p=0.1, the performance of min-p is particularly outstanding and can generate creative writing content that is difficult to generate with the top-p method.

Under this parameter, the text obtained by the min-p method achieved a human judgment preference rate of 58.12%, which is much higher than the performance of other methods under similar settings.

10 行代码让大模型数学提升 20%,“野路子”研究谷歌也测上了,主要作者全靠自学成才

Paper address:

https://arxiv.org/abs/2407.01082

GitHub:

https://github.com/menhguin/minp_paper/

Reference link:

https:// x.com/menhguin/status/1826132708508213629

The above is the detailed content of 10 lines of code improved the mathematics of large models by 20%. The research on 'Yeluzi' was also tested by Google. The main author is all self-taught.. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1664
14
PHP Tutorial
1268
29
C# Tutorial
1243
24
Top 10 recommended for crypto digital asset trading APP (2025 global ranking) Top 10 recommended for crypto digital asset trading APP (2025 global ranking) Mar 18, 2025 pm 12:15 PM

This article recommends the top ten cryptocurrency trading platforms worth paying attention to, including Binance, OKX, Gate.io, BitFlyer, KuCoin, Bybit, Coinbase Pro, Kraken, BYDFi and XBIT decentralized exchanges. These platforms have their own advantages in terms of transaction currency quantity, transaction type, security, compliance, and special features. For example, Binance is known for its largest transaction volume and abundant functions in the world, while BitFlyer attracts Asian users with its Japanese Financial Hall license and high security. Choosing a suitable platform requires comprehensive consideration based on your own trading experience, risk tolerance and investment preferences. Hope this article helps you find the best suit for yourself

Tutorial on how to register, use and cancel Ouyi okex account Tutorial on how to register, use and cancel Ouyi okex account Mar 31, 2025 pm 04:21 PM

This article introduces in detail the registration, use and cancellation procedures of Ouyi OKEx account. To register, you need to download the APP, enter your mobile phone number or email address to register, and complete real-name authentication. The usage covers the operation steps such as login, recharge and withdrawal, transaction and security settings. To cancel an account, you need to contact Ouyi OKEx customer service, provide necessary information and wait for processing, and finally obtain the account cancellation confirmation. Through this article, users can easily master the complete life cycle management of Ouyi OKEx account and conduct digital asset transactions safely and conveniently.

Detailed tutorial on how to register for binance (2025 beginner's guide) Detailed tutorial on how to register for binance (2025 beginner's guide) Mar 18, 2025 pm 01:57 PM

This article provides a complete guide to Binance registration and security settings, covering pre-registration preparations (including equipment, email, mobile phone number and identity document preparation), and introduces two registration methods on the official website and APP, as well as different levels of identity verification (KYC) processes. In addition, the article also focuses on key security steps such as setting up a fund password, enabling two-factor verification (2FA, including Google Authenticator and SMS Verification), and setting up anti-phishing codes, helping users to register and use the Binance Binance platform for cryptocurrency transactions safely and conveniently. Please be sure to understand relevant laws and regulations and market risks before trading and invest with caution.

How to optimize jieba word segmentation to improve the keyword extraction effect of scenic spot comments? How to optimize jieba word segmentation to improve the keyword extraction effect of scenic spot comments? Apr 01, 2025 pm 06:24 PM

How to optimize jieba word segmentation to improve keyword extraction of scenic spot comments? When using jieba word segmentation to process scenic spot comment data, if the word segmentation results are ignored...

Tutorial on using gate.io mobile app Tutorial on using gate.io mobile app Mar 26, 2025 pm 05:15 PM

Tutorial on using gate.io mobile app: 1. For Android users, visit the official Gate.io website and download the Android installation package, you may need to allow the installation of applications from unknown sources in your mobile phone settings; 2. For iOS users, search "Gate.io" in the App Store to download.

The latest updates to the oldest virtual currency rankings The latest updates to the oldest virtual currency rankings Apr 22, 2025 am 07:18 AM

The ranking of virtual currencies’ “oldest” is as follows: 1. Bitcoin (BTC), issued on January 3, 2009, is the first decentralized digital currency. 2. Litecoin (LTC), released on October 7, 2011, is known as the "lightweight version of Bitcoin". 3. Ripple (XRP), issued in 2011, is designed for cross-border payments. 4. Dogecoin (DOGE), issued on December 6, 2013, is a "meme coin" based on the Litecoin code. 5. Ethereum (ETH), released on July 30, 2015, is the first platform to support smart contracts. 6. Tether (USDT), issued in 2014, is the first stablecoin to be anchored to the US dollar 1:1. 7. ADA,

Top 10 recommended for safe and reliable virtual currency purchase apps Top 10 recommended for safe and reliable virtual currency purchase apps Mar 18, 2025 pm 12:12 PM

Top 10 recommended global virtual currency trading platforms in 2025, helping you to play the digital currency market! This article will deeply analyze the core advantages and special features of ten top platforms including Binance, OKX, Gate.io, BitFlyer, KuCoin, Bybit, Coinbase Pro, Kraken, BYDFi and XBIT decentralized exchanges. Whether you are pursuing high liquidity and rich trading types, or focusing on safety, compliance and innovative functions, you can find a platform that suits you here. We will provide a comprehensive comparison of transaction types, security, special functions, etc. to help you choose the most suitable virtual currency trading platform and seize the opportunities of digital currency investment in 2025

Okex trading platform official website login portal Okex trading platform official website login portal Mar 18, 2025 pm 12:42 PM

This article introduces in detail the complete steps of logging in to the OKEx web version of Ouyi in detail, including preparation work (to ensure stable network connection and browser update), accessing the official website (to pay attention to the accuracy of the URL and avoid phishing website), finding the login entrance (click the "Login" button in the upper right corner of the homepage of the official website), entering the login information (email/mobile phone number and password, supporting verification code login), completing security verification (sliding verification, Google verification or SMS verification), and finally you can conduct digital asset trading after successfully logging in. A safe and convenient login process to ensure the safety of user assets.

See all articles