


Runs on consumer-grade graphics cards! Kai-Fu Lee releases and open-sources the 9-billion-parameter Yi model, with the strongest code and math ability in the series' history
01.AI, the AI company founded by Kai-Fu Lee, has put another large-model player on the stage:
the 9-billion-parameter Yi-9B.
It is billed as the "top science student" of the Yi series: it "makes up" for code and math, while its overall ability has not fallen behind.
Among open-source models of similar scale (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, etc.), it performs best.
As usual, the release is open source, and it is especially friendly to developers:
Yi-9B (BF16) and its quantized version Yi-9B (Int8) can both be deployed on consumer-grade graphics cards.
A single RTX 4090 or RTX 3090 is enough.
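Below is a minimal loading sketch using Hugging Face transformers. The model ID matches the link at the end of this article; the 8-bit path via bitsandbytes and the memory note are our assumptions, not 01.AI's official instructions.

```python
# A minimal sketch of loading Yi-9B on a single consumer GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "01-ai/Yi-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# BF16 variant: ~8.8B params x 2 bytes ~= 17-18 GB of weights,
# which is why a single 24 GB RTX 4090/3090 suffices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Int8 variant: roughly halves the weight memory via 8-bit quantization
# (requires the bitsandbytes package).
# model = AutoModelForCausalLM.from_pretrained(
#     model_id,
#     quantization_config=BitsAndBytesConfig(load_in_8bit=True),
#     device_map="auto",
# )
```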
Depth up-scaling and multi-stage incremental training
01.AI's Yi family has previously released the Yi-6B and Yi-34B series.
Both were pre-trained on 3.1T tokens of Chinese and English data; Yi-9B builds on this by continuing training with an additional 0.8T tokens.
The data cutoff is June 2023.
As mentioned at the beginning, Yi-9B's biggest improvement lies in math and coding, so how were these two abilities strengthened?
According to 01.AI:
Simply increasing the amount of data could not meet expectations.
Instead, the recipe was to first increase the model size, scaling Yi-6B up to 9B, and then perform multi-stage incremental data training.
First, how was the model size increased?
One premise: through analysis, the team found that
Yi-6B had already been trained to saturation; no matter how many more tokens were added, the training effect would not improve further. So they considered enlarging the model instead. (In the team's chart, the unit is B, i.e. billions of tokens, not TB.)
How to enlarge it? The answer: depth up-scaling.
According to 01.AI:
Widening the original model would incur larger performance losses. When the model is instead depth up-scaled by duplicating suitably chosen layers, the closer the cosine similarity between a new layer's input and output is to 1.0, the better the up-scaled model retains the original model's performance, keeping the performance loss slight.
Following this idea, 01.AI chose to duplicate the later 16 layers of Yi-6B (layers 12-28) to form the 48-layer Yi-9B.
Experiments show that this performs better than SOLAR-10.7B's approach of duplicating the middle 16 layers (layers 8-24).
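To make the depth up-scaling idea concrete, here is a toy PyTorch sketch of both steps: scoring layers by input/output cosine similarity, then splicing in duplicates. The toy linear layers, probe batch, and exact splice position are illustrative assumptions, not 01.AI's actual procedure.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def layer_io_cosine(layers, hidden):
    """Mean cosine similarity between each layer's input and output."""
    sims = []
    for layer in layers:
        out = layer(hidden)
        sims.append(F.cosine_similarity(hidden, out, dim=-1).mean().item())
        hidden = out
    return sims

def depth_upscale(layers, start, end):
    """Duplicate layers[start:end] and splice the copies in after `end`."""
    copies = [copy.deepcopy(layers[i]) for i in range(start, end)]
    return nn.ModuleList(list(layers[:end]) + copies + list(layers[end:]))

# Toy stand-in for Yi-6B's 32 transformer blocks (hidden size 64 here).
layers = nn.ModuleList(nn.Linear(64, 64) for _ in range(32))
hidden = torch.randn(4, 16, 64)            # small probe batch

print(layer_io_cosine(layers, hidden))     # pick the span with cosine near 1.0
print(len(depth_upscale(layers, 12, 28)))  # 32 + 16 = 48 layers
```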
Second, what is the multi-stage training method?
The answer: first add 0.4T tokens of data containing both text and code, keeping the same data mix as Yi-6B.
Then add another 0.4T tokens of data, again containing text and code, but with the proportion of code and math data deliberately increased.
(Understood; it's the same idea as the "think step by step" trick we use when prompting large models.)
After these two steps, it still wasn't over: the team also drew on the ideas of two papers (An Empirical Model of Large-Batch Training and Don't Decay the Learning Rate, Increase the Batch Size) to optimize how the hyperparameters were adjusted.
That is, starting from a fixed learning rate, every time the model's loss stops declining, the batch size is increased, so that the descent continues uninterrupted and the model learns more fully.
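As a rough illustration of that schedule (not 01.AI's actual training code), the sketch below holds the learning rate fixed and grows the batch size whenever the loss plateaus; the window, growth factor, and cap are all assumptions.

```python
# Toy sketch: grow the batch size instead of decaying the learning rate.
def next_batch_size(batch_size, losses, window=100, factor=2, cap=4096):
    """Return an enlarged batch size once the loss stops declining."""
    if len(losses) >= 2 * window:
        prev = sum(losses[-2 * window:-window]) / window
        curr = sum(losses[-window:]) / window
        if curr >= prev:  # average loss no longer falling
            return min(batch_size * factor, cap)
    return batch_size

# Inside a training loop one would call, e.g.:
# batch_size = next_batch_size(batch_size, loss_history)
```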
In the end, Yi-9B actually contains 8.8 billion parameters in total, with a 4K context length.
The strongest coding and math ability in the Yi series
In actual testing, 01.AI used greedy decoding (that is, always selecting the token with the highest probability) for generation.
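For reference, greedy decoding is what transformers' generate produces when sampling is disabled (with the default num_beams=1). This sketch assumes the Yi-9B checkpoint linked at the end of the article and a hypothetical code prompt:

```python
# Greedy decoding: each step emits the single highest-probability token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-9B")
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-9B", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```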
The participating models are DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B and Gemma-7B:
(1) DeepSeek-Coder and DeepSeek-Math, from Chinese company DeepSeek. DeepSeek-Coder's 33B instruction-tuned version surpasses GPT-3.5-turbo in human evaluation, and its 7B version reaches the performance of CodeLlama-34B.
DeepSeek-Math rivaled GPT-4 with just 7B parameters, shocking the entire open-source community.
(2) SOLAR-10.7B, from South Korea's Upstage AI, was born in December 2023 and outperforms Mixtral-8x7B-Instruct.
(3) Mistral-7B, from Mistral AI, the company behind Mixtral-8x7B, the first open-source MoE large model to reach or even surpass the level of Llama 2 70B and GPT-3.5.
(4) Gemma-7B, from Google. 01.AI points out:
its effective parameter count is actually on the same level as Yi-9B's.
(The two follow different naming conventions: the former counts only non-embedding parameters, while the latter counts all parameters and rounds up.)
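As a rough illustration of the two counting conventions, the sketch below tallies parameters with and without embedding-related weights; the name-based filter is a heuristic assumption, not either company's official accounting.

```python
# Count all parameters vs. non-embedding parameters of a loaded model.
def count_params(model):
    total = sum(p.numel() for p in model.parameters())
    embedding = sum(p.numel() for name, p in model.named_parameters()
                    if "embed" in name.lower() or "lm_head" in name.lower())
    return total, total - embedding

# For a loaded Yi-9B `model` (see the loading sketch above):
# total, non_embed = count_params(model)
# print(f"all: {total/1e9:.2f}B, non-embedding: {non_embed/1e9:.2f}B")
```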
The results are as follows.
First, on coding tasks, Yi-9B's performance is second only to DeepSeek-Coder-7B; the other four are all KO'd.
On math ability, Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing the other four.
Its overall ability is not bad either.
It performs best among open-source models of similar size, surpassing all five of the other players.
Finally, common sense and reasoning ability were also tested:
the result is that Yi-9B is roughly on par with Mistral-7B, SOLAR-10.7B and Gemma-7B.
The same goes for language ability: not only is its English good, its Chinese has also received wide praise:
After reading all this, some netizens said they can't wait to try it.
Others are worried for DeepSeek:
Hurry up and strengthen your "game"; the overall dominance is gone ==
The portal is here: https://huggingface.co/01-ai/Yi-9B