


Runs on consumer-grade graphics cards! Kai-Fu Lee releases and open-sources the 9-billion-parameter Yi model, with the strongest code and math ability in the series' history
01.AI, the AI company founded by Kai-Fu Lee, has put another large-model player on the stage:
the 9-billion-parameter Yi-9B.
It is billed as the "top science student" of the Yi series: it "makes up" for code and math, while its overall ability has not fallen behind.
Among open-source models of similar scale (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, etc.), it performs best.
As usual, the release is open source, and it is especially friendly to developers:
Yi-9B (BF16) and its quantized version Yi-9B (Int8) can both be deployed on consumer-grade graphics cards.
A single RTX 4090 or RTX 3090 is enough.
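Below is a minimal loading sketch using Hugging Face transformers. The model ID matches the link at the end of this article; the 8-bit path via bitsandbytes and the memory note are our assumptions, not 01.AI's official instructions.

```python
# A minimal sketch of loading Yi-9B on a single consumer GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "01-ai/Yi-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# BF16 variant: ~8.8B params x 2 bytes ~= 17-18 GB of weights,
# which is why a single 24 GB RTX 4090/3090 suffices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Int8 variant: roughly halves the weight memory via 8-bit quantization
# (requires the bitsandbytes package).
# model = AutoModelForCausalLM.from_pretrained(
#     model_id,
#     quantization_config=BitsAndBytesConfig(load_in_8bit=True),
#     device_map="auto",
# )
```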
Depth up-scaling and multi-stage incremental training
01.AI's Yi family has previously released the Yi-6B and Yi-34B series.
Both were pre-trained on 3.1T tokens of Chinese and English data; Yi-9B builds on this by continuing training with an additional 0.8T tokens.
The data cutoff is June 2023.
As mentioned at the beginning, Yi-9B's biggest improvement lies in math and coding, so how were these two abilities strengthened?
According to 01.AI:
Simply increasing the amount of data could not meet expectations.
Instead, the recipe was to first increase the model size, scaling Yi-6B up to 9B, and then perform multi-stage incremental data training.
First, how was the model size increased?
One premise: through analysis, the team found that
Yi-6B had already been trained to saturation; no matter how many more tokens were added, the training effect would not improve further. So they considered enlarging the model instead. (In the team's chart, the unit is B, i.e. billions of tokens, not TB.)
How to enlarge it? The answer: depth up-scaling.
According to 01.AI:
Widening the original model would incur larger performance losses. When the model is instead depth up-scaled by duplicating suitably chosen layers, the closer the cosine similarity between a new layer's input and output is to 1.0, the better the up-scaled model retains the original model's performance, keeping the performance loss slight.
Following this idea, 01.AI chose to duplicate the later 16 layers of Yi-6B (layers 12-28) to form the 48-layer Yi-9B.
Experiments show that this performs better than SOLAR-10.7B's approach of duplicating the middle 16 layers (layers 8-24).
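To make the depth up-scaling idea concrete, here is a toy PyTorch sketch of both steps: scoring layers by input/output cosine similarity, then splicing in duplicates. The toy linear layers, probe batch, and exact splice position are illustrative assumptions, not 01.AI's actual procedure.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def layer_io_cosine(layers, hidden):
    """Mean cosine similarity between each layer's input and output."""
    sims = []
    for layer in layers:
        out = layer(hidden)
        sims.append(F.cosine_similarity(hidden, out, dim=-1).mean().item())
        hidden = out
    return sims

def depth_upscale(layers, start, end):
    """Duplicate layers[start:end] and splice the copies in after `end`."""
    copies = [copy.deepcopy(layers[i]) for i in range(start, end)]
    return nn.ModuleList(list(layers[:end]) + copies + list(layers[end:]))

# Toy stand-in for Yi-6B's 32 transformer blocks (hidden size 64 here).
layers = nn.ModuleList(nn.Linear(64, 64) for _ in range(32))
hidden = torch.randn(4, 16, 64)            # small probe batch

print(layer_io_cosine(layers, hidden))     # pick the span with cosine near 1.0
print(len(depth_upscale(layers, 12, 28)))  # 32 + 16 = 48 layers
```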
Second, what is the multi-stage training method?
The answer: first add 0.4T tokens of data containing both text and code, keeping the same data mix as Yi-6B.
Then add another 0.4T tokens of data, again containing text and code, but with the proportion of code and math data deliberately increased.
(Understood; it's the same idea as the "think step by step" trick we use when prompting large models.)
After these two steps, it still wasn't over: the team also drew on the ideas of two papers (An Empirical Model of Large-Batch Training and Don't Decay the Learning Rate, Increase the Batch Size) to optimize how the hyperparameters were adjusted.
That is, starting from a fixed learning rate, every time the model's loss stops declining, the batch size is increased, so that the descent continues uninterrupted and the model learns more fully.
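As a rough illustration of that schedule (not 01.AI's actual training code), the sketch below holds the learning rate fixed and grows the batch size whenever the loss plateaus; the window, growth factor, and cap are all assumptions.

```python
# Toy sketch: grow the batch size instead of decaying the learning rate.
def next_batch_size(batch_size, losses, window=100, factor=2, cap=4096):
    """Return an enlarged batch size once the loss stops declining."""
    if len(losses) >= 2 * window:
        prev = sum(losses[-2 * window:-window]) / window
        curr = sum(losses[-window:]) / window
        if curr >= prev:  # average loss no longer falling
            return min(batch_size * factor, cap)
    return batch_size

# Inside a training loop one would call, e.g.:
# batch_size = next_batch_size(batch_size, loss_history)
```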
In the end, Yi-9B actually contains 8.8 billion parameters in total, with a 4K context length.
The strongest coding and math ability in the Yi series
In actual testing, 01.AI used greedy decoding (that is, always selecting the token with the highest probability) for generation.
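For reference, greedy decoding is what transformers' generate produces when sampling is disabled (with the default num_beams=1). This sketch assumes the Yi-9B checkpoint linked at the end of the article and a hypothetical code prompt:

```python
# Greedy decoding: each step emits the single highest-probability token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-9B")
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-9B", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```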
The participating models are DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B and Gemma-7B:
(1) DeepSeek-Coder and DeepSeek-Math, from Chinese company DeepSeek. DeepSeek-Coder's 33B instruction-tuned version surpasses GPT-3.5-turbo in human evaluation, and its 7B version reaches the performance of CodeLlama-34B.
DeepSeek-Math rivaled GPT-4 with just 7B parameters, shocking the entire open-source community.
(2) SOLAR-10.7B, from South Korea's Upstage AI, was born in December 2023 and outperforms Mixtral-8x7B-Instruct.
(3) Mistral-7B, from Mistral AI, the company behind Mixtral-8x7B, the first open-source MoE large model to reach or even surpass the level of Llama 2 70B and GPT-3.5.
(4) Gemma-7B, from Google. 01.AI points out:
its effective parameter count is actually on the same level as Yi-9B's.
(The two follow different naming conventions: the former counts only non-embedding parameters, while the latter counts all parameters and rounds up.)
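As a rough illustration of the two counting conventions, the sketch below tallies parameters with and without embedding-related weights; the name-based filter is a heuristic assumption, not either company's official accounting.

```python
# Count all parameters vs. non-embedding parameters of a loaded model.
def count_params(model):
    total = sum(p.numel() for p in model.parameters())
    embedding = sum(p.numel() for name, p in model.named_parameters()
                    if "embed" in name.lower() or "lm_head" in name.lower())
    return total, total - embedding

# For a loaded Yi-9B `model` (see the loading sketch above):
# total, non_embed = count_params(model)
# print(f"all: {total/1e9:.2f}B, non-embedding: {non_embed/1e9:.2f}B")
```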
The results are as follows.
First, on coding tasks, Yi-9B's performance is second only to DeepSeek-Coder-7B; the other four are all KO'd.
On math ability, Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing the other four.
Its overall ability is not bad either.
It performs best among open-source models of similar size, surpassing all five of the other players.
Finally, common sense and reasoning ability were also tested:
the result is that Yi-9B is roughly on par with Mistral-7B, SOLAR-10.7B and Gemma-7B.
The same goes for language ability: not only is its English good, its Chinese has also received wide praise:
After reading all this, some netizens said they can't wait to try it.
Others are worried for DeepSeek:
Hurry up and strengthen your "game"; the overall dominance is gone ==
The portal is here: https://huggingface.co/01-ai/Yi-9B