Recognition from the first prize of Science and Technology Progress Award: Tencent solved the problem of training large models with trillions of parameters

Mar 27, 2024 pm 09:41 PM
industry Tencent Cloud

The list of winners of the China Electronics Society 2023 Science and Technology Awards has been announced, and it includes a familiar name: the Tencent Angel machine learning platform.

In the current era of rapid large model development, granting the Science and Technology Award to a machine learning platform research and application project fully affirms the value and importance of model training platforms.

With the rise of deep learning, major companies have begun to realize the importance of machine learning platforms in the development of artificial intelligence technology. Companies such as Google, Microsoft, and Nvidia have launched their own machine learning platforms to speed up the training process of artificial intelligence models. These platforms provide developers with convenient support, allowing them to build and optimize complex artificial intelligence systems faster. This trend has prompted people to pay more attention to the development of machine learning technology and laid a solid foundation for future artificial intelligence applications.

Since 2023, the rise of large models has pushed parameter counts even higher. Major companies have launched models with hundreds of billions or even trillions of parameters, generally built on deep neural network structures. This development has also brought two core pain points: the difficulty of distributed model training, and the model design challenges caused by application complexity.

Why Angel machine learning platform?

Detailed explanation of the four core technology breakthroughs

The appraisal committee, composed of a number of academicians and other authoritative experts, concluded that the Tencent Angel machine learning platform is technically complex, difficult to develop, highly innovative, and has broad application prospects. The overall technology has reached the international advanced level; within it, the efficient cache scheduling and management technology for all-to-all communication and the adaptive pre-sampling and graph structure search technology have reached the international leading level.

The Tencent Angel machine learning platform is built on a distributed parameter server architecture. The characteristic of this architecture is that the two tasks of storing model parameters and performing model computation run on different servers, so adding more servers makes it possible to support larger models with higher computational requirements. This design gives the system good scalability and flexibility: it can handle large-scale data sets and complex model computation, meet machine learning tasks of different scales and needs, and make effective use of cluster resources to improve computing efficiency. On this foundation, the platform achieved technological breakthroughs in four core areas: efficient communication and cache scheduling, model storage and scheduling, multi-modal fusion learning for ranking, and large-scale graph models with structure search.
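The separation between parameter storage and computation can be illustrated with a minimal, in-process sketch. This is not Angel's implementation or API: `ParameterServer`, `shard_for`, and `worker_step` are hypothetical names, the "servers" are plain Python objects rather than networked machines, and the toy objective (minimizing the sum of squared weights) is chosen only to make the update easy to check.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterServer:
    """Holds one shard of the parameters (hypothetical stand-in for a remote server)."""
    params: dict = field(default_factory=dict)

    def pull(self, keys):
        # Workers fetch only the parameters they need.
        return {k: self.params[k] for k in keys}

    def push(self, grads, lr):
        # The server applies a plain SGD update to its own shard.
        for k, g in grads.items():
            self.params[k] -= lr * g


def shard_for(key, num_servers):
    # Deterministic placement: byte sum of the key modulo the server count.
    return sum(key.encode()) % num_servers


def worker_step(servers, keys, compute_grads, lr=0.1):
    """One worker iteration: pull parameters, compute gradients, push them back."""
    pulled = {}
    for k in keys:
        pulled.update(servers[shard_for(k, len(servers))].pull([k]))
    grads = compute_grads(pulled)
    for k, g in grads.items():
        servers[shard_for(k, len(servers))].push({k: g}, lr)


def grad_of_squares(weights):
    # Gradient of sum(w^2) is 2w for each weight.
    return {k: 2.0 * w for k, w in weights.items()}


servers = [ParameterServer() for _ in range(2)]
for name, value in {"w0": 1.0, "w1": -2.0, "w2": 0.5}.items():
    servers[shard_for(name, len(servers))].params[name] = value

worker_step(servers, ["w0", "w1", "w2"], grad_of_squares, lr=0.1)
merged = {k: v for s in servers for k, v in s.params.items()}
print(merged)
```

Because placement depends only on the key, adding servers spreads the parameter shards over more machines without changing the worker logic, which is the scaling property the architecture is credited with.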

To improve training efficiency, TB-scale machine learning models are usually trained in a distributed fashion, which requires synchronizing a large volume of parameters and gradients. Taking the training of a 1.8T model on a thousand GPU cards as an example, the I/O communication volume reaches 25 TB and accounts for 53% of the time consumed. In addition, network environments are heterogeneous across computing clusters and communication latency varies, which places higher demands on controlling communication overhead during training. Building on the Tencent Cloud Xingmai network, the Tencent Angel machine learning platform developed efficient communication and cache scheduling and management technology that addresses the high communication overhead of TB-scale model training: network communication time is reduced by 80%, and distributed training performance reaches 2.5 times that of mainstream industry solutions.
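To get a feel for how the communication figures relate to end-to-end speed, the following is a back-of-the-envelope, Amdahl-style estimate. It assumes, which the article does not state, that the 53% share and the 80% reduction refer to the same baseline run and that compute time is unchanged; `overall_speedup` is a hypothetical helper.

```python
def overall_speedup(comm_fraction, comm_reduction):
    """Speedup when only the communication share of the runtime shrinks."""
    remaining = (1.0 - comm_fraction) + comm_fraction * (1.0 - comm_reduction)
    return 1.0 / remaining


speedup = overall_speedup(comm_fraction=0.53, comm_reduction=0.80)
print(f"{speedup:.2f}x")  # prints "1.74x"
```

Under these assumptions, cutting communication time alone yields roughly a 1.74x speedup, so the reported 2.5x figure presumably reflects additional optimizations or a different baseline.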

Under current hardware conditions, although models have reached the TB level, the video memory of mainstream GPUs is still only 80 GB, so parameter storage is a bottleneck. To address the difficulty of storing the parameters of TB-scale models during training, the Tencent Angel machine learning platform proposes a storage management mechanism that treats GPU memory and main memory from a unified perspective. It doubles the model storage capacity relative to the industry and delivers twice the training performance of mainstream industry solutions.
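The idea of managing GPU memory and main memory as one pool can be sketched with a toy two-tier store. This is only an illustration under stated assumptions: `TieredParameterStore` is a hypothetical class, both tiers are ordinary Python dictionaries, capacity is counted in entries rather than bytes, and least-recently-used eviction is a stand-in policy, not necessarily the one Angel uses.

```python
from collections import OrderedDict


class TieredParameterStore:
    """Toy two-tier store: a small fast tier ("GPU") backed by a large slow tier ("host")."""

    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity
        self.gpu = OrderedDict()  # ordered from least to most recently used
        self.host = {}

    def put(self, name, tensor):
        # New parameters start in host memory.
        self.host[name] = tensor

    def get(self, name):
        if name in self.gpu:
            self.gpu.move_to_end(name)  # mark as most recently used
            return self.gpu[name]
        tensor = self.host.pop(name)  # raises KeyError for unknown names
        self.gpu[name] = tensor
        if len(self.gpu) > self.gpu_capacity:
            # Offload the least recently used entry back to host memory.
            evicted_name, evicted = self.gpu.popitem(last=False)
            self.host[evicted_name] = evicted
        return tensor


store = TieredParameterStore(gpu_capacity=2)
for layer in ("layer0", "layer1", "layer2"):
    store.put(layer, [0.0] * 4)

for layer in ("layer0", "layer1", "layer2"):
    store.get(layer)

resident = list(store.gpu)
print(resident)  # prints "['layer1', 'layer2']"
```

The caller always addresses parameters by name and never needs to know which tier currently holds them, which is the sense in which the two memories are seen from a single perspective.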

Developing a large model into a general-purpose model depends on support for multi-modal data, yet data of different modalities, such as text, images, and video, is difficult to align, fuse, and understand. For multi-modal model training, the Tencent Angel machine learning platform proposes, for advertising scenarios, a full-pipeline ranking and recommendation technology based on multi-modal fusion learning, which helps increase the advertising recall rate by more than 40%.

In addition, for graph model training in recommendation systems, the Tencent Angel machine learning platform designed a graph network structure search technology that adapts to graph node features and automatically outputs the optimal structure. This solves the problem of "difficult graph data mining" in TB-scale graph model applications, improves model training performance by 28 times, and offers the best scalability in the industry.

The Road to Forging Tencent Angel Machine Learning Platform

Tencent Hunyuan Large Model Expands to Trillion Scale

As the foundational platform for Tencent's artificial intelligence technology, the Tencent Angel platform was born in 2015, supporting PS-Worker distributed training and the training of billion-parameter LDA models.

In 2017, the Angel framework was open sourced on GitHub and made available to developers. At the same time, Angel solved the communication problem under heterogeneous networks and further improved its performance. In 2019, the team made a breakthrough in scalable graph models and multi-modal understanding technology, solving the problem of scaling graph models to trillions of nodes. In 2021, it proposed storage technology that takes a unified view of GPU memory and main memory, addressing the storage and performance problems of large model parameters.

In the creation of Tencent’s general artificial intelligence large model Tencent Hunyuan, Tencent’s Angel machine learning platform also played an important role.

In September 2023, Tencent’s Hunyuan large model was officially unveiled. The pre-training corpus exceeds 2 trillion tokens, and it has strong Chinese understanding and creation capabilities, logical reasoning capabilities, and reliable task execution capabilities.

To meet the needs of building the Tencent Hunyuan large model, the Tencent Angel machine learning platform created two self-developed machine learning frameworks for large model training and inference, Angel PTM and Angel HCF, which support training at the scale of 10,000 cards for a single task as well as large-scale inference service deployment. Large model training efficiency is increased to 2.6 times that of mainstream open source frameworks, and training models with hundreds of billions of parameters can save 50% of computing power costs. On the inference side, inference speed has been increased by 1.3 times; in Tencent Hunyuan's text-to-image application, inference time has been shortened from the original 10 seconds to 3 to 4 seconds.

In addition, Angel provides a one-stop platform from model development to application deployment, allowing users to quickly access Tencent Hunyuan's capabilities through API calls or fine-tuning and accelerating the construction of large model applications. More than 400 Tencent products and scenarios, including Tencent Meeting, Tencent News, and Tencent Video, have been connected to Tencent Hunyuan for internal testing.

Tencent Hunyuan has expanded the model to trillions of parameters by adopting a mixture-of-experts (MoE) structure, which improves performance while reducing inference costs. As a general-purpose model, Tencent Hunyuan leads the industry in Chinese-language performance, especially in text generation, mathematical logic, and multi-turn dialogue. Tencent Hunyuan is also actively developing multi-modal models to further enhance its text-to-image and text-to-video capabilities.
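A mixture-of-experts layer keeps inference cost low because each input is routed to only a few experts, even though the total parameter count grows with the number of experts. The sketch below shows generic top-k routing in plain Python; it is not Hunyuan's architecture. `moe_forward` is a hypothetical function, the experts are toy scalar functions, and the gate scores are passed in directly, whereas a real model would compute them from the input with a learned router.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_scores, experts, top_k=2):
    """Run only the top_k experts and mix their outputs by renormalized gate weight."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen


# Four toy experts; only two of them are evaluated per input.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
gate_scores = [0.1, 2.0, -1.0, 2.0]

y, used = moe_forward(3.0, gate_scores, experts, top_k=2)
print(y, used)
```

Here experts 1 and 3 receive equal gate scores, so each contributes half of the output, and the other two experts are never evaluated.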

Tencent's many application scenarios provide a testing ground for putting the Tencent Angel machine learning platform into practice. In addition to the Tencent Hunyuan large model, the platform supports products such as Tencent Advertising and Tencent Meeting, and serves customers across multiple industries through Tencent Cloud, supporting their digital and intelligent development.

Take Tencent Advertising as an example. Using innovations from the Tencent Angel machine learning platform, such as distributed training optimization, multi-modal understanding, and graph data mining, the training speed of multi-modal large models in advertising business scenarios has been increased by 5 times, the model scale has been increased by 10 times, and the advertising recall rate has been greatly improved.
