The 2023 China Computing Power Conference, jointly organized by China's Ministry of Industry and Information Technology and the People's Government of the Ningxia Hui Autonomous Region, was recently held in Yinchuan. At the conference, the "Computing Power China · Annual Breakthrough Results," selected by a panel of experts and scholars in the computing power field, were announced. Alibaba Cloud's "PAI Lingjun Intelligent Computing Service" was among the winners, recognized as a representative of domestic AI intelligent computing infrastructure.
The selection was set up to address hot topics, difficulties, and key issues in the computing power field. Following the principles of being "pioneering, leading, authoritative and fair," it aims to identify basic theories, innovative methods, models, and platform applications that have reached a world-leading level in computing power or related industries. The review committee is composed of academicians of the Chinese Academy of Sciences and the Chinese Academy of Engineering, experts from well-known universities and research institutions, and technical leaders of leading companies, bringing together highly influential experts and scholars in China's computing power-related fields.
The PAI Lingjun intelligent computing service launched by Alibaba Cloud is a computing power infrastructure service built to meet rapidly growing AI computing needs. It provides enterprises and developers with a one-stop engineering platform covering the full AI development process, together with intelligent computing power. It is characterized by very large parallel computing scale, high performance, high efficiency, and high utilization. Its clusters can scale to 100,000 cards and can support the simultaneous training of multiple large models with trillions of parameters. A single training task can scale to 10,000 cards, and linear scaling efficiency at the thousand-card scale reaches 92%.
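For readers unfamiliar with the metric, linear scaling efficiency is commonly computed as the measured speedup divided by the ideal speedup when adding accelerators. The article does not say how the 92% figure was measured, so the sketch below only illustrates that common definition; the function name and the throughput numbers are hypothetical and are not Alibaba Cloud data.

```python
def scaling_efficiency(base_throughput, base_cards, scaled_throughput, scaled_cards):
    """Return measured speedup divided by ideal (linear) speedup."""
    if min(base_throughput, base_cards, scaled_throughput, scaled_cards) <= 0:
        raise ValueError("all inputs must be positive")
    actual_speedup = scaled_throughput / base_throughput
    ideal_speedup = scaled_cards / base_cards
    return actual_speedup / ideal_speedup


# Hypothetical figures, chosen only to illustrate the arithmetic:
# an 8-card baseline at 1,000 samples/s versus 1,000 cards at 115,000 samples/s.
eff = scaling_efficiency(1_000.0, 8, 115_000.0, 1_000)
print(f"scaling efficiency: {eff:.0%}")
```

With these made-up numbers the ideal speedup is 125x and the measured speedup is 115x, so the efficiency works out to 92%.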
In earlier production training, the PAI Lingjun intelligent computing service stably supported the low-energy training of a multi-modal large model with 10 trillion parameters. In the training and application of large models such as Tongyi Qianwen, training efficiency improved by nearly 10 times and inference efficiency by more than 2 times.
The selection committee stated that the Alibaba Cloud PAI Lingjun intelligent computing service, "with its high-performance network, high-performance file storage, high stability, deep joint optimization of software and hardware, and serverless service capabilities, provides solid support for large model research, AI for Science, AIGC and other scenarios."
The PAI Lingjun intelligent computing service is leading and innovative in several core technology areas. Large-scale model training involves processing billions of parameters, which cannot be accomplished simply by "stacking graphics cards." It requires systematic engineering that integrates complex technologies such as underlying computing power, networking, storage, data computing, and AI frameworks. Beyond careful design to make very large computing projects work at all, the system must also be "fast and economical" and make the most of every bit of computing power.
At the IaaS infrastructure layer, Alibaba Cloud has built the Lingjun intelligent computing cluster. It coordinates and optimizes cluster computing resources through predictable networking technology that integrates end hosts with the network and software with hardware, achieving stable microsecond-level interconnection between chips and efficient parallel computing. These innovations have substantially reduced the scalability bottleneck of AI computing power. A Lingjun cluster can be expanded to the "100,000-card level," providing flexible, multi-scale intelligent computing power for the development and application of large models, as well as convenient containerized services for upper-layer platform applications.
At the PaaS platform service layer, Alibaba Cloud's machine learning platform PAI can automatically split and allocate very large training tasks, and it provides a fast, computationally efficient, high-performance distributed training solution through integrated collaborative scheduling of hardware, network, and framework. PAI is also equipped with AIMaster, an automatic fault-tolerant training framework that keeps training stable over large language model training cycles that often last weeks or months, reduces the cost of manual intervention, and can shorten large-model training time by a factor of 10. In addition, PAI provides a simple, easy-to-use RLHF reinforcement learning framework, which can greatly improve the performance of large language models.
The PAI Lingjun Intelligent Computing Service also launched the first serverless intelligent computing service model in China, offering users one-click activation, on-demand allocation, and simplified operation and maintenance. The service supports flexible reuse of AI computing resources, which can significantly improve cluster utilization and reduce customers' usage costs.
The Alibaba Cloud PAI Lingjun intelligent computing service is reported to be in use at many enterprises and scientific research institutions. "Fuyao," the intelligent computing center jointly built by Alibaba Cloud and Xpeng Motors, has become the largest autonomous driving intelligent computing center in China, increasing the training speed of Xpeng Motors' autonomous driving models by more than 170 times. CFFF, a cloud-based intelligent computing platform jointly built by Alibaba Cloud and Fudan University, recently released a large-scale short- and medium-term weather forecast model with 4.5 billion parameters, cutting forecast time from the order of hours to less than 3 seconds.