September 23 news: At today's Full Connectivity Conference 2023 (Huawei Connect 2023), Huawei officially launched the Atlas 900 SuperCluster, an Ascend AI computing cluster built on a new architecture and designed to support training of ultra-large models with parameter counts on the order of trillions.
Wang Tao, President of Huawei Enterprise BG and Director of the ICT Infrastructure Business Management Committee, introduced the product at the event. He said that the Atlas 900 SuperCluster uses Huawei's latest-generation Galaxy AI intelligent computing switch, the CloudEngine XH16800. The switch's high-density 800GE ports mean that a switching network of only two layers is enough to build a very large cluster of 2,250 nodes, equivalent to 18,000 computing cards.
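The two figures quoted above imply a fixed number of cards per node. A minimal sketch of that arithmetic, assuming cards are spread evenly across nodes (the per-node count is inferred from the quoted totals, not stated in the announcement; the function name is hypothetical):

```python
# Figures quoted in the announcement.
NODES = 2250
TOTAL_CARDS = 18000


def cards_per_node(total_cards: int, nodes: int) -> int:
    """Return cards per node, assuming an even split across all nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    per_node, remainder = divmod(total_cards, nodes)
    if remainder:
        # An uneven split would contradict the even-distribution assumption.
        raise ValueError("cards do not divide evenly across nodes")
    return per_node


print(cards_per_node(TOTAL_CARDS, NODES))  # 8
```

Under that assumption, each node would carry eight computing cards.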
The cluster also adopts a new super-node architecture, which greatly improves the performance of large-model training. Wang Tao emphasized that large-scale computing power has become the core engine driving the development of artificial intelligence. To meet growing computing needs, Huawei has innovated at the system-architecture level, integrating computing power, data transport (network) capacity, and storage capacity to break through the bottleneck in large-scale computing power.
According to the editor, Huawei has also launched CANN 7.0 to further promote innovation in large models. It is a more open and easier-to-use platform. It is compatible with the industry's mainstream AI frameworks, acceleration libraries, and large models, and it also opens up its underlying capabilities in depth, allowing AI frameworks and acceleration libraries to call and manage computing resources more flexibly and giving developers more possibilities for customizing high-performance computing resources.
Huawei's combined strengths in computing, networking, storage, energy, and other fields have improved the new cluster's system reliability at the device level, node level, cluster level, and business level, extending the stability of large-model training from the day level to the month level. The product is expected to provide stronger support for the development of artificial intelligence and help usher in the era of large models.