According to a July 3 report on this site, Moore Threads announced today that its flagship AI product, the KUAE intelligent computing cluster solution, has been expanded from the current thousand-card scale to a 10,000-card scale. The Moore Threads KUAE 10,000-card intelligent computing cluster is built on full-featured GPUs as its foundation, creating a domestic general-purpose accelerated computing platform that can support 10,000-card deployments and 10,000-PFLOPS-level floating-point compute. It is designed specifically for training complex large models with trillions of parameters.
The KUAE 10,000-card intelligent computing solution has the following core features:
10,000 cards, 10,000 PFLOPS: a single KUAE cluster exceeds 10,000 GPUs and delivers 10 EFLOPS (10,000 PFLOPS) of floating-point compute, PB-level total GPU memory capacity, PB/s-level aggregate inter-card interconnect bandwidth, and PB/s-level aggregate inter-node interconnect bandwidth.
Long-term stable training: Moore Threads claims that the 10,000-card cluster's mean time between failures exceeds 15 days and that it can sustain stable large-model training for up to 30 days, with an average weekly effective-training rate above 99%, far exceeding the industry average.
High MFU: the KUAE 10,000-card cluster has been optimized across the system-software, framework, and algorithm levels to train large models efficiently, and its MFU (Model FLOPs Utilization, a common measure of large-model training efficiency) can reach up to 60%.
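For readers unfamiliar with the metric, MFU is simply the fraction of the hardware's peak floating-point throughput that a training run actually uses. A minimal sketch, using the common approximation that dense-transformer training costs about 6 FLOPs per parameter per token (the specific numbers below are illustrative, not Moore Threads' figures):

```python
def mfu(params: float, tokens_per_second: float, peak_flops: float) -> float:
    """Model FLOPs Utilization: achieved training FLOPs/s over peak FLOPs/s.

    Assumes the standard estimate of ~6 FLOPs per parameter per token
    for the forward + backward pass of a dense transformer.
    """
    achieved_flops = 6 * params * tokens_per_second
    return achieved_flops / peak_flops

# Hypothetical example: a 70B-parameter model processing 2M tokens/s
# on a cluster with 1.4 EFLOPS peak compute runs at ~0.6 (60%) MFU.
print(mfu(params=70e9, tokens_per_second=2e6, peak_flops=1.4e18))
```

An MFU of 60% is notable because communication overhead, pipeline bubbles, and memory stalls typically keep large-cluster training well below peak throughput.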
Ecosystem-friendly: the cluster can accelerate large models of different architectures and modalities, such as LLM, MoE, multimodal, and Mamba models. Built on the MUSA programming language, it is fully compatible with CUDA, and the automated code-migration tool Musify enables "Day 0" migration of new models.
This site has learned that Moore Threads will carry out three 10,000-card cluster projects.