Detailed explanation of Qingyun Technology's launch of AI computing power products and services to address computing power challenges

Oct 16, 2023 pm 08:37 PM

At the Qingyun Technology AI Computing Power Conference, Miao Hui, product manager, introduced in detail the Qingyun AI computing power scheduling platform and Qingyun AI computing power cloud services. The following is the full text of the speech:

Artificial intelligence users face computing power challenges

With the explosion of the artificial intelligence industry, AIGC, large models, scientific computing, and enterprise-level big data and AI workloads have placed higher demands on computing power centers. Data centers that offer only a single type of computing power can no longer meet the growing demand across industries. More intelligent computing centers, supercomputing centers, and general cloud computing services are therefore needed to provide computing power to society as a whole.

However, the AI industry, AI infrastructure, and users of AI computing power also face a series of challenges:

Unified management of diverse resources. Users need multiple kinds of computing power and storage, network-wide computing, and services located near them. Qingyun provides a unified scheduling platform for managing these resources, which resolves the disorder of managing each one separately.

High-speed networking. For AI high-speed network construction, Qingyun interconnects computing and storage devices over high-speed networks and publishes application services over general-purpose networks. In this way, the Qingyun platform addresses high-speed networking across multiple regions.

Tedious environment setup. Algorithm engineers and R&D engineers may spend a lot of time setting up basic environments such as hardware servers and storage servers. Qingyun's AI intelligent computing services, training platform, and inference platform simplify environment setup and enable one-click deployment.

Multi-service integration. Qingyun combines traditional cloud computing, supercomputing, and intelligent computing to provide full-spectrum computing services for more businesses and more customers.

Lack of operational services. Qingyun also provides comprehensive business operations and operation and maintenance (O&M) management services to computing power operation centers and computing power management departments.

Qingyun AI computing power scheduling platform

The full-stack architecture of Qingyun's AI scheduling products spans multiple regions and multiple availability zones (AZs), so products in different regions can be integrated and offered to society as a single global computing power service. Specifically, the platform manages the underlying infrastructure, abstracts that infrastructure into logical, business-oriented resources through a data logic layer, and forms AI computing power clusters through specific products and services. These include GPU hosts (bare metal, virtualized, and shared), container inference services, a model market, and other related services, giving customers across industries computing power scheduling and the ability to put application scenarios into production.

Comprehensive coverage: a new model for building computing power centers

Overall, the AI computing power scheduling platform capabilities provided by Qingyun Technology rest on the following four aspects:

First, the entire platform is compatible with all computing chips on the market (including newly produced Xinchuang chips), as well as GPU-related graphics cards and network cards.

Second, it manages, allocates, monitors, and schedules these adapted resources in a unified way, and provides online management across the full life cycle, from the user's request for resources to their release after use.

Third, on both the management side and the user side, the Qingyun unified management platform lets administrators and users fully operate the AI infrastructure and AI computing power cloud services.

In the field of intelligent computing, Qingyun will turn more services into scenario-specific products, such as large language model training and inference, and load balancing for text-generation services. Through the AI computing power scheduling platform, Qingyun can also offer customers convenient operations such as one-click deployment, one-click scaling, and one-click load balancing. For load balancing in particular, across the network, the public network, and computing infrastructure, it can deliver and expand capacity within seconds.

Finally, building on the three capabilities above, Qingyun can support computing across industries, including high-performance computing, AI computing, and general computing, and give customers an independently developed, fully featured platform for unified user management, resource allocation, and operations.

Nine Abilities Unlock AI Computing Power Freedom

Through years of industry accumulation, Qingyun AI computing power scheduling platform has formed nine key capabilities:

1. Multi-region and multi-business resource integration capabilities

This matters especially where computing power services are spread across regions. When resources in western Sichuan or the northwest serve users in eastern regions, scientific research institutions, and universities, Qingyun can manage multi-region resources in a unified manner and build effective high-speed networks in cooperation with telecom operators.

2. Distributed scheduling and management capabilities

Following the principle of nearest use, Qingyun manages and allocates all infrastructure (computing and storage resources) across regions, computing centers, and data centers, and configures scheduling priorities, including affinity and anti-affinity. For VMs, hosts, and bare metal servers (as well as containers and Pods), affinity and anti-affinity can be configured on the management side of the Qingyun AI computing power scheduling platform to enforce scheduling priority. The goal is a consistent user experience when using data, requesting computing resources, training, and running inference.
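
The nearest-use and affinity rules described above can be illustrated with a minimal sketch. It is not Qingyun's implementation; the `Node`, `Request`, and `place` names, the label sets, and the ranking order are all assumptions chosen for illustration. Anti-affinity is treated as a hard filter, while region and affinity are treated as preferences.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    region: str
    free_gpus: int
    workloads: set = field(default_factory=set)  # labels already placed on this node


@dataclass
class Request:
    label: str
    gpus: int
    region: str  # region closest to the user (hypothetical field)
    affinity: set = field(default_factory=set)       # prefer nodes running these labels
    anti_affinity: set = field(default_factory=set)  # never share a node with these labels


def place(request, nodes):
    """Return the chosen node, or None if no node can host the request."""
    candidates = [
        n for n in nodes
        if n.free_gpus >= request.gpus and not (n.workloads & request.anti_affinity)
    ]
    if not candidates:
        return None

    def score(n):
        # Same region first, then most affinity matches, then tightest fit.
        return (
            n.region == request.region,
            len(n.workloads & request.affinity),
            -(n.free_gpus - request.gpus),
        )

    best = max(candidates, key=score)
    best.free_gpus -= request.gpus
    best.workloads.add(request.label)
    return best


nodes = [
    Node("a", "east", 8, {"dataset-cache"}),
    Node("b", "east", 8, {"inference"}),
    Node("c", "west", 8),
]
job = Request("train", 4, "east", affinity={"dataset-cache"}, anti_affinity={"inference"})
print(place(job, nodes).name)  # prints "a"
```

Node `b` is excluded by the anti-affinity rule, and node `a` wins over `c` because it is in the requested region and already holds the workload the job prefers to sit next to.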

3. Resource scheduling capability

In terms of resource scheduling capabilities, Qingyun has the following six major advantages:

1) Immediately schedule and expand resources of tens of thousands of cards

This capability mainly targets AI computing scenarios, especially large model inference. Some model scenarios require inference only several times a year, which means a training platform of dozens or even tens of thousands of cards must be built on demand. To meet this requirement, resources are preset, adapted, and managed on the Qingyun AI computing power scheduling platform, so that a computing power cluster can support tens of thousands of cards immediately and release them as soon as the work is done. The platform automates much of the resource environment setup and configuration so that ten-thousand-card resources can be scheduled in a unified way.

2) Shortest-communication-link priority scheduling

Keeping data from taking detours is another main goal of the Qingyun AI computing power scheduling platform. In AI training and inference, large volumes of data move between nodes and between nodes and storage. Qingyun therefore configures the switches so that computing and storage resources can sit on the same switch, and schedules within a single machine room or cabinet first. This keeps data off roundabout paths and reduces the network transmission constraints on AI training.
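
One way to picture shortest-link scheduling is a placement routine that tries the narrowest network scope first. The sketch below is an assumption-laden illustration, not the platform's algorithm: the `pick_nodes` function, the `(node, room, cabinet, switch)` tuple layout, and the best-fit tie-breaking are all hypothetical.

```python
from collections import defaultdict


def pick_nodes(free_nodes, count):
    """Pick `count` node names from the narrowest scope that can hold them.

    free_nodes is a list of (node, room, cabinet, switch) tuples.
    Returns a sorted list of node names, or None if too few nodes are free.
    """
    # Try one switch first, then one cabinet, then one machine room.
    for depth in (3, 2, 1):
        groups = defaultdict(list)
        for name, room, cabinet, switch in free_nodes:
            key = (room, cabinet, switch)[:depth]
            groups[key].append(name)
        fitting = [names for names in groups.values() if len(names) >= count]
        if fitting:
            # Best fit: the smallest group that still fits limits fragmentation.
            return sorted(min(fitting, key=len))[:count]
    # Last resort: span machine rooms if enough nodes exist overall.
    names = sorted(n for n, *_ in free_nodes)
    return names[:count] if len(names) >= count else None


free = [
    ("n1", "r1", "c1", "s1"),
    ("n2", "r1", "c1", "s1"),
    ("n3", "r1", "c1", "s2"),
    ("n4", "r1", "c2", "s3"),
    ("n5", "r2", "c3", "s4"),
]
print(pick_nodes(free, 2))  # ['n1', 'n2'], both on switch s1
print(pick_nodes(free, 3))  # ['n1', 'n2', 'n3'], same cabinet c1
```

A two-node job stays on one switch, while a three-node job widens only as far as the cabinet, which is the "no detours" idea in miniature.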

3) Support heterogeneous platforms

When building a cluster, users can choose to run different services on different cards. Qingyun Technology has also adapted its platform to domestic chips and supports them as replacements.

4) Improve the granularity of the scheduling system

Two scheduling systems are supported: one based on Slurm and one based on K8s. In terms of granularity, users get true job-level visibility. Every training task is tracked down to each process on each card, and large-scale data monitoring and business scheduling are used to detect job anomalies. Users can then handle abnormal training tasks promptly, fix the problem, and rerun immediately, which maximizes resource utilization and reduces waste.

5) The management side implements scheduling priority configuration

Different computing power centers operate different computing power services, so, especially with multiple data centers, users can set scheduling priorities through the Qingyun AI computing power scheduling platform. Defaults are built in up front and can be reconfigured later. Users can make reservations, pause and resume jobs, set priorities, and manage queuing. On the management side, Qingyun can give resource allocation priority to users with special requests or high priority.
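
The priority, queuing, pause, and resume behavior described above can be sketched with a small priority queue. This is an illustrative sketch only; the `JobQueue` class and its method names are hypothetical and do not describe the platform's actual interface.

```python
import heapq
import itertools


class JobQueue:
    """Higher priority runs first; equal priorities run in submission order."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves submission order
        self._paused = set()

    def submit(self, job_id, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), job_id))

    def pause(self, job_id):
        self._paused.add(job_id)

    def resume(self, job_id):
        self._paused.discard(job_id)

    def next_job(self):
        """Pop the highest-priority job that is not paused, or return None."""
        skipped = []
        chosen = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[2] in self._paused:
                skipped.append(entry)
                continue
            chosen = entry[2]
            break
        for entry in skipped:  # paused jobs keep their place in the queue
            heapq.heappush(self._heap, entry)
        return chosen


queue = JobQueue()
queue.submit("nightly-eval", priority=1)
queue.submit("llm-train", priority=5)
queue.submit("finetune", priority=5)
queue.pause("llm-train")
print(queue.next_job())  # prints "finetune"
queue.resume("llm-train")
print(queue.next_job())  # prints "llm-train"
```

Pausing a job skips it without losing its position, so it runs ahead of lower-priority work as soon as it is resumed.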

6) Flexible scheduling and allocation of resources for the intelligent computing industry

Qingyun makes resource scheduling dynamic, flexible, and configurable to handle difficult prioritization problems in AI systems. Qingyun continues to identify new problems in AI computing power scheduling and AI scenarios, and uses its platform and new products to solve them, including some major problems in the industry.

4. High-speed parallel storage capability

Qingyun offers a diverse range of computing and storage products, including the following three types of storage:

1) Qingyun U10000 Object Storage

Stores models, code, and frequently accessed data. It is mainly used for large-scale data backup and data reads.

2) Parallel file storage EPFS

For large-scale parallel writes, Qingyun provides the parallel file storage EPFS, an all-flash parallel file store for MPI-level data writes.

3) File storage NAS

Stores common documents, text files, and similar data. All of Qingyun's storage products can interconnect internally with its computing products to transfer, distribute, and back up data over the internal high-speed network.

5. Hybrid networking capability

Different high-speed networks can be provided for different computing scenarios, such as a compute IB (InfiniBand) network and a storage IB network. How should they be configured optimally?

Qingyun interconnects high-spec computing products with high-spec storage products, and interconnects mid- and low-spec products, for training, inference, and general application service scenarios.

6. Algorithm development support capabilities

For algorithm developers, Qingyun provides a more comprehensive set of cloud service products. The algorithm development stage involves extensive parameter tuning and large amounts of code. During training and deployment, moving work on and off the cloud can cause problems: large-scale data uploads, downloads, and code copying do not lend themselves to editing online and running immediately.

Qingyun therefore provides an algorithm development platform. It can launch an online development environment based on cloud services, build complete Python projects and VC projects, and use project files and engineering environments online for code development.

During development, if debugging is needed, resources can be expanded immediately; if training is needed, the job can be assigned to the training cluster right away; if inference is needed, it can be placed on the inference cluster.

Algorithm development may also involve joint or mixed development. Qingyun provides code repositories and image repositories for model management, and team members work under different permissions to carry out unified algorithm development and service integration.

In a nutshell, Qingyun mainly provides computing products and scheduling products for all development scenarios for algorithm developers, ensuring that the entire algorithm development business can be effectively operated on the cloud and reducing large-scale upload and download operations.

7. AI training platform

If the algorithm development is nearing completion or needs debugging, a large amount of computing power infrastructure needs to be activated for development and training. Based on the infrastructure, Qingyun provides an AI training platform to empower users.

Once GPU, storage, and network resources are in place, users can build clusters themselves through the cloud platform with one click. The Qingyun AI training platform builds clusters online from its own GPU resources. When a cluster is built, storage is mounted by default, and users can also choose their own.

The Qingyun AI training platform also has a built-in online development environment with commonly used training frameworks preinstalled. Through clusters, it gives users environments for all scenarios and applications, so they can run distributed training online across multiple machines.

8. Container Inference Service Platform

Once large model training is nearly complete and inference services are to be offered externally, the Qingyun container inference service platform comes into play.

After deploying an inference service on the Qingyun container inference service platform, users can rely on the configured load balancing and automatic scaling so that incoming requests are served immediately. Qingyun also provides online monitoring. If an inference service has a problem, users can see immediately what went wrong in the container, and Qingyun can resolve it online. For concurrent and high-volume calls, Qingyun also performs load balancing and automatic scaling, which greatly reduces manual configuration.
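
As a rough illustration of automatic scaling and load balancing for an inference service, the sketch below uses a proportional scaling rule of the same shape as the one documented for the Kubernetes Horizontal Pod Autoscaler, plus simple round-robin dispatch. The speech does not say which rules the Qingyun platform uses, so the function names, parameters, and limits here are assumptions.

```python
import itertools
import math


def desired_replicas(current, load_per_replica, target_per_replica, lo=1, hi=100):
    """Scale the replica count in proportion to observed load, clamped to [lo, hi]."""
    wanted = math.ceil(current * load_per_replica / target_per_replica)
    return max(lo, min(hi, wanted))


def round_robin(replicas):
    """Return an endless iterator that hands out replicas in rotation."""
    return itertools.cycle(replicas)


# 4 replicas each see 150 requests/s against a target of 100 requests/s.
print(desired_replicas(4, 150, 100))  # prints 6

balancer = round_robin(["replica-1", "replica-2"])
print([next(balancer) for _ in range(3)])  # ['replica-1', 'replica-2', 'replica-1']
```

The clamp keeps a traffic spike from requesting more replicas than the cluster allows, and keeps at least one replica running when load drops.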

9. Model warehouse (MaaS)

Qingyun Model Warehouse (MaaS) mainly serves AI computing power customers and general computing customers. Model service providers can list their products on the application market and model market as they see fit, so that enterprise customers can call, fine-tune, and deploy models with one click.

Stimulate diverse value and accelerate the implementation of scenarios

In general, the purpose of Qingyun AI computing power scheduling platform is to manage AI infrastructure like local resources, which is mainly reflected in five major aspects:

1. Provide unified scheduling of multiple computing power

Faced with GPU resources, CPU resources, domestic chips, application frameworks, applications and user business scenarios, Qingyun uses a unified platform for scheduling and management, including storage facilities and network facilities.

2. Realize intelligent computing power scheduling based on infrastructure

In terms of computing power scheduling priority and affinity, based on VM, host and container, users can realize intelligent computing power scheduling and configuration, as well as management services through Qingyun's platform.

3. Quick and effective adaptation to domestic chips

Qingyun can adapt to domestic chips quickly and effectively, ensuring that localized algorithm services and code can run immediately on domestic chips.

4. Visualization service

In terms of intelligent operation and maintenance for the management side, Qingyun's monitoring and alarm services provide customers and administrators with visual operations through a large operation and maintenance platform.

5. Rich application market

Qingyun Technology actively builds an ecosystem and creates a rich application market, so that applications and customers from all walks of life can get the computing resources and business resources they want on the Qingyun AI computing platform.

At present, the Qingyun AI computing power scheduling platform has been deployed for Jinan supercomputing, where Sunward Cloud is online and providing operational services. Building on Jinan's tens of thousands of pieces of supercomputing hardware, its various computing networks, servers, and other infrastructure, Qingyun provides listing, management, and scheduling services. It manages, integrates, and allocates different machine rooms, supercomputing and intelligent computing services, GPUs, and storage and network resources in a unified way, and offers computing power scheduling products and computing power cloud service products to customers across industries.

Qingyun AI Computing Cloud Service

Qingyun AI computing power cloud service products are also launched on Qingyun public cloud to provide services, mainly for large model training scenarios.

For high-priority, high-spec cards, Qingyun provides public cloud computing service products. In AI scenarios, Qingyun builds distributed GPU computing clusters on the underlying resources and binds them to the public network so users can access them.

Users can then upload data to parallel file storage, or place the parallel file storage and GPU computing clusters on the same private network to keep data and cloud services secure. They can also run workloads through online training, or through remote SSH access to the distributed computing clusters and parallel file storage.

On the service side, users can use AI computing clusters and container inference services, backed by A800 resources, bare metal servers, and virtualized servers. All Qingyun AI computing power cloud service products use high-speed interconnect networks and provide the online, development, training, and inference environments that the AI computing power industry needs. Everyone is welcome to register and apply for a trial.
