In the era of large AI models, new data storage bases drive the digital-intelligence transformation of education and scientific research

WBOY

Jul 21, 2023 pm 09:53 PM

Generative AI (AIGC) has opened a new era of general-purpose artificial intelligence, and the competition around large models is intense. Computing infrastructure is the primary focus of that competition, and the awakening of computing power has increasingly become an industry consensus.

In this new era, large models are moving from a single modality to multiple modalities, parameter counts and training data sets are growing exponentially, and massive volumes of unstructured data require storage that performs well under mixed workloads. At the same time, data-intensive paradigms are becoming mainstream, and scenarios such as supercomputing and high-performance computing (HPC) are developing in depth. Existing data storage bases can no longer keep up with these continuously rising demands.

If computing power, algorithms, and data are the "troika" driving the development of artificial intelligence, then amid sweeping changes in the external environment the three urgently need to regain a dynamic balance. The gains in "soft power" from better algorithms and models, and the gains in "hard power" from an optimized supply of computing power, both need further support: the carrying capacity of data transmission and the capacity of data storage must improve as well. As a source of that support, new data storage bases will break out of the cocoon and take flight in the course of meeting these challenges.

Application scenarios with complex and continuously evolving requirements are the best touchstone for new data storage bases. In this sense, teaching and scientific research is a typical representative: computing power and data are key elements of digital transformation in this field, and interdisciplinary scientific computing is just as important as data-driven decision support. Moving from HPC to HPDA (High Performance Data Analysis) is a big step toward more efficient teaching and research, and AI can help solve problems that in the past could not be computed at all, could not be computed accurately, or were impractical to compute.

At the recently held 2023 World Artificial Intelligence Conference, the HPC and AI storage base that Shanghai Jiao Tong University built with the help of Huawei's OceanStor Pacific distributed storage was officially launched. The unified data storage base of the university's "Jiaowosuan" platform will expand by another 25PB this year. It is expected to become a new benchmark for the digital and intelligent transformation of teaching and scientific research, and to mark a milestone in the exploration of new data storage bases.

The evolution of the relationship between data and computing power and the derived challenges

As the digital transformation of thousands of industries enters a deeper and more difficult phase, and emerging technologies such as artificial intelligence and big data take off together, the relationship between data and computing power is changing in subtle ways.

Education and scientific research sit at the forefront of the digital economy and are quite sensitive to this change. In the past, data followed computing power. To obtain fast numerical solutions to complex scientific and engineering problems, the education and research community long focused on building the most powerful computing capability it could, while data was treated only as a supporting facility for computing power.

Today, "computing power revolves around data" is gradually becoming the new trend. Emerging applications, growing data volumes, and more prominent data security concerns have all placed greater emphasis on the value of data itself. Driven by breakthroughs in AI, big data, and other technologies, traditional supercomputing is evolving into data-intensive supercomputing, in which multiple kinds of heterogeneous computing power are built around the same data storage base.

Lin Xinhua, deputy director of the Network Information Center of Shanghai Jiao Tong University, believes that this reversal of dominance between data and computing power not only provides an opportunity to build a data-intensive supercomputing platform, but also brings many new challenges to the construction of a unified data storage base.

First, the explosive growth of data has significantly increased the demand for storage capacity. According to statistics, the data held on the "Jiaowosuan" platform has been growing by 7PB per year. Data volumes in application scenarios such as meteorology and oceanography, energy exploration, satellite remote sensing, gene sequencing, cryo-electron microscopy, AI autonomous driving, manufacturing CAE, and animation rendering have all reached the petabyte level, and it is not easy for a single data infrastructure to accommodate that much data.

Second, new workloads keep emerging and require higher storage performance. The accelerating spread of AI, especially the rapid rollout of large and multi-modal models, poses a severe challenge to IO performance. Data sets of hundreds of terabytes are becoming the norm, natural language processing and multi-modal applications are accelerating the growth of data volume, and efficient access to training data sets made up of small files requires storage performance to reach a new level.

Third, storage has to be shared by multiple clusters across campuses, and moving data between heterogeneous clusters can cause problems such as data loss and slow operation. The "Jiaowosuan" platform provides a variety of heterogeneous computing power, including ARM clusters, x86 clusters, and AI clusters. Only when data can flow freely and be integrated across all of these clusters can the full value of computing power and data be released.

Finally, traditional AI training on local disks, together with highly concurrent data analysis, makes breaking through the IO wall urgent. The IO bottleneck caused by repeated data migration is very prominent: the traditional read and write path is lengthy, loading data involves three data migrations, and writing a checkpoint involves two more. The efficiency lost along the way cannot be ignored.
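The cost of those repeated migrations can be estimated with simple arithmetic. The sketch below is only illustrative: it assumes each migration runs one after another at the same sustained bandwidth, and the data set size, checkpoint size, and bandwidth are hypothetical values that do not come from the "Jiaowosuan" platform. Only the migration counts (three for loading, two for a checkpoint) are taken from the text.

```python
def migration_seconds(size_gb: float, migrations: int, bandwidth_gb_s: float) -> float:
    """Time spent moving `size_gb` of data `migrations` times, one after another."""
    if bandwidth_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return migrations * size_gb / bandwidth_gb_s


# Hypothetical inputs, chosen only for illustration (not from the article).
DATASET_GB = 100_000     # a 100 TB training data set
CHECKPOINT_GB = 500      # one model checkpoint
BANDWIDTH_GB_S = 10.0    # sustained bandwidth of every migration step

# Migration counts quoted in the article: 3 for loading, 2 for a checkpoint.
load_copied = migration_seconds(DATASET_GB, 3, BANDWIDTH_GB_S)
ckpt_copied = migration_seconds(CHECKPOINT_GB, 2, BANDWIDTH_GB_S)

# Assumed alternative: the data is read or written once, in place, on shared storage.
load_direct = migration_seconds(DATASET_GB, 1, BANDWIDTH_GB_S)
ckpt_direct = migration_seconds(CHECKPOINT_GB, 1, BANDWIDTH_GB_S)

print(f"load: {load_copied / 3600:.2f} h copied vs {load_direct / 3600:.2f} h direct")
print(f"checkpoint: {ckpt_copied:.0f} s copied vs {ckpt_direct:.0f} s direct")
```

Under these assumptions the time scales linearly with the number of migrations, which is why removing intermediate copies matters more as data sets grow.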

The breakthrough path of distributed storage unified integrated data base

To cope with these challenges, Shanghai Jiao Tong University and Huawei Storage have cooperated closely since 2019 to jointly build the "Jiaowosuan" data-intensive supercomputing platform. Drawing on its deep experience in technology and application innovation, Huawei's OceanStor Pacific distributed storage helps "Jiaowosuan" build a unified data storage base that supports the various heterogeneous computing platforms across the university.

Building a distributed, unified, and integrated data storage base is the only way for "Jiaowosuan" to embrace emerging data applications. Because the distributed storage architecture scales horizontally, the storage capacity and bandwidth of the "Jiaowosuan" platform can be expanded on demand. First, performance and capacity grow linearly, and a single cluster can reach EB-level capacity. Second, high-density, large-capacity hardware saves cabinet space. Third, large-ratio erasure coding (EC) combined with scenario-based compression improves disk utilization.

It is understood that the "Jiaowosuan" platform grew from an initial 2PB of capacity and 6GB/s of bandwidth to 20PB and 60GB/s in 2020, and then to 40PB and 120GB/s in 2022. A further 25PB of capacity is expected to be added in 2023. Huawei's OceanStor Pacific distributed storage also has an ultra-high-density design with 120 disk slots in 5U. Combined with a large-ratio EC data redundancy protection algorithm, it can raise hard disk space utilization to 91.6% while still meeting high reliability requirements.
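The link between the EC ratio and disk utilization is straightforward: with k data fragments and m parity fragments per stripe, the usable share of raw capacity is k / (k + m). The sketch below works this out for a few layouts. The article does not state which layout the platform uses; the 22+2 layout here is an assumption, picked only because it gives roughly the 91.6% quoted above, and compression is ignored.

```python
from fractions import Fraction


def ec_utilization(data_fragments: int, parity_fragments: int) -> Fraction:
    """Usable share of raw capacity for a k+m erasure-coding layout: k / (k + m)."""
    if data_fragments <= 0 or parity_fragments < 0:
        raise ValueError("need k > 0 and m >= 0")
    return Fraction(data_fragments, data_fragments + parity_fragments)


def replication_utilization(copies: int) -> Fraction:
    """Usable share of raw capacity when every block is stored `copies` times."""
    if copies <= 0:
        raise ValueError("need at least one copy")
    return Fraction(1, copies)


# Layouts chosen for illustration; 22+2 is an assumption, not a documented setting.
layouts = {"4+2": (4, 2), "8+2": (8, 2), "22+2": (22, 2)}

print(f"3-way replication: {float(replication_utilization(3)):.2%}")
for name, (k, m) in layouts.items():
    print(f"EC {name}: {float(ec_utilization(k, m)):.2%}")
```

The wider the stripe, the smaller the share of capacity spent on parity, which is why a large EC ratio raises utilization compared with small ratios or replication.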

Distributed all-flash hardware is the cornerstone of how "Jiaowosuan" solves its storage performance problems. With the help of Huawei OceanStor Pacific, the "Jiaowosuan" platform uses all-flash hardware acceleration to significantly improve bandwidth and IOPS. Each node delivers 800,000 IOPS and 20GB/s of bandwidth, which meets high-performance requirements under mixed workloads.
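The per-node figures above allow a rough sizing estimate. The sketch below assumes that bandwidth and IOPS scale linearly with node count, in line with the linear-growth claim made for the architecture, and it ignores network, protocol, and redundancy overheads. The targets passed in are hypothetical examples, not the platform's actual configuration.

```python
import math

NODE_BANDWIDTH_GB_S = 20.0   # per-node bandwidth quoted in the article
NODE_IOPS = 800_000          # per-node IOPS quoted in the article


def nodes_needed(target_bandwidth_gb_s: float, target_iops: int) -> int:
    """Smallest node count that meets both targets, assuming linear scaling."""
    by_bandwidth = math.ceil(target_bandwidth_gb_s / NODE_BANDWIDTH_GB_S)
    by_iops = math.ceil(target_iops / NODE_IOPS)
    return max(by_bandwidth, by_iops, 1)


# Hypothetical targets for illustration only.
print(nodes_needed(120, 2_000_000))   # limited by the bandwidth target
print(nodes_needed(60, 8_000_000))    # limited by the IOPS target
```

Whichever target needs more nodes decides the cluster size, so a workload dominated by small files can be IOPS-bound even when its bandwidth target is modest.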

Unified, global management of distributed storage across campuses is an effective way to solve the problem of sharing storage among multiple clusters. By using a global file system to manage multiple storage systems across domains, the "Jiaowosuan" platform builds a unified data storage base that spans campuses. With the support of Huawei's OceanStor Pacific distributed storage, it achieves several goals at once, including a global file view, data management and scheduling, global data flow, and unified streaming metadata.

Accelerated data analysis, lossless interoperability across multiple access protocols, and high efficiency without data relocation are the tools "Jiaowosuan" uses to break the IO wall. Based on Huawei's AI-oriented storage solution and the OceanStor Pacific capability of "one copy of data, accessible through multiple protocols", the "Jiaowosuan" platform uses shared external storage to reduce data relocation, which greatly improves analysis efficiency and saves storage space.

The future picture of HPDA AI in the era of large models

The evolution of the "Jiaowosuan" platform, and its work with Huawei Storage to create a new distributed, unified, and integrated data storage base, make it clear that data-intensive scenarios are evolving at an accelerating pace.

From early HPC to HPDA, and now to HPDA combined with AI, application scenarios in teaching and scientific research have kept growing richer, and the demands on storage products and data storage bases have kept rising. In fact, teaching and scientific research are only the tip of the iceberg in the digitalization of thousands of industries, and the era of data storage is arriving.

The arrival of the large model era will further reshape IT infrastructure, including storage, and storage products designed with AI in mind are expected to become the new favorite in the industry's digital upgrade. On July 14, Huawei's launch event for new AI storage products in the large model era, themed "New Data Paradigm Unleashing New Momentum of AI," will be held online. Whether you are deploying AI in your enterprise or developing applications with AI capabilities, the solutions released at the event are intended to offer a better technical architecture and products to help you keep pace with the times.

The move toward general-purpose artificial intelligence has begun. A leader in the storage industry has been the first to sound the clarion call, and every move that follows is worth watching.
