How artificial intelligence is driving better hardware
Computer hardware has been a stagnant market for many years. The dominant x86 microprocessor architecture has reached the limits of the performance gains achievable through miniaturization, so manufacturers have focused mainly on packing more cores into each chip.
For the rapid development of machine learning and deep learning, the GPU has been the savior. Originally designed for graphics processing, GPUs can contain thousands of small cores, making them ideal for the parallel processing that AI training requires.
The essence of artificial intelligence is that it benefits from parallel processing, and about ten years ago it was discovered that GPUs, which are designed to paint pixels on a screen, are well suited to the task because they are parallel processing engines into which a great many cores can be packed.
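To make that concrete, here is a minimal sketch, written in PyTorch as an assumed framework (the article does not name one), of the same matrix multiplication running on the CPU and then on a GPU, where the work is spread across thousands of small cores. The matrix sizes are arbitrary.

```python
# Minimal sketch: the same matrix multiplication on CPU and, if available, on GPU.
# Assumes PyTorch is installed; matrix sizes are illustrative only.
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# CPU: the work is spread across a handful of general-purpose cores.
c_cpu = a @ b

if torch.cuda.is_available():
    # GPU: the same multiply is split across thousands of small cores,
    # each computing independent pieces of the output in parallel.
    a_gpu, b_gpu = a.cuda(), b.cuda()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the asynchronous GPU kernel to finish
```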
That’s good news for Nvidia, which saw its market capitalization surge from less than $18 billion in 2015 to $735 billion before the market contracted last year. Until recently, the company had virtually the entire market to itself. But many competitors are trying to change that.
For artificial intelligence workloads, Nvidia's GPUs have dominated so far, but users are looking for technologies that can take them to the next level. As high-performance computing and AI workloads continue to converge, we will see a wider variety of accelerators emerge.
Accelerating the development of new hardware
The big chip manufacturers are not standing still. Three years ago, Intel acquired Israeli chipmaker Habana Labs and made the company the focus of its artificial intelligence development efforts.
The Gaudi2 training processor and Greco inference processor that Habana launched last spring are said to be at least twice as fast as Nvidia's flagship A100.
In March this year, Nvidia launched its H100 accelerator GPU with 80 billion transistors and support for the company's high-speed NVLink interconnect. It features a dedicated engine that can accelerate the execution of Transformer-based models used in natural language processing by six times compared to the previous generation. Recent tests using the MLPerf benchmark show that H100 outperforms Gaudi2 in most deep learning tests. Nvidia is also seen as having an advantage in its software stack.
Many users choose GPUs because of the established software ecosystem that surrounds them. A large part of Nvidia's success comes from that ecosystem strategy.
Hyperscale cloud computing companies entered the field even before the chipmakers. Google LLC's Tensor Processing Unit, an application-specific integrated circuit launched in 2016, is now in its fourth generation. Amazon Web Services launched its machine learning inference accelerator in 2018, claiming it offers more than twice the performance of GPU-accelerated instances.
Last month, the company announced the general availability of cloud instances based on its Trainium chips, saying they deliver comparable performance at 50% lower cost than GPU-based EC2 instances for training deep learning models. Both companies' efforts are focused mainly on delivery through cloud services.
While the established market leaders focus on incremental improvements, many of the more interesting innovations are happening at startups building AI-specific hardware. They attracted the majority of the $1.8 billion that venture capitalists poured into chip startups last year, more than double the 2017 figure, according to the data.
They are chasing a market that could bring huge gains. The global artificial intelligence chip market is expected to grow from US$8 billion in 2020 to nearly US$195 billion by 2030.
Smaller, Faster, Cheaper
Few startups are trying to replace the x86 CPU, largely because the leverage in doing so is relatively small. The chips themselves are no longer the bottleneck; communication between chips is.
The CPU handles low-level operations such as managing files and assigning tasks, but a purely CPU-centric approach no longer scales. The CPU is built to be general purpose, covering everything from opening files to managing memory caches, which means it is not well suited to the massively parallel matrix arithmetic that AI model training requires.
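A small illustration of why that matrix arithmetic parallelizes so well: every cell of the output matrix is an independent dot product, so an accelerator can hand each one to a different core. The NumPy sketch below is a toy, not a description of how any particular chip schedules the work.

```python
# Illustration only: each output cell of a matrix multiplication is an
# independent dot product, so all n*m cells could be computed in parallel.
import numpy as np

def matmul_by_cell(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.empty((n, m))
    for i in range(n):          # no cell depends on any other cell,
        for j in range(m):      # so a parallel machine can assign each
            out[i, j] = a[i, :] @ b[:, j]  # one to a separate core or thread
    return out

a, b = np.random.rand(64, 32), np.random.rand(32, 16)
assert np.allclose(matmul_by_cell(a, b), a @ b)
```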
Most activity in the market revolves around coprocessor accelerators, application-specific integrated circuits, and, to a lesser extent, field-programmable gate arrays that can be fine-tuned for specific uses.
Most are following Google's lead in developing coprocessors that work alongside the CPU and target specific parts of the AI workload by hard-coding algorithms into the processor rather than running them as software.
Acceleration equation
One startup is developing so-called graph streaming processors for edge computing scenarios such as self-driving cars and video surveillance. The fully programmable chipset takes on many of the functions of a CPU but is optimized for task-level parallelism and streaming execution, while consuming only 7 watts of power.
The architecture is based on a graph data structure in which relationships between objects are represented as connected nodes and edges. Every machine learning framework uses graph concepts, and the chip's design maintains those same semantics throughout. An entire graph, including the CMM, with custom nodes, can be executed, and anything parallel within these graphs can be accelerated.
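As a rough illustration of those graph concepts, the sketch below builds a tiny dataflow graph in Python and executes its nodes in dependency order. The node names and operations are hypothetical and are not taken from any vendor's toolchain.

```python
# Toy dataflow graph: nodes are operations, edges are tensors flowing
# between them. Independent nodes could be dispatched in parallel.
import numpy as np

graph = {
    "x":      {"op": "input",  "deps": []},
    "w":      {"op": "input",  "deps": []},
    "matmul": {"op": "matmul", "deps": ["x", "w"]},
    "relu":   {"op": "relu",   "deps": ["matmul"]},
}

def run(graph, feeds):
    results = dict(feeds)                      # seed the input nodes
    for name in ["x", "w", "matmul", "relu"]:  # a valid topological order
        node = graph[name]
        if node["op"] == "input":
            continue
        args = [results[d] for d in node["deps"]]
        if node["op"] == "matmul":
            results[name] = args[0] @ args[1]
        elif node["op"] == "relu":
            results[name] = np.maximum(args[0], 0)
    return results["relu"]

out = run(graph, {"x": np.random.randn(2, 4), "w": np.random.randn(4, 3)})
```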
Its graph-based architecture addresses some of the capacity limitations of GPUs and CPUs and adapts more flexibly to different types of AI tasks. It also lets developers move more processing to the edge for better inference. If companies can do 80% of the processing up front at the edge, they can save a great deal of time and cost.
These applications can bring intelligence closer to data and enable rapid decision-making. The goal of most is inference, which is the field deployment of AI models, rather than the more computationally intensive training tasks.
A company is developing a chip that uses in-memory computing to reduce latency and the need for external storage devices. Its artificial intelligence platform will provide flexibility and the ability to run multiple neural networks while maintaining high accuracy.
Its data processing unit line is a massively parallel processor array built around a scalable 80-core processor that can execute dozens of tasks in parallel. The key innovation is the tight integration of a tensor coprocessor within each processing element, plus support for direct tensor data exchange between elements to avoid memory bandwidth bottlenecks. This makes AI application acceleration efficient, because pre- and post-processing run on the same processing elements.
One company focuses on deep learning inference using a thumbnail-sized chipset that it claims can perform 26 trillion operations per second while consuming less than 3 watts. It achieves this in part by breaking each layer of a deep learning model down into its required computing elements and integrating them on a chip built specifically for deep learning.
The use of onboard memory further reduces overhead. The entire network lives inside the chip with no external memory, which means the chip can be smaller and consume less energy. It can run deep learning models on high-definition images in near real time, enabling a single device to perform automatic license plate recognition on four lanes of traffic simultaneously.
Current Development of Hardware
Some startups are taking more of a moonshot approach, aiming to redefine AI model training and the entire platform it runs on.
For example, one AI processor optimized for machine learning can manage up to 350 trillion processing operations per second, with nearly 9,000 concurrent threads and 900 megabytes of in-processor memory. The integrated computing system built around it, called the Bow-2000 IPU Machine, is said to be capable of 1.4 petaflops.
What makes it different is its three-dimensional stacked chip design, which packs nearly 1,500 parallel processing cores into a single chip, each of them capable of running completely different operations. That differs from widely used GPU architectures, which prefer to run the same operation on large blocks of data.
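The contrast between the two programming models can be sketched in a few lines of Python: one step applies a single operation uniformly to a large block of data, GPU style, while the other hands each worker its own, different operation, in the spirit of the many-core design described above. This is purely illustrative and says nothing about how either kind of hardware actually schedules work.

```python
# Illustrative contrast between the two execution styles.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

data = np.random.randn(4, 1024)

# GPU-style: the same operation applied uniformly to one large block of data.
same_op_everywhere = np.tanh(data)

# Many-core style: each worker runs a completely different operation
# on its own piece of the data.
programs = [np.tanh, np.abs, np.square, np.exp]
with ThreadPoolExecutor(max_workers=len(programs)) as pool:
    different_ops = list(pool.map(lambda pair: pair[0](pair[1]),
                                  zip(programs, data)))
```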
As another example, some companies are tackling the interconnect, the wiring that links components inside integrated circuits. As processors reach their theoretical maximum speeds, the paths that move the bits increasingly become the bottleneck, especially when multiple processors access memory simultaneously. Today the chips themselves are no longer the bottleneck; the interconnect is.
One such chip uses nanophotonic waveguides in an artificial intelligence platform that its maker says combines high speed and large bandwidth in a low-energy package. It is essentially an optical communications layer that can connect multiple other processors and accelerators.
The quality of AI results comes from being able to support very large, complex models while delivering very high-throughput responses, and this approach makes both achievable. It applies to anything that can be expressed in linear algebra, which covers most applications of artificial intelligence.
Expectations for another startup's integrated hardware and software platform are extremely high. Enterprises have seized on it as an R&D platform that can run artificial intelligence and other data-intensive applications anywhere from the data center to the edge.
The hardware platform uses custom 7nm chips designed for machine and deep learning. Its reconfigurable dataflow architecture runs an AI-optimized software stack, and its hardware architecture is designed to minimize memory accesses, thereby reducing interconnect bottlenecks.
The processor can be reconfigured to suit AI or high-performance computing (HPC) workloads. It is designed to handle large-scale matrix operations at a higher level of performance, a plus for clients with changing workloads.
Although CPUs, GPUs and even FPGAs are well suited for deterministic software such as transactional systems and ERP, machine learning algorithms are probabilistic, meaning the results are not known in advance. This requires a completely different hardware infrastructure.
The platform minimizes interconnect issues by attaching 1TB of high-speed double data rate (DDR) synchronous memory to the processor and essentially masking the latency of the DDR controller behind on-chip memory that is 20 times faster. Because this is transparent to the user, it allows language models with higher parameter counts, and the highest-resolution images, to be trained without tiling or downsampling.
Tiling is an image analysis technique that reduces the need for computing power by splitting an image into smaller chunks, analyzing each chunk, and then recombining the results. Downsampling trains a model on a random subset of the training data to save time and computing resources. The result is a system that is not only faster than GPU-based systems but also capable of solving larger problems.
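For readers unfamiliar with those two workarounds, here is a minimal NumPy sketch of both; the image size, tile size, and the per-tile "analysis" step are all placeholders.

```python
# Minimal sketch of tiling and downsampling as described above.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((1024, 1024))        # stand-in for a high-resolution image

# Tiling: split the image into smaller chunks, analyze each chunk
# independently, then recombine the per-tile results.
tile = 256
tiles = [image[i:i + tile, j:j + tile]
         for i in range(0, image.shape[0], tile)
         for j in range(0, image.shape[1], tile)]
tile_results = [t.mean() for t in tiles]   # placeholder analysis per tile

# Downsampling: train on a random subset of the data instead of all of it.
dataset = rng.random((100_000, 32))
subset = dataset[rng.choice(len(dataset), size=10_000, replace=False)]
```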
Summary
With so many companies pursuing solutions to the same problems, a shakeout is inevitable, but no one expects it to come soon. GPUs will be around for a long time and will probably remain the most cost-effective option for AI training and inference projects that don't require extreme performance.
Nevertheless, as models at the high end of the market become larger and more complex, there is an increasing need for functionally specific architectures. Three to five years from now, we will likely see a proliferation of GPUs and AI accelerators, which is the only way we can scale to meet demand at the end of this decade and beyond.
Leading chipmakers are expected to continue doing what they do well and gradually build on existing technologies. Many companies will also follow Intel's lead and acquire startups focused on artificial intelligence. The high-performance computing community is also focusing on the potential of artificial intelligence to help solve classic problems such as large-scale simulations and climate modeling.
The high-performance computing ecosystem is always looking for new technologies it can absorb to stay ahead of the curve, and it is exploring what artificial intelligence can bring to the table. Lurking behind the scenes is quantum computing, a technology that remains more theoretical than practical but has the potential to revolutionize computing.
Regardless of which new architecture gains traction, the surge in artificial intelligence has undoubtedly reignited interest in the potential for hardware innovation to open up new frontiers in software.