

CRAM: A New Chip Design That Could Reduce the Power Consumption of AI Protocols by Orders of Magnitude
Artificial Intelligence (AI) continues to power the 4th industrial revolution, alongside its energy demands. Today, anyone can access advanced AI tools
Artificial Intelligence (AI) continues to power the 4th industrial revolution, alongside its energy demands. Today, anyone can access advanced AI tools and integrate them into their systems to improve efficiency and reduce workload. The energy required to power these algorithms increases as the demand for AI applications increases. As such, environmentalists are already pointing out sustainability concerns surrounding the tech. Thankfully, a team of researchers has created a highly efficient alternative. Here's what you need to know.
Growing AI Energy Demands Creating an Energy Crisis
New AI systems continue to launch at an increasing frequency. The most recent global energy use forecast predicts that AI energy consumption will double from 460 terawatt-hours (TWh) in 2022 to 1,000 TWh by 2026. These protocols include recommenders, large language models (LLMs), image and video processing and creation, Web3 services, and more.
According to the researcher's study, AI systems require data transference that equates to “200 times the energy used for computation when reading three 64-bit source operands from and writing one 64-bit destination operand to an off-chip main memory.” As such, reducing energy consumption for artificial intelligence (AI) computing applications is a prime concern for developers who will need to overcome this roadblock to achieve large-scale adoption and mature the tech.
Thankfully, a group of innovative engineers from the University of Minnesota have stepped up with a possible solution that could reduce the power consumption of AI protocols by orders of magnitude. To accomplish this task, the researchers introduce a new chip design that improves on the Von Neumann Architecture found in most chips today.
Von Neumann Architecture
John von Neumann revolutionized the computer sector in 1945 when he separated logic and memory units, enabling more efficient computing at the time. In this arrangement, the logic and data are stored in different physical locations. His invention improved performance because it allowed both to be accessed simultaneously.
Today, most computers still use the Von Neuman structure with your HD storing your programs and the random access memory (RAM) housing programming instructions and temporary data. Today's RAM accomplishes this task using various methods including DRAM, which leverages capacitors, and SRAM, which has multiple circuits.
Notably, this structure worked great for decades. However, the constant transfer of data between logic and memory requires lots of energy. This energy transfer increases as data requirements and computational load increase. As such, it creates a performance bottleneck that limits efficiency as computing power increases.
Attempted Improvements on Energy Demands
Over the years, many attempts have been made to improve Von Neumann's architecture. These attempts have created different variations of the memory process with the goal of bringing the two actions closer physically. Currently, the three main variations include.
Near-memory Processing
This upgrade moves logic physically closer to memory. This was accomplished using a 3D-stacked infrastructure. Moving the logic closer reduced the distance and energy needed to transfer the data required to power computations. This architecture provided improved efficiency.
In-memory Computing
Another current method of improving computational architecture is in-memory computing. Notably, there are two variations of this style of chip. The original integrates clusters of logic next to the memory on a single chip. This deployment enables the elimination of transistors used in predecessors. However, there are many who consider this method not “true” to the in-memory structure because it still has separate memory locations, which means that initial performance issues that resulted from the data transfer exist, albeit on a smaller scale.
True In-memory
The final type of chip architecture is “true in-memory.” To qualify as this type of architecture, the memory needs to perform computations directly. This structure enhances capabilities and performance because the data for logic operations remains in its location. The researcher's latest version of true in-memory architecture is CRAM.
(CRAM)
Computational random-access memory (CRAM) enables true in-memory computations as the data is processed within the same array. The researchers modified a standard 1T1M STT-MRAM architecture to make CRAM possible. The CRAM layout integrates micro transistors into each cell and builds on the magnetic tunnel junction-based CPUs.
This approach provides better control and performance. The team then stacked an additional transistor, logic line (LL), and logic bit line (LBL) in each cell, enabling real-time computation within the same memory bank.
History of CRAM
Today's AI systems require a new structure that can meet their computational demands without diminishing sustainability concerns. Recognizing this demand, engineers decided to delve deep into CRAM capabilities for the first time. Their results were published in the NPJ scientific journal under the report “Experimental demonstration of magnetic tunnel junction-based computational random-access memory.”
The first CRAM leveraged an MTJ device structure. These spintronic devices improved on previous storage methods by using electron spin rather than transistors to transfer and store
The above is the detailed content of CRAM: A New Chip Design That Could Reduce the Power Consumption of AI Protocols by Orders of Magnitude. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











DMA in C refers to DirectMemoryAccess, a direct memory access technology, allowing hardware devices to directly transmit data to memory without CPU intervention. 1) DMA operation is highly dependent on hardware devices and drivers, and the implementation method varies from system to system. 2) Direct access to memory may bring security risks, and the correctness and security of the code must be ensured. 3) DMA can improve performance, but improper use may lead to degradation of system performance. Through practice and learning, we can master the skills of using DMA and maximize its effectiveness in scenarios such as high-speed data transmission and real-time signal processing.

Using the chrono library in C can allow you to control time and time intervals more accurately. Let's explore the charm of this library. C's chrono library is part of the standard library, which provides a modern way to deal with time and time intervals. For programmers who have suffered from time.h and ctime, chrono is undoubtedly a boon. It not only improves the readability and maintainability of the code, but also provides higher accuracy and flexibility. Let's start with the basics. The chrono library mainly includes the following key components: std::chrono::system_clock: represents the system clock, used to obtain the current time. std::chron

Handling high DPI display in C can be achieved through the following steps: 1) Understand DPI and scaling, use the operating system API to obtain DPI information and adjust the graphics output; 2) Handle cross-platform compatibility, use cross-platform graphics libraries such as SDL or Qt; 3) Perform performance optimization, improve performance through cache, hardware acceleration, and dynamic adjustment of the details level; 4) Solve common problems, such as blurred text and interface elements are too small, and solve by correctly applying DPI scaling.

C performs well in real-time operating system (RTOS) programming, providing efficient execution efficiency and precise time management. 1) C Meet the needs of RTOS through direct operation of hardware resources and efficient memory management. 2) Using object-oriented features, C can design a flexible task scheduling system. 3) C supports efficient interrupt processing, but dynamic memory allocation and exception processing must be avoided to ensure real-time. 4) Template programming and inline functions help in performance optimization. 5) In practical applications, C can be used to implement an efficient logging system.

The built-in quantization tools on the exchange include: 1. Binance: Provides Binance Futures quantitative module, low handling fees, and supports AI-assisted transactions. 2. OKX (Ouyi): Supports multi-account management and intelligent order routing, and provides institutional-level risk control. The independent quantitative strategy platforms include: 3. 3Commas: drag-and-drop strategy generator, suitable for multi-platform hedging arbitrage. 4. Quadency: Professional-level algorithm strategy library, supporting customized risk thresholds. 5. Pionex: Built-in 16 preset strategy, low transaction fee. Vertical domain tools include: 6. Cryptohopper: cloud-based quantitative platform, supporting 150 technical indicators. 7. Bitsgap:

Measuring thread performance in C can use the timing tools, performance analysis tools, and custom timers in the standard library. 1. Use the library to measure execution time. 2. Use gprof for performance analysis. The steps include adding the -pg option during compilation, running the program to generate a gmon.out file, and generating a performance report. 3. Use Valgrind's Callgrind module to perform more detailed analysis. The steps include running the program to generate the callgrind.out file and viewing the results using kcachegrind. 4. Custom timers can flexibly measure the execution time of a specific code segment. These methods help to fully understand thread performance and optimize code.

The main steps and precautions for using string streams in C are as follows: 1. Create an output string stream and convert data, such as converting integers into strings. 2. Apply to serialization of complex data structures, such as converting vector into strings. 3. Pay attention to performance issues and avoid frequent use of string streams when processing large amounts of data. You can consider using the append method of std::string. 4. Pay attention to memory management and avoid frequent creation and destruction of string stream objects. You can reuse or use std::stringstream.

Efficient methods for batch inserting data in MySQL include: 1. Using INSERTINTO...VALUES syntax, 2. Using LOADDATAINFILE command, 3. Using transaction processing, 4. Adjust batch size, 5. Disable indexing, 6. Using INSERTIGNORE or INSERT...ONDUPLICATEKEYUPDATE, these methods can significantly improve database operation efficiency.