Tesla discloses Dojo supercomputer architecture details for the first time! Taking great pains over chips for autonomous driving

Apr 11, 2023 pm 09:46 PM
chip Tesla

To meet the growing demand for artificial intelligence and machine learning models, Tesla created its own artificial intelligence technology to teach Tesla cars to drive themselves.

Recently, Tesla disclosed a large number of details about the Dojo supercomputing architecture at the Hot Chips 34 conference.

Essentially, Dojo is a giant composable supercomputer built on a completely custom architecture, spanning compute, networking and input/output (I/O) chips through to the instruction set architecture (ISA), power delivery, packaging and cooling. All of this is done to run custom, specific machine learning training algorithms at scale.

Ganesh Venkataramanan is Tesla's senior director of autonomous driving hardware; he runs the Dojo project and previously led AMD's CPU design team. At the Hot Chips 34 conference, he and a group of chip, system and software engineers unveiled many of the machine's architectural features for the first time.

Data Center "Sandwich"

"Generally speaking, the way we make chips is to put them on a package, put the package on a printed circuit board, and then that goes into a system. The system goes into a rack," Venkataramanan said.

But there’s a problem with this process: every time data moves from the chip to the package and off the package, there’s latency and bandwidth loss.

To get around these limitations, Venkataramanan and his team decided to start from scratch.

Thus, Dojo’s training tiles were born.

This is a self-contained computing cluster that takes up half a cubic foot and is capable of 556TFLOPS of FP32 performance in a 15kW liquid-cooled package.

Each tile is equipped with 11GB of SRAM and is connected via a 9TB/s fabric using a custom transport protocol throughout the stack.

Venkataramanan said: "This training tile represents an unmatched level of integration from compute to memory, to power delivery, to communications, without the need for any additional switches."

The core of the training tile is Tesla’s D1, a 50 billion transistor chip based on TSMC’s 7nm process. Tesla says each D1 is capable of achieving 22TFLOPS of FP32 performance at a TDP of 400W.
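As a rough sanity check, the per-tile figures can be multiplied out from the per-die figures. The sketch below uses only numbers quoted in this article (none independently verified); the variable names are illustrative.

```python
# Figures as quoted in the article; names are illustrative only.
D1_PER_TILE = 25           # D1 dies per training tile
D1_FP32_TFLOPS = 22        # FP32 throughput per D1
D1_TDP_WATTS = 400         # TDP per D1
TILE_POWER_WATTS = 15_000  # power envelope of one training tile

tile_fp32_tflops = D1_PER_TILE * D1_FP32_TFLOPS
tile_die_power_watts = D1_PER_TILE * D1_TDP_WATTS
remaining_power_watts = TILE_POWER_WATTS - tile_die_power_watts

print(f"FP32 per tile from per-die figure: {tile_fp32_tflops} TFLOPS")
print(f"Sum of die TDPs per tile: {tile_die_power_watts} W")
print(f"Envelope left for everything else: {remaining_power_watts} W")
```

The product comes to 550 TFLOPS, within about 1% of the 556 TFLOPS quoted for the tile; the small gap is presumably rounding in the per-die figure. The 25 dies at their stated TDP account for 10kW of the 15kW envelope.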

Tesla then takes 25 D1s, binned as known good dies, and packages them using TSMC's system-on-wafer technology to achieve massive compute integration with extremely low latency and extremely high bandwidth.

However, the system-on-wafer design and vertically stacked architecture make power delivery a challenge.

According to Venkataramanan, most current accelerators place power delivery components directly next to the silicon. He explained that this approach, while effective, means a large portion of the accelerator has to be dedicated to those components, which was impractical for Dojo. Tesla therefore chose to deliver power directly through the bottom of the die.

In addition, Tesla has also developed the Dojo Interface Processor (DIP), which is the bridge between the host CPU and the training processor.

Each DIP has 32GB of HBM, and up to five of these cards can be connected to a training tile at 900GB/s each, for a total of 4.5TB/s of bandwidth and 160GB of HBM per tile.
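The per-tile totals follow directly from the per-card figures. A minimal sketch of that arithmetic, using only the numbers quoted here (the variable names are illustrative):

```python
# Per-card figures as quoted in the article.
DIPS_PER_TILE = 5        # maximum DIP cards attached to one training tile
HBM_PER_DIP_GB = 32      # HBM capacity per DIP
LINK_PER_DIP_GB_S = 900  # bandwidth per DIP into the tile, in GB/s

total_hbm_gb = DIPS_PER_TILE * HBM_PER_DIP_GB
total_bandwidth_tb_s = DIPS_PER_TILE * LINK_PER_DIP_GB_S / 1000

print(f"HBM per tile: {total_hbm_gb} GB")
print(f"DIP bandwidth per tile: {total_bandwidth_tb_s} TB/s")
```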

Tesla's V1 configuration pairs six of these tiles (150 D1 dies in total) in an array supported by four host CPUs, each equipped with five DIP cards, to achieve a claimed exaflop of BF16 or CFP8 performance.

Software

Such a specialized computing architecture requires a specialized software stack. However, Venkataramanan and his team recognized that programmability would determine Dojo's success or failure.

"When we design these systems, ease of programmability by our software peers is paramount. Researchers won't wait for your software folks to hand-write a kernel to accommodate a new algorithm they want to run."

To achieve this, Tesla abandoned the idea of hand-written kernels and designed Dojo's architecture around the compiler.

"What we do is we use PyTorch. We create an intermediate layer that helps us parallelize to scale out across the hardware underneath it. Underneath everything is compiled code." In his view, this is the only way to create a software stack that can adapt to any future workload.

Despite the emphasis on software flexibility, Venkataramanan pointed out that the platform, which is currently running in Tesla's lab, is for now limited to Tesla's own use.

Dojo Architecture Overview

After reading the above, let us take a deeper look at the Dojo architecture.

Tesla has an exascale artificial intelligence system for machine learning. Tesla has enough capital to hire employees and build chips and systems specifically for its applications, just like Tesla's in-car systems.

Tesla is not only building its own AI chip, but also a supercomputer.

Distributed system analysis

Each Dojo node has its own CPU, memory and communication interface.

Dojo node

This is the processing pipeline of the Dojo processor.

Processing Pipeline

Each node has 1.25MB of SRAM. In AI training and inference chips, a common technique is to co-locate memory with computation to minimize data transfers, which are very expensive from a power and performance perspective.

Node memory

Then each node is connected to a 2D grid.

Network Interface

This is an overview of the data path.

Data Path

Here is an example of what the chip can do: list parsing.

List parsing

Here is more on the instruction set, which is Tesla's own design rather than a typical Intel, Arm, NVIDIA or AMD CPU/GPU instruction set.

Instruction set

In artificial intelligence, arithmetic formats matter a great deal, especially which formats a chip supports. With Dojo, Tesla can work with common industry formats such as FP32, FP16 and BFP16.

Arithmetic Format

Tesla is also working on a configurable FP8, or CFP8. It comes in 4/3 and 5/2 options, which trade range against precision. This is similar to the FP8 configuration of the NVIDIA H100 (Hopper). We have also seen the Untether.AI Boqueria 1458 RISC-V core AI accelerator focus on different FP8 types.
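To see why the two splits cover different ranges, the sketch below computes the largest and smallest normal values of a generic sign/exponent/mantissa format. It assumes an IEEE-754-style encoding in which the all-ones exponent is reserved, and it assumes, purely for illustration, that the configurable part of CFP8 is the exponent bias. Neither assumption is taken from this article, and real FP8 variants may reserve encodings differently, so their limits can differ.

```python
from typing import Optional


def default_bias(exp_bits: int) -> int:
    """IEEE-754-style default exponent bias."""
    return 2 ** (exp_bits - 1) - 1


def max_normal(exp_bits: int, man_bits: int, bias: Optional[int] = None) -> float:
    """Largest finite value, assuming the all-ones exponent is reserved."""
    if bias is None:
        bias = default_bias(exp_bits)
    max_exp_field = 2 ** exp_bits - 2           # largest non-reserved exponent field
    largest_significand = 2.0 - 2.0 ** -man_bits  # 1.111...1 in binary
    return largest_significand * 2.0 ** (max_exp_field - bias)


def min_normal(exp_bits: int, bias: Optional[int] = None) -> float:
    """Smallest positive normal value (exponent field equal to 1)."""
    if bias is None:
        bias = default_bias(exp_bits)
    return 2.0 ** (1 - bias)


for label, exp_bits, man_bits in (("4/3", 4, 3), ("5/2", 5, 2)):
    print(label, min_normal(exp_bits), max_normal(exp_bits, man_bits))

# A hypothetical non-default bias shifts the whole representable range.
print("4/3 with bias 11:", max_normal(4, 3, bias=11))
```

Under these assumptions the 4/3 split tops out at 240 and the 5/2 split at 57344: one extra exponent bit buys a much wider range at the cost of one bit of precision, and changing the bias slides the range up or down without changing the bit layout.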

Arithmetic Format 2

Dojo also has a distinct CFP16 format for higher precision. In all, Dojo supports FP32, BFP16, CFP8 and CFP16.

Arithmetic Format 3

These cores are then integrated into a fabricated die. Tesla's D1 chip is manufactured by TSMC using a 7nm process. Each chip has 354 Dojo processing nodes and 440MB of SRAM.
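The SRAM figures quoted in this article can be cross-checked against each other. A small sketch, using only the article's numbers and decimal units (1GB = 1000MB); the names are illustrative:

```python
# Figures as quoted in the article.
NODES_PER_D1 = 354       # processing nodes per D1 die
SRAM_PER_NODE_MB = 1.25  # SRAM per node
QUOTED_D1_SRAM_MB = 440  # SRAM per D1 die, as quoted
D1_PER_TILE = 25         # D1 dies per training tile

computed_d1_sram_mb = NODES_PER_D1 * SRAM_PER_NODE_MB
tile_sram_gb = D1_PER_TILE * QUOTED_D1_SRAM_MB / 1000

print(f"SRAM per D1 from node count: {computed_d1_sram_mb} MB")
print(f"SRAM per tile from quoted per-die figure: {tile_sram_gb} GB")
```

The node count gives 442.5MB per die, consistent with the rounded 440MB figure, and 25 dies at 440MB give the 11GB per tile quoted elsewhere in the article.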

First level of integration: D1 die

These D1 chips are packaged onto a Dojo training tile. The D1 chips are tested and then assembled into a 5×5 tile. These tiles have 4.5TB/s of bandwidth per edge. They also have a power delivery envelope of 15kW per module, or roughly 600W per D1 chip, less whatever is used by the 40 I/O dies. This shows why something like Lightmatter Passage would be attractive to a company that does not want to design such packaging itself.

Second level of integration: Dojo training tile

The Dojo interface processors are located at the edges of the 2D grid. Each training tile has 11GB of SRAM and 160GB of shared DRAM.

Dojo system topology

The following is the bandwidth data for the 2D grid network connecting the processing nodes.

Dojo system communication: logical 2D grid

Each DIP provides a 32GB/s link to the host system.

Dojo system communication: PCIe links between DIPs and hosts

Tesla also has Z-plane links for longer routes. In the rest of the talk, Tesla discussed system-level innovation.

Communication mechanism

These are the latency boundaries of dies and tiles, which is why the two are handled differently in Dojo. Z-plane links are needed because long paths are expensive.

Dojo system communication mechanism

Any processing node can access data across the system. Each node can push or pull data to SRAM or DRAM.

Dojo system batch communication

Dojo uses a flat addressing scheme for communication.

System Network 1

The chips can route around faulty processing nodes in software.

System Network 2

This means that the software must understand the system topology.
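To illustrate why routing around a bad node needs topology knowledge, here is a minimal sketch of shortest-path routing on a 2D grid that skips nodes marked faulty. It is a generic breadth-first search, not Tesla's routing algorithm; the function name, grid size and fault set are all hypothetical.

```python
from collections import deque
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]


def route(width: int, height: int, src: Node, dst: Node,
          faulty: Set[Node]) -> Optional[List[Node]]:
    """Return a shortest path from src to dst on a 2D grid, avoiding faulty nodes."""
    if src in faulty or dst in faulty:
        return None
    previous = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        x, y = node
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = neighbor
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and neighbor not in faulty and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # destination unreachable


# Hypothetical 4x3 grid with two faulty nodes on the direct path.
path = route(4, 3, (0, 1), (3, 1), faulty={(1, 1), (2, 1)})
print(path)
```

The direct route would take three hops; with the two middle nodes marked faulty, the search detours through a neighboring row and takes five. The search can only find that detour because it knows the grid dimensions and the fault map.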

System Network 3

Dojo does not guarantee end-to-end traffic ordering, so packets need to be counted at the destination.
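The idea of counting at the destination can be sketched as follows: if each packet carries the offset it should be written to, arrival order does not matter, and the receiver knows the transfer is finished once the expected number of packets has landed. This is an illustrative model only; the class, its fields and the packet layout are hypothetical and do not describe Tesla's actual protocol.

```python
import random


class ReceiveCounter:
    """Hypothetical receiver that detects completion by counting packets."""

    def __init__(self, expected_packets: int) -> None:
        self.expected_packets = expected_packets
        self.received = 0
        self.chunks = {}  # destination offset -> payload

    def deliver(self, offset: int, payload: bytes) -> None:
        # Each packet names its own destination offset, so order is irrelevant.
        self.chunks[offset] = payload
        self.received += 1

    @property
    def complete(self) -> bool:
        return self.received == self.expected_packets

    def data(self) -> bytes:
        return b"".join(self.chunks[offset] for offset in sorted(self.chunks))


message = b"ABCDEFGHIJKL"
packets = [(i, message[i:i + 4]) for i in range(0, len(message), 4)]
random.Random(0).shuffle(packets)  # simulate out-of-order arrival

counter = ReceiveCounter(expected_packets=len(packets))
for offset, payload in packets[:-1]:
    counter.deliver(offset, payload)
complete_before_last = counter.complete
counter.deliver(*packets[-1])

print(complete_before_last, counter.complete, counter.data())
```

A real implementation would also have to deal with lost or duplicated packets, which this sketch ignores.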

System Network 4

Here is how packet counting fits into system synchronization.

System synchronization

The compiler needs to define a tree.

System synchronization 2

Tesla said that one ExaPOD has more than 1 million CPUs (or compute nodes). These are large systems.

Summary

Tesla built Dojo specifically to work at scale. Typically, startups look to build one or a few AI chips per system. Clearly, Tesla is focused on much greater scale.

In many ways, it makes sense for Tesla to have a huge AI training ground. What is even more exciting is that it is not only using commercially available systems but also building its own chips and systems. Part of the ISA on the scalar side is borrowed from RISC-V, but the vector side and much of the architecture are Tesla's own custom design, which required a great deal of work.
