May 19 news: according to TechCrunch, at an online event this morning, Facebook parent company Meta disclosed for the first time the progress of its self-developed AI chips, which support the generative AI technology it recently launched for advertising design and creative tools.
△Meta CEO Zuckerberg shows off the first self-developed AI chip MTIA
Alexis Bjorlin, vice president of infrastructure at Meta, said: “Building our own [hardware] capabilities gives us control over every layer of the stack, from data center design to training frameworks. This level of vertical integration allows us to push the boundaries of artificial intelligence research at scale.”

The first self-developed AI chip: MTIA
Over the past decade or so, Meta has spent billions of dollars recruiting top data scientists and building new kinds of artificial intelligence, including the AI that now powers the discovery engines, moderation filters, and ad recommenders across its apps and services. The company has been striving to turn its many ambitious AI research innovations into products, especially in the area of generative AI.
Since 2016, leading Internet companies have been actively developing cloud AI chips. Google has designed and deployed its own AI chips, Tensor Processing Units (TPUs), to train generative AI systems such as PaLM 2 and Imagen; Amazon offers AWS customers two self-developed AI chips, AWS Trainium for training and AWS Inferentia for inference; Microsoft is also rumored to be working with AMD on an AI chip called Athena.

Previously, Meta ran its AI workloads primarily on a combination of third-party CPUs and custom chips designed to accelerate AI algorithms, and CPUs tend to be less efficient than GPUs at such tasks. To turn the situation around, Meta in 2020 developed its first-generation self-developed AI chip, MTIA (MTIA v1), built on a 7nm process.
Meta calls the AI chip the Meta Training and Inference Accelerator, or MTIA for short, and describes it as part of a “family” of chips for accelerating AI training and inference workloads. MTIA is an application-specific integrated circuit (ASIC), a chip that combines different circuits on a single substrate and can be programmed to carry out one or more tasks in parallel.
“To achieve higher levels of efficiency and performance across our critical workloads, we needed a custom solution co-engineered with the model, the software stack, and the system hardware, so that our various services can deliver a better experience for our users,” Bjorlin explained.

According to the introduction, MTIA v1 is manufactured on a 7-nanometer process and pairs 128MB of on-chip memory with off-chip memory that can be expanded to as much as 128GB. Meta says MTIA is purpose-built for AI recommendation systems, helping to find the best post content and surface it to users faster, with computing performance and processing efficiency superior to a CPU. Moreover, in Meta-designed benchmarks, MTIA handled “low complexity” and “medium complexity” AI models more efficiently than a GPU.
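Meta does not detail MTIA v1's internals in this announcement, but the recommendation models it targets are typically embedding-lookup-plus-MLP designs. The sketch below only illustrates that general model shape in PyTorch; every dimension and feature count is made up for the example, not taken from Meta.

```python
# Minimal sketch of the embedding + MLP style recommendation model that
# inference accelerators target. All dimensions are illustrative, not Meta's.
import torch
import torch.nn as nn

class TinyRecModel(nn.Module):
    def __init__(self, num_items=100_000, emb_dim=64, num_dense=13):
        super().__init__()
        # Sparse features (e.g. item/user IDs) are looked up in embedding tables;
        # these lookups dominate memory traffic in recommendation workloads.
        self.item_emb = nn.EmbeddingBag(num_items, emb_dim, mode="sum")
        # Dense features go through a small MLP, then are combined with the
        # embedding output and scored by a final MLP.
        self.bottom_mlp = nn.Sequential(nn.Linear(num_dense, emb_dim), nn.ReLU())
        self.top_mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, item_ids, offsets, dense_feats):
        sparse = self.item_emb(item_ids, offsets)        # (batch, emb_dim)
        dense = self.bottom_mlp(dense_feats)             # (batch, emb_dim)
        return torch.sigmoid(self.top_mlp(torch.cat([sparse, dense], dim=1)))

model = TinyRecModel()
scores = model(torch.randint(0, 100_000, (8,)),          # 8 item IDs
               torch.tensor([0, 2, 4, 6]),               # 4 users, 2 IDs each
               torch.randn(4, 13))                       # 13 dense features each
print(scores.shape)  # torch.Size([4, 1])
```

In models like this the embedding lookups are memory-bound rather than compute-bound, which helps explain why a purpose-built inference chip can beat a general-purpose CPU on the same workload.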
Meta said there is still work to be done on MTIA's memory and networking, which become bottlenecks as AI models grow and workloads have to be distributed across multiple chips. Recently, Meta acquired the Oslo-based AI networking technology team of British chip unicorn Graphcore for this purpose. For now, MTIA focuses more on inference than on training for the "recommendation workloads" of Meta's family of apps.
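To see why memory and networking matter here: the largest recommendation models keep most of their parameters in huge embedding tables that no single accelerator can hold. The toy sketch below (not Meta's code; the table size, dimensions, and chip count are purely illustrative) shows row-wise sharding of one embedding table across several chips.

```python
# Illustrative sketch (not Meta's code): row-wise sharding of a large embedding
# table across several accelerators. Each chip holds a slice of the rows, so a
# batch of lookups must be routed to whichever chip owns each row.
import numpy as np

NUM_CHIPS = 4
VOCAB, DIM = 1_000_000, 64            # illustrative table size, not Meta's
ROWS_PER_SHARD = VOCAB // NUM_CHIPS

# Each "chip" stores a contiguous block of rows.
shards = [np.random.randn(ROWS_PER_SHARD, DIM).astype(np.float32)
          for _ in range(NUM_CHIPS)]

def lookup(ids):
    """Route each ID to the shard that owns it and gather the rows back."""
    out = np.empty((len(ids), DIM), dtype=np.float32)
    for i, idx in enumerate(ids):
        owner = idx // ROWS_PER_SHARD                 # which chip owns this row
        out[i] = shards[owner][idx % ROWS_PER_SHARD]  # simulated remote fetch
    return out

print(lookup([3, 250_001, 999_999]).shape)  # (3, 64)
```

Every lookup that lands on a remote shard becomes interconnect traffic, so memory capacity and network bandwidth, rather than raw compute, tend to cap how far such models can scale.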
Meta emphasized that it will continue to improve MTIA, which has “significantly” improved the company’s performance per watt when running recommendation workloads, in turn allowing Meta to run “more enhanced” and “cutting-edge” AI workloads.
According to the plan, Meta will officially launch its self-developed MTIA chip in 2025.
Meta’s AI supercomputer RSC
According to reports, Meta originally planned to deploy its self-developed custom AI chips at scale in 2022, but ultimately delayed the rollout and instead ordered billions of dollars’ worth of Nvidia GPUs for its Research SuperCluster (RSC) supercomputer, which required a major redesign of several of its data centers.
According to reports, RSC debuted in January 2022 and was assembled in partnership with Penguin Computing, Nvidia, and Pure Storage; it has since completed its second phase of expansion. Meta says it now contains 2,000 Nvidia DGX A100 systems (eight GPUs each), for a total of 16,000 Nvidia A100 GPUs.

RSC's computing power currently lags behind the AI supercomputers of Microsoft and Google; Google claims its AI-focused supercomputer is powered by 26,000 Nvidia H100 GPUs. Meta notes, however, that RSC's advantage is that it lets researchers train models using real examples from Meta's production systems, unlike the company's previous AI infrastructure, which relied on open-source and publicly available data sets.
The RSC AI supercomputer is advancing AI research in multiple areas, including generative AI. “This is really about the productivity of AI research,” a Meta spokesperson said. “We want to provide AI researchers with state-of-the-art infrastructure that enables them to develop models, and give them a training platform to advance AI.”
Meta claims that at its peak, RSC can reach nearly 5 exaflops of computing power, making it one of the fastest AI supercomputers in the world.
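Those figures are mutually consistent, assuming the standard eight-GPU DGX A100 configuration and that the exaflops number refers to dense FP16/BF16 throughput (the A100's published rate is roughly 312 TFLOPS per GPU):

```python
# Rough sanity check of the RSC figures quoted above. Assumptions: the
# standard 8-GPU DGX A100 configuration, and that "nearly 5 exaflops" refers
# to dense FP16/BF16 throughput at ~312 TFLOPS per A100.
gpus = 2_000 * 8                      # 2,000 DGX A100 systems, 8 GPUs each
tflops_per_gpu = 312                  # A100 dense FP16/BF16 (published spec)
exaflops = gpus * tflops_per_gpu / 1_000_000
print(gpus, round(exaflops, 2))       # 16000 4.99 -> "nearly 5 exaflops"
```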
Meta uses RSC to train LLaMA, short for “Large Language Model Meta AI.” Meta says the largest LLaMA model was trained on 2,048 A100 GPUs and took 21 days.
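For a sense of scale, those training figures work out to roughly a million GPU-hours of raw wall-clock compute (no utilization or efficiency factor assumed):

```python
# Raw wall-clock GPU-hours implied by the figures in the text:
# 2,048 A100 GPUs running for 21 days.
gpus = 2_048
days = 21
gpu_hours = gpus * days * 24
print(f"{gpu_hours:,} GPU-hours")   # 1,032,192 -- roughly a million A100-hours
```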
"Building our own supercomputing capabilities gives us control over every layer of the stack; from data center design to training frameworks," a Meta spokesperson added: "RSC will help Meta's AI researchers build new and better AI models that learn from trillions of examples; work across hundreds of different languages; work together to seamlessly analyze text, images, and videos; develop new augmented reality tools; and more.”
In the future, Meta may introduce its self-developed AI chip MTIA into RSC to further improve its AI performance.
AI chip MSVP for video processing
In addition to MTIA, Meta is also developing another AI chip, the Meta Scalable Video Processor (MSVP), designed mainly to meet the ever-growing data processing demands of video-on-demand and live streaming. Meta ultimately hopes to have MSVP handle most of its mature and stable audio and video processing work.
In fact, Meta began conceiving of custom server-side video processing chips years ago, announcing ASICs for video transcoding and inference work in 2019. MSVP is the culmination of some of those efforts and a renewed push for competitive advantage, especially in live video streaming.
“On Facebook alone, people spend 50% of their time watching videos,” Meta technical directors Harikrishna Reddy and Yunqing Chen wrote in a blog post published on the morning of the 19th. “To serve the wide range of devices around the world (mobile, laptop, TV, etc.), a video uploaded to Facebook or Instagram is transcoded into multiple bitstreams with different encoding formats, resolutions, and quality levels... MSVP is programmable and scalable, and can be configured to efficiently support both the high-quality transcoding needed for VOD and the low latency and faster processing times required for live streaming.”
△MSVP
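MSVP is fixed-function hardware and Meta has not published a programming interface for it here, but the adaptive-bitrate “ladder” described in the quote above can be sketched in software. The example below uses ffmpeg purely as an illustration; the resolutions, bitrates, and codec choices are assumptions, not Meta's actual pipeline.

```python
# Illustrative software version of an adaptive-bitrate ladder: one uploaded
# video is transcoded into several resolution/bitrate variants. The ladder
# below and the use of ffmpeg/libx264 are illustrative, not Meta's pipeline.
import subprocess

LADDER = [          # (output height, target video bitrate)
    (1080, "4500k"),
    (720, "2500k"),
    (480, "1200k"),
    (360, "700k"),
]

def transcode(src: str) -> None:
    for height, bitrate in LADDER:
        out = f"{src.rsplit('.', 1)[0]}_{height}p.mp4"
        subprocess.run(
            [
                "ffmpeg", "-y", "-i", src,
                "-c:v", "libx264", "-b:v", bitrate,
                "-vf", f"scale=-2:{height}",   # keep aspect ratio, even width
                "-c:a", "aac", "-b:a", "128k",
                out,
            ],
            check=True,
        )

transcode("upload.mp4")   # produces upload_1080p.mp4, upload_720p.mp4, ...
```

Running a ladder like this in software for every upload is exactly the kind of CPU-heavy, “stable and mature” workload Meta says it wants to offload to dedicated silicon.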
Meta says its plan is to eventually offload most “stable and mature” video processing workloads to MSVP and to use software video encoding only for workloads that require specific customization or “significantly” higher quality. Meta also says it will continue to improve video quality with MSVP using pre-processing methods such as intelligent denoising and image enhancement, as well as post-processing methods such as artifact removal and super-resolution.
“In the future, MSVP will enable us to support even more of Meta’s most important use cases and needs, including short-form video, enabling efficient delivery of generative AI, AR/VR and other Metaverse content,” Reddy and Chen said.