Easy and Efficient Transformer (NetEase's ultra-large model online inference engine)

NetEase's open-source inference acceleration framework for Transformer-based models supports high-performance, single-GPU inference of models with tens of billions of parameters on mid- to low-end Ampere-architecture GPUs.
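A quick back-of-the-envelope estimate makes the single-GPU claim plausible. The sketch below computes the FP16 weight footprint of models in that size range; the parameter counts and the 24 GiB card are illustrative assumptions, not figures from EET.

```python
# Rough FP16 weight-memory estimate; the parameter counts and the
# 24 GiB card below are illustrative assumptions, not EET measurements.

def fp16_weight_gib(num_params: float) -> float:
    """GiB needed to hold the weights alone at 2 bytes per parameter."""
    return num_params * 2 / 1024**3

for num_params in (6e9, 10e9, 13e9):
    gib = fp16_weight_gib(num_params)
    fits = "fits" if gib < 24 else "does not fit"
    print(f"{num_params / 1e9:.0f}B params -> ~{gib:.1f} GiB of weights ({fits} on a 24 GiB card)")
```

At FP16, a model with roughly 10 billion parameters needs about 19 GiB for its weights alone, so whether inference fits on a single mid-range Ampere card comes down to how carefully activations and the key/value cache are managed.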

Project Background

Transformer-based large-scale models have proven effective on a wide variety of tasks across many fields. However, applying them in industrial production requires considerable effort to reduce inference cost. To fill this gap, we propose a scalable inference solution: Easy and Efficient Transformer (EET). EET is a system that includes a series of Transformer inference optimizations at both the algorithm and implementation levels. By optimizing the computation and data flow of the Transformer, EET significantly reduces inference cost and improves the efficiency and performance of the model. Our experimental results show that EET substantially improves inference speed and resource utilization without any loss of model accuracy, providing a simple and effective solution for deploying large-scale models in industrial production.

First, we design highly optimized CUDA kernels for long inputs and large hidden sizes.
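As a conceptual illustration (not EET's actual CUDA code), the PyTorch reference below spells out the chain of operations in attention-score computation that such a kernel would typically fuse into a single launch; the benefit grows with sequence length and hidden size because every unfused step writes a large intermediate tensor to GPU memory and reads it back.

```python
# Unfused reference for attention-score computation in PyTorch.
# An optimized kernel fuses the scaling, mask add and softmax into one
# launch; this sketch only shows what those separate steps are.
import math
import torch

def attention_scores_reference(q, k, attn_mask):
    # q, k: [batch, heads, seq_len, head_dim]; attn_mask: [batch, 1, seq_len, seq_len]
    head_dim = q.shape[-1]
    scores = torch.matmul(q, k.transpose(-1, -2))   # GEMM
    scores = scores / math.sqrt(head_dim)           # elementwise kernel, extra memory round trip
    scores = scores + attn_mask                     # another elementwise kernel and round trip
    return torch.softmax(scores, dim=-1)            # reduction kernel

batch, heads, seq_len, head_dim = 2, 16, 1024, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
mask = torch.zeros(batch, 1, seq_len, seq_len)
print(attention_scores_reference(q, k, mask).shape)  # torch.Size([2, 16, 1024, 1024])
```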

In addition, we propose a flexible CUDA memory manager that reduces the memory footprint when deploying large models. Compared with the state-of-the-art Transformer inference library Faster Transformer (v4.0), EET achieves an average speedup of 1.40x to 4.20x on the decoder layers on an A100 GPU.
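One way to picture the memory-manager idea is cache pre-allocation during incremental decoding: instead of growing the key/value cache step by step, a single buffer is allocated once at the maximum sequence length and written into in place. The PyTorch sketch below illustrates that general technique with made-up sizes; it is not EET's manager, which applies the same reuse principle across all activation and cache buffers.

```python
# Minimal sketch of key/value cache pre-allocation for incremental
# decoding; shapes are illustrative and this is not EET's manager.
import torch

batch, heads, max_seq, head_dim = 4, 16, 2048, 64

# Allocate the caches once, at the maximum sequence length.
k_cache = torch.empty(batch, heads, max_seq, head_dim)
v_cache = torch.empty(batch, heads, max_seq, head_dim)

def decode_step(step, new_k, new_v):
    """Write this step's keys/values into the pre-allocated cache in place."""
    k_cache[:, :, step] = new_k
    v_cache[:, :, step] = new_v
    # Attention reads only the filled prefix; no per-step allocation happens.
    return k_cache[:, :, : step + 1], v_cache[:, :, : step + 1]

for t in range(8):  # a few decoding steps with dummy projections
    k_t = torch.randn(batch, heads, head_dim)
    v_t = torch.randn(batch, heads, head_dim)
    k_prefix, v_prefix = decode_step(t, k_t, v_t)

print(k_prefix.shape, v_prefix.shape)  # torch.Size([4, 16, 8, 64]) for both
```

Pre-allocation trades a larger one-time reservation for the elimination of per-step allocation and fragmentation, which keeps the memory footprint predictable when serving large models.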

Paper address

https://arxiv.org/abs/2104.12470

Github address

https://github.com/NetEase-FuXi/EET
