With the wide adoption of deep learning models in fields such as natural language processing, inference speed and performance have become pressing concerns. Recently, the Kuaishou-led research result "SAMP: A Post-Training Quantization Model Inference Library Based on Adaptive Mixed Precision" was accepted at the top conference EMNLP 2023 and presented in Singapore.
This work proposes an inference acceleration tool called SAMP, which uses self-adaptive mixed-precision technology to significantly speed up inference while preserving model performance. It consists of a self-adaptive mixed-precision encoder and a set of advanced fusion strategies. The mixed-precision encoder searches the large space of general matrix multiplication (GEMM) operations and Transformer layers for the floating-point/fixed-point combination that best matches the user's goal, whether that is computation accuracy or inference efficiency; as a result, mixed-precision computation achieves better accuracy than fully fixed-point computation. The fusion strategies merge and improve embedding operators and quantization-related computation operators, cutting the number of CUDA kernel calls roughly in half. SAMP is also an end-to-end toolkit implemented in C++; it delivers excellent inference speed and lowers the barrier to industrial adoption of post-training quantization inference.
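To make the idea of per-layer precision selection concrete, here is a minimal Python sketch of one way such a floating-point/fixed-point search could work. It is only an illustration under assumptions, not SAMP's released code: the layer names, the greedy fallback strategy, and the evaluate_accuracy callback are all hypothetical.

```python
# Minimal sketch (not SAMP's actual implementation): every GEMM/Transformer
# layer can run in floating point (fp16) or fixed point (int8); the search
# keeps as many layers as possible in int8 while meeting an accuracy target.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PrecisionPlan:
    # Maps each layer/GEMM name to "int8" (fixed point) or "fp16" (floating point).
    layer_precision: Dict[str, str]


def choose_mixed_precision(
    layers: List[str],
    evaluate_accuracy: Callable[[PrecisionPlan], float],
    min_accuracy: float,
) -> PrecisionPlan:
    """Greedy fallback: start from all-int8 (fastest), then switch layers back
    to fp16 one at a time until the accuracy target is met."""
    plan = PrecisionPlan({name: "int8" for name in layers})
    for name in layers:
        if evaluate_accuracy(plan) >= min_accuracy:
            break
        plan.layer_precision[name] = "fp16"  # trade a little speed for accuracy
    return plan


if __name__ == "__main__":
    # Toy accuracy model: each fp16 layer recovers 1% accuracy over a 90% base.
    layers = [f"encoder.layer.{i}.gemm" for i in range(12)]
    toy_accuracy = lambda p: 0.90 + 0.01 * sum(
        v == "fp16" for v in p.layer_precision.values()
    )
    plan = choose_mixed_precision(layers, toy_accuracy, min_accuracy=0.95)
    print(plan.layer_precision)
```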
Table 1: Innovations of SAMP compared with similar systems.
SAMP has the following main highlights:
1. Self-adaptive. SAMP balances computational accuracy and latency in post-training quantization inference. Users can choose a mixed-precision configuration with the accuracy and inference latency appropriate for each task, and SAMP can also recommend the best quantization combination through its adaptive allocation method (see the sketch after this list).
2. Inference efficiency. SAMP shows better inference speedups than other inference toolkits across a wide precision range (floating point to fixed point). On the classification task datasets of the Chinese Language Understanding Evaluation benchmark (CLUE), SAMP achieves a 1.05-1.15x speedup over FasterTransformer.
3. Flexibility. SAMP covers numerous downstream tasks such as classification, sequence labeling, and text matching; target modules are extensible and can be flexibly customized. It is user-friendly and has little platform dependence: SAMP supports C++ and Python APIs and only requires CUDA 11.0 or higher. In addition, SAMP provides model conversion tools that convert between models in different formats.
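As a rough picture of the accuracy/latency trade-off a user faces when picking a configuration, the Python sketch below chooses the fastest candidate that still meets an accuracy floor. The candidate names and numbers are made up for the example and do not come from SAMP's API or the paper's results.

```python
# Illustrative sketch of the user-facing trade-off, not SAMP's API: given
# candidate precision configurations with measured accuracy and latency,
# pick the fastest one that still meets the user's accuracy target.
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str          # e.g. "fp16", "int8", or a mixed fp16/int8 split
    accuracy: float    # task accuracy measured on a held-out set
    latency_ms: float  # measured inference latency


def recommend(candidates: List[Candidate], min_accuracy: float) -> Optional[Candidate]:
    """Return the lowest-latency candidate whose accuracy is acceptable."""
    ok = [c for c in candidates if c.accuracy >= min_accuracy]
    return min(ok, key=lambda c: c.latency_ms) if ok else None


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    candidates = [
        Candidate("fp16 (all floating point)", accuracy=0.768, latency_ms=3.2),
        Candidate("mixed fp16/int8",           accuracy=0.765, latency_ms=2.4),
        Candidate("int8 (all fixed point)",    accuracy=0.741, latency_ms=2.1),
    ]
    print(recommend(candidates, min_accuracy=0.76))  # -> the mixed configuration
```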
Figure 1: The research paper was presented and shared at the EMNLP 2023 conference.
Tian Rong from Kuaishou, the lead researcher, said that the result reflects the joint efforts of the whole team to deliver strong results in scenarios such as model inference. SAMP contributes in three respects: first, it addresses the large accuracy loss that existing post-training quantization (PTQ) inference tools suffer in industrial applications; second, it promotes the large-scale application of PTQ technology across multiple NLP downstream tasks; and finally, the inference library is lightweight, flexible, user-friendly, and supports user-defined task goals.
EMNLP (Empirical Methods in Natural Language Processing) is one of the top international conferences in natural language processing and artificial intelligence. It focuses on academic research into natural language processing technology across application scenarios, with particular emphasis on empirical studies. The conference has driven core innovations in the field, such as pre-trained language models, text mining, dialogue systems, and machine translation, and has great influence in both academia and industry. The acceptance of this paper means that Kuaishou's research in this area has been recognized by international scholars.