
Shearing the 'alpaca' step by step: Chen Danqi's team proposes LLM-Shearing, a large model pruning method

PHPz
Release: 2023-10-12 18:29:09

What happens if you shear the wool of the Llama 2 "alpaca"? Princeton University's Chen Danqi team has proposed a large model pruning method called LLM-Shearing, which achieves better performance than models of the same size at only a small fraction of the compute and cost.


Since the emergence of large language models (LLMs), they have achieved remarkable results on a wide range of natural language tasks. However, training a large language model requires massive computing resources. As a result, the industry has become increasingly interested in building equally capable mid-scale models; the emergence of LLaMA, MPT, and Falcon enables efficient inference and fine-tuning at these scales.

These LLMs of varying sizes suit different use cases, but training each model from scratch, even a small 1-billion-parameter one, still demands enormous computing resources, a heavy burden for most research institutions.

Therefore, in this paper, Chen Danqi's team at Princeton University tackles the following question: can existing pre-trained LLMs be used to build smaller, general-purpose, performance-competitive LLMs while requiring far less computation than training from scratch?

The researchers explore structured pruning as a way to achieve this goal. The challenge is that, for general-purpose LLMs, pruning typically degrades performance, especially when little compute is invested after pruning. The efficient pruning method they propose can produce smaller LLMs that remain competitive in performance, while the accompanying training requires significantly less computation than training from scratch.


  • Paper address: https://arxiv.org/abs/2310.06694
  • Code address: https://github.com/princeton-nlp/LLM-Shearing
  • Models: Sheared-LLaMA-1.3B, Sheared-LLaMA-2.7B

Before pruning the LLM, the researchers identified two key technical challenges. First, how can one determine a final pruned structure that offers both strong performance and efficient inference? Existing structured pruning techniques for LLMs do not specify a target structure, so the pruned models are often unsatisfactory in both performance and inference speed. Second, how should the pruned model be further pre-trained to reach the expected performance? They observed that training with the original pre-training data mix yields loss reductions across domains that differ from those seen when training a model from scratch.

In response to these two challenges, the researchers propose the LLM-Shearing algorithm. Its novel pruning component, called targeted structured pruning, prunes the source model down to a specified target architecture, which is taken from the configuration of an existing pre-trained model. They show that this pruning method searches for substructures within the source model that maximize performance under resource constraints. In addition, they design a dynamic batch loading algorithm that loads training data from each domain in proportion to its rate of loss reduction, thereby using the data more efficiently and accelerating the overall performance improvement.

Finally, the researchers pruned the LLaMA2-7B model into two smaller LLMs, Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B, confirming the effectiveness of their method.


They used only 50 billion tokens (i.e., 5% of OpenLLaMA's pre-training budget) for pruning and continued pre-training, yet on 11 representative downstream tasks (covering commonsense reasoning, reading comprehension, and world knowledge) and on open-ended generation with instruction tuning, both models still outperform other popular LLMs of similar size, including Pythia, INCITE, and OpenLLaMA.


It should be noted, however, that by the time the paper and the Sheared-LLaMA models were released, the record for the strongest open-source 3B model had already been taken by StableLM-3B.


In addition, the downstream task performance trajectories suggest that training the pruned models on more tokens would bring further gains. The researchers only experimented with models of up to 7 billion parameters, but LLM-Shearing is general and could be extended to large language models of any size in future work.

Method introduction

Given an existing large model M_S (the source model), the goal of this paper is to study how to efficiently produce a smaller yet strong model M_T (the target model). The authors argue that this requires two stages:

  • The first stage prunes M_S down to M_T; although this reduces the parameter count, it inevitably degrades performance;
  • The second stage continues pre-training M_T to recover and strengthen its performance.

Structured pruning

Structured pruning can remove a large number of parameters from a model, compressing it and accelerating inference. However, existing structured pruning methods can cause models to deviate from conventional architectural configurations. For example, CoFiPruning produces models with non-uniform layer configurations, which incur additional inference overhead compared to standard uniform configurations.

This paper extends CoFiPruning to allow the source model to be pruned to any specified target configuration. For example, the paper uses the INCITE-Base-3B architecture as the target structure when producing the 2.7B model.
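To make the notion of a "target configuration" concrete, the snippet below sketches it as a handful of shape hyperparameters. This is an illustration only: the field names and numbers are hypothetical, and this is not the configuration format of the LLM-Shearing codebase (Table 8 of the paper gives the actual source and target architectures).

```python
from dataclasses import dataclass

@dataclass
class TargetConfig:
    """Shape of the pruned (target) model.
    The values used below are illustrative, not the exact Sheared-LLaMA
    settings; see Table 8 of the paper for the real configurations."""
    n_layers: int
    hidden_size: int
    n_heads: int
    intermediate_size: int

# A hypothetical ~3B-scale target, mirroring the idea of reusing an existing
# model family's configuration (e.g. INCITE-Base-3B) as the pruning target.
target_config = TargetConfig(
    n_layers=32,
    hidden_size=2560,
    n_heads=32,
    intermediate_size=10240,
)
print(target_config)
```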

In addition, the method learns a set of pruning masks over model parameters at different granularities, ranging from coarse-grained units such as entire layers and attention heads down to fine-grained units such as intermediate and hidden dimensions.


Each mask variable controls whether the corresponding substructure is pruned or retained. For example, if a layer's z^layer = 0, that layer is removed. Figure 2 of the paper illustrates how the pruning masks control which structures are pruned.
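As a toy illustration of how such masks gate substructures, the following PyTorch sketch applies a per-head mask and a per-layer mask in the forward pass. It is only a simplified sketch of the general mechanism; the actual method learns continuous mask variables under constraints that drive the model toward the target shape, and this code is not the paper's implementation.

```python
import torch
import torch.nn as nn

class MaskedSelfAttention(nn.Module):
    """Toy multi-head attention whose heads can be switched off by z_head."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One mask variable per head; 1 = keep, 0 = prune.
        self.z_head = nn.Parameter(torch.ones(n_heads), requires_grad=False)

    def forward(self, x):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)              # (B, heads, T, d_head)
        att = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        ctx = att.softmax(dim=-1) @ v
        ctx = ctx * self.z_head.view(1, -1, 1, 1)           # zero out pruned heads
        ctx = ctx.transpose(1, 2).reshape(B, T, -1)
        return self.out(ctx)

class MaskedBlock(nn.Module):
    """Toy block whose whole attention sublayer can be dropped by z_layer."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = MaskedSelfAttention(d_model, n_heads)
        self.z_layer = nn.Parameter(torch.ones(()), requires_grad=False)

    def forward(self, x):
        # If z_layer == 0, the sublayer's contribution vanishes: the layer is pruned.
        return x + self.z_layer * self.attn(x)

x = torch.randn(2, 8, 64)
block = MaskedBlock(d_model=64, n_heads=4)
block.attn.z_head[1] = 0.0   # prune head 1
y = block(x)
print(y.shape)               # torch.Size([2, 8, 64])
```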


After pruning, the final architecture is fixed by retaining, within each substructure, the highest-scoring components associated with the mask variables, and the pruned model is then further pre-trained with the standard language modeling objective.

Dynamic batch loading

The study argues that continued pre-training of the pruned model on a large amount of data is necessary to restore its performance.

Inspired by prior work, the paper proposes a more efficient algorithm, dynamic batch loading, which dynamically adjusts each domain's proportion in the training batches based on the model's performance in that domain; the full procedure is given as Algorithm 1 in the paper.
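The snippet below is a minimal sketch of this idea, assuming a simple exponential re-weighting of the original domain proportions by each domain's gap between its current loss and its reference loss; the exact update rule, reference losses, and update schedule are those of Algorithm 1 in the paper, which this sketch does not reproduce exactly. The domain names and numbers in the example are made up for illustration.

```python
import numpy as np

def update_domain_weights(orig_weights, current_loss, reference_loss):
    """Re-weight data domains so that domains lagging behind their reference
    loss are sampled more often.

    orig_weights, current_loss, reference_loss: dicts keyed by domain name.
    The exponential re-weighting below is an illustrative choice, not the
    paper's exact update rule.
    """
    domains = list(orig_weights)
    # Gap between current loss and reference loss, clipped at 0:
    # domains that already reached their reference are not up-weighted.
    gaps = np.array([max(current_loss[d] - reference_loss[d], 0.0) for d in domains])
    base = np.array([orig_weights[d] for d in domains])
    new = base * np.exp(gaps)
    new /= new.sum()          # renormalize into sampling proportions
    return dict(zip(domains, new))

# Toy example (made-up numbers): the "Book" domain is recovering more slowly
# than its reference, so its sampling weight grows.
orig = {"CommonCrawl": 0.67, "C4": 0.15, "GitHub": 0.05, "Book": 0.05,
        "Wikipedia": 0.045, "ArXiv": 0.02, "StackExchange": 0.015}
cur  = {"CommonCrawl": 2.05, "C4": 2.10, "GitHub": 0.90, "Book": 2.40,
        "Wikipedia": 1.80, "ArXiv": 1.40, "StackExchange": 1.60}
ref  = {"CommonCrawl": 2.00, "C4": 2.00, "GitHub": 0.90, "Book": 2.00,
        "Wikipedia": 1.80, "ArXiv": 1.40, "StackExchange": 1.60}
print(update_domain_weights(orig, cur, ref))
```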


Experiments and results

Model configuration: This paper uses the LLaMA2-7B model as the source model and performs structured pruning experiments on it. The researchers compressed LLaMA2-7B into two smaller target sizes, 2.7B and 1.3B parameters, and compared the pruned models with models of the same size, including OPT-1.3B, Pythia-1.4B, OPT-2.7B, Pythia-2.8B, INCITE-Base-3B, OpenLLaMA-3B-v1, and OpenLLaMA-3B-v2. Table 8 of the paper summarizes the architectural details of all these models.


Data: Since LLaMA2's training data is not publicly available, this paper uses the RedPajama dataset. Table 1 of the paper lists the pre-training data used by this paper's models and by the baseline models.


Training: Up to 16 Nvidia A100 GPUs (80 GB) were used in all experiments.

Sheared-LLaMA outperforms equivalently sized LMs

This paper shows that Sheared-LLaMA significantly outperforms existing LLMs of similar size while using only a fraction of the compute budget those models required to train from scratch.

Downstream tasks: Table 2 shows the zero-shot and few-shot performance of Sheared-LLaMA and existing pre-trained models of similar size on downstream tasks.


Instruction tuning: As shown in Figure 3, the instruction-tuned Sheared-LLaMA achieves a higher win rate than all other pre-trained models of the same scale.


Figure 4 shows that the INCITE-Base-3B model starts out with much higher accuracy, but its performance plateaus during continued pre-training.


Analysis

Finally, the researchers analyzed the advantages of the method.

Effectiveness of dynamic batch loading

To analyze the effectiveness of dynamic batch loading, the researchers examine three aspects: (1) the final LM loss in each domain, (2) each domain's data usage throughout training, and (3) downstream task performance. The results are based on Sheared-LLaMA-1.3B.

Cross-domain loss differences. The purpose of dynamic batch loading is to balance the rate of loss reduction across domains so that the losses reach their reference values at roughly the same time. Figure 5 plots the difference between the model's loss (with original batch loading and with dynamic batch loading) and the reference loss. With dynamic batch loading, the loss decreases more evenly and the loss differences across domains are much more similar, indicating more efficient use of the data.


Data usage. Table 3 compares RedPajama's original data proportions with the domain data usage under dynamic loading (Figure 7 shows how the domain weights change over the course of training). Compared with other domains, dynamic batch loading increases the weights of the Book and C4 domains, indicating that these domains are harder for the pruned model to recover on.


Downstream performance. As shown in Figure 6, the pruned models trained with dynamic batch loading achieve better downstream performance than those trained on the original RedPajama distribution, suggesting that the more balanced loss reduction brought by dynamic batch loading improves downstream performance.


Comparison with other pruning methods

In addition, the researchers compared the LLM-Shearing method with other pruning methods and report validation perplexity, a strong indicator of overall model capability.

Due to computational constraints, the following experiments control the total compute budget of all compared methods rather than running each method to completion.

As shown in Table 4, at the same sparsity, the targeted-pruning models in this paper achieve higher inference throughput than the non-uniformly pruned CoFiPruning models, at the cost of slightly higher perplexity.


Other analysis

Table 5 shows that, with the total number of tokens held fixed, increasing the pruning budget consistently improves perplexity. However, since pruning is more expensive than continued pre-training, the researchers allocate only 0.4B tokens to pruning.


For more research details, please refer to the original paper.


Source: jiqizhixin.com