
ICML 2024 | Breaking away from the LoRA architecture, training parameters are greatly reduced, and a new type of Fourier fine-tuning is coming

Jun 10, 2024, 05:58 PM

AIxiv is a column where this site publishes academic and technical content. Over the past few years, it has received more than 2,000 submissions covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, feel free to contribute or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

Introduction

The paper "Parameter-Efficient Fine-Tuning with Discrete Fourier Transform" from the Hong Kong University of Science and Technology (Guangzhou), on parameter-efficient fine-tuning (PEFT) of large language models, has been accepted to ICML 2024, and its code has been open-sourced.


• Paper address: https://arxiv.org/abs/2405.03003
• Project address: https://github.com/Chaos96/fourierft
Background
Large-scale base models have achieved remarkable success in natural language processing (NLP) and computer vision (CV). Fine-tuning these models to better suit specialized downstream tasks has become a popular research topic. However, as models grow larger and downstream tasks become more diverse, the computation and storage costs of fine-tuning the entire model are no longer acceptable. LoRA fits the fine-tuning increment with a low-rank decomposition and successfully cuts much of this cost, but the size of each adapter is still not negligible. This motivates the core question of this paper: compared with LoRA, how can the number of trainable parameters be reduced significantly further? An interesting secondary question is whether a high-rank incremental matrix can be obtained with even fewer parameters.
Method

The Fourier basis is widely used in data compression, e.g., for compressing one-dimensional signals and two-dimensional images. In these applications, a dense spatial-domain signal is converted by the Fourier transform into a sparse frequency-domain signal. Based on this principle, the authors conjecture that the model weight increment can likewise be regarded as a spatial-domain signal whose corresponding frequency-domain signal admits a sparse representation.
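
To make that intuition concrete, here is a small, self-contained sketch (not taken from the paper): a smooth two-dimensional signal is transformed, all but the k largest Fourier coefficients are dropped, and the signal is reconstructed from the sparse spectrum with low error. All sizes are illustrative.

```python
# Compression intuition: a smooth 2-D signal is sparse in the Fourier domain,
# so keeping only its k largest coefficients still reconstructs it well.
import torch

d, k = 64, 64                                   # 64x64 signal, keep 64 of 4096 coefficients
t = torch.linspace(0, 1, d)
x = torch.outer(torch.sin(4 * torch.pi * t), torch.cos(6 * torch.pi * t)) \
    + 0.5 * torch.outer(torch.sin(10 * torch.pi * t), torch.sin(2 * torch.pi * t))

spectrum = torch.fft.fft2(x)                    # dense spatial -> frequency domain
flat = spectrum.flatten()
keep = flat.abs().topk(k).indices               # positions of the k largest coefficients
sparse = torch.zeros_like(flat)
sparse[keep] = flat[keep]                       # zero out everything else

x_hat = torch.fft.ifft2(sparse.view(d, d)).real # sparse spectrum -> spatial domain
print("relative reconstruction error:", ((x - x_hat).norm() / x.norm()).item())
```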

Based on this assumption, the authors propose a new method that learns the incremental weight signal in the frequency domain. Specifically, the method represents the spatial-domain weight increment with a sparse frequency-domain signal at random locations. When the pre-trained model is loaded, n frequency-domain positions are first selected at random as valid signals, and their coefficients are concatenated into a one-dimensional vector. During forward propagation, this vector is mapped back to the spatial-domain weight matrix via the inverse Fourier transform; during backpropagation, because the Fourier transform is differentiable, the learnable vector can be updated directly. This not only effectively reduces the number of parameters required for fine-tuning but also preserves fine-tuning performance. In this way, the authors both achieve efficient fine-tuning of large base models and demonstrate the potential of the Fourier transform in machine learning.
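
The following is a minimal PyTorch sketch of this idea, not the authors' released implementation (see the GitHub repository above for that). The class name, argument names, and the scaling constant `alpha` are assumptions made for illustration.

```python
# A minimal sketch of frequency-domain fine-tuning as described above.
# For the authors' actual implementation see https://github.com/Chaos96/fourierft.
import torch
import torch.nn as nn

class FourierFTLinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, n: int = 1000, alpha: float = 1.0):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():          # freeze the pre-trained weight
            p.requires_grad_(False)
        out_f, in_f = base_linear.weight.shape
        self.shape = (out_f, in_f)
        self.alpha = alpha
        # n randomly chosen frequency-domain positions, fixed when the model is loaded
        idx = torch.randperm(out_f * in_f)[:n]
        self.register_buffer("indices", idx)
        # the only trainable parameters: one coefficient per chosen frequency
        self.coeffs = nn.Parameter(torch.zeros(n))

    def delta_weight(self) -> torch.Tensor:
        # scatter the learnable vector into a sparse spectrum ...
        spectrum = torch.zeros(self.shape[0] * self.shape[1],
                               dtype=torch.cfloat, device=self.coeffs.device)
        spectrum[self.indices] = self.coeffs.to(torch.cfloat)
        # ... and map it back to the spatial domain; the inverse FFT is
        # differentiable, so gradients flow directly into `self.coeffs`.
        return torch.fft.ifft2(spectrum.view(self.shape)).real * self.alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.delta_weight().T
```

Wrapping, for example, the attention projections of a transformer with such a module and training only `coeffs` would mirror the layer placement commonly used for LoRA-style PEFT.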

Thanks to the high information content of the Fourier basis, only a small value of n is needed to match or even exceed LoRA's performance. Overall, Fourier fine-tuning requires only about one-thousandth to one-tenth as many trainable parameters as LoRA.
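
As a rough illustration of the parameter counts behind this claim (d matches RoBERTa-large's hidden size; r and n are representative values, not figures quoted from the paper):

```python
# Per-matrix trainable parameters for one d x d weight, for illustration only.
d, r, n = 1024, 8, 1000
lora_params = 2 * d * r      # LoRA trains two low-rank factors: A (d x r) and B (r x d)
fourier_params = n           # FourierFT trains n spectral coefficients
print(lora_params, fourier_params, fourier_params / lora_params)
# -> 16384 1000 ~0.061, i.e. roughly 6% of LoRA's per-matrix parameters
```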


Experiments

1. Natural language understanding

The authors evaluated Fourier fine-tuning on the GLUE natural language understanding benchmark. Baselines include full fine-tuning (FF), BitFit, adapter tuning, LoRA, DyLoRA, and AdaLoRA. The paper's results table reports each method's performance on the GLUE tasks together with the number of trainable parameters it requires; Fourier fine-tuning matches or even exceeds the other methods while using the fewest parameters.

2. Natural language instruction fine-tuning

Natural language generation with large models is currently an important application of fine-tuning. The authors evaluate Fourier fine-tuning on the LLaMA family of models using the MT-Bench and Vicuna evaluations. The results show that Fourier fine-tuning achieves results similar to LoRA with a very small number of trainable parameters, further verifying the method's generality and effectiveness.


3. Image classification

The authors also tested the fine-tuning performance of Fourier fine-tuning on Vision Transformers, covering eight common image classification datasets. Experimental results show that although its compression-rate gain over LoRA is less pronounced on image classification than on natural language tasks, it still surpasses LoRA's results with far fewer parameters. This further demonstrates the effectiveness and advantages of Fourier fine-tuning across application domains.


4. Breaking through low rank

On the RTE dataset of the GLUE benchmark, FourierFT can achieve weight increments of significantly higher rank than LoRA (whose rank is typically set to 4 or 8).
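
A quick numerical check of why this is plausible (a sketch, not an experiment from the paper): the increment reconstructed from a sparse random spectrum is typically close to full rank, whereas a LoRA increment is capped at its rank r.

```python
# Rank comparison between a FourierFT-style increment and a LoRA-style one.
import torch

d, n, r = 128, 1000, 8
# FourierFT-style increment: n nonzero coefficients at random spectral positions
spectrum = torch.zeros(d * d, dtype=torch.cfloat)
idx = torch.randperm(d * d)[:n]
spectrum[idx] = torch.randn(n, dtype=torch.cfloat)
delta_fourier = torch.fft.ifft2(spectrum.view(d, d)).real

# LoRA-style increment: product of two rank-r factors
delta_lora = torch.randn(d, r) @ torch.randn(r, d)

print(torch.linalg.matrix_rank(delta_fourier))  # typically far above r, close to d
print(torch.linalg.matrix_rank(delta_lora))     # at most r
```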


5. GPU resource consumption

During fine-tuning, FourierFT achieves better performance than LoRA with lower GPU memory consumption. The paper reports the peak memory usage when fine-tuning RoBERTa-Large on a single NVIDIA RTX 4090 GPU.
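
For readers who want to take this kind of measurement themselves, the sketch below shows how peak GPU memory can be read out with PyTorch's built-in CUDA statistics; the tiny model and single training step are stand-ins, not the paper's RoBERTa-Large setup.

```python
# Self-contained sketch: measure peak GPU memory across one training step.
import torch
import torch.nn as nn

device = "cuda"
torch.cuda.reset_peak_memory_stats(device)      # clear the running peak counter

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 2)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 1024, device=device)
y = torch.randint(0, 2, (32,), device=device)

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
opt.zero_grad()

print(f"peak GPU memory: {torch.cuda.max_memory_allocated(device) / 2**30:.3f} GiB")
```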


Conclusion

The authors introduce an efficient fine-tuning method, Fourier fine-tuning, which uses the Fourier transform to reduce the number of trainable parameters when fine-tuning large base models. By learning a small number of Fourier spectral coefficients to represent the weight changes, the method significantly reduces storage and computation requirements. Experiments show that Fourier fine-tuning performs well on natural language understanding, natural language generation, instruction tuning, and image classification. Compared with existing low-rank adaptation methods such as LoRA, it matches or exceeds their performance while requiring far fewer trainable parameters.
