This article introduces "Parameter-Efficient Fine-Tuning with Discrete Fourier Transform", a paper from the Hong Kong University of Science and Technology (Guangzhou) on parameter-efficient fine-tuning (PEFT) of LLMs. The paper was accepted at ICML 2024, and the code has been open-sourced.
- Paper address: https://arxiv.org/abs/2405.03003
- Project address: https://github.com/Chaos96/fourierft
Large-scale foundation models have achieved remarkable results in natural language processing (NLP) and computer vision (CV). Fine-tuning these models to suit specific downstream tasks has become a popular research topic. However, as models grow larger and downstream tasks grow more diverse, the compute and storage cost of fine-tuning the entire model is no longer acceptable. LoRA fits the fine-tuning increment with a low-rank decomposition and removes much of this cost, but the size of each adapter is still not negligible. This motivates the core question of this article:
Compared with LoRA, how can the number of trainable parameters be reduced significantly further? An interesting additional question is whether a high-rank incremental matrix can be obtained with fewer parameters.
Method
The Fourier basis is widely used in data compression, for example in compressing one-dimensional vector signals and two-dimensional images. In these applications, a dense spatial-domain signal is converted by the Fourier transform into a sparse frequency-domain signal. Based on this principle, the authors hypothesize that a model's weight increment can also be treated as a spatial-domain signal whose frequency-domain counterpart admits a sparse representation.
Building on this assumption, the authors propose learning the weight increment in the frequency domain. Specifically, the spatial-domain weight increment is represented by a sparse frequency-domain signal at random locations. When the pre-trained model is loaded, n positions in the spectrum are first selected at random as the active frequency-domain entries, and their values are concatenated into a one-dimensional trainable vector. During forward propagation, this vector is used to restore the spatial-domain matrix through the inverse discrete Fourier transform; during back-propagation, because the Fourier transform is differentiable, the learnable vector can be updated directly.
This approach reduces the number of parameters required for fine-tuning while preserving fine-tuning performance, and it demonstrates the potential of the Fourier transform in machine learning. Because the Fourier basis is highly expressive, only a small n is needed to match or even exceed the performance of LoRA. In general, Fourier fine-tuning uses only one-thousandth to one-tenth as many trainable parameters as LoRA.
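The reconstruction step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `fourier_delta`, the scaling factor `alpha`, the choice to keep only the real part of the inverse transform, and the toy sizes are all assumptions made for the example. In a real training setup the coefficient vector would be the only trainable tensor, and the positions would stay frozen.

```python
import numpy as np

def fourier_delta(coeffs, rows, cols, shape, alpha=1.0):
    """Rebuild a dense weight increment from n spectral coefficients.

    Hypothetical helper for illustration; `alpha` and taking the real
    part are assumptions, not details stated in this article.
    """
    spectrum = np.zeros(shape, dtype=np.float64)
    spectrum[rows, cols] = coeffs              # scatter the n values into a sparse spectrum
    return alpha * np.fft.ifft2(spectrum).real  # back to the spatial domain

rng = np.random.default_rng(0)
d1, d2, n = 64, 64, 100                        # toy sizes, chosen for the example

# Pick n distinct spectral positions once; they stay fixed during training.
flat = rng.choice(d1 * d2, size=n, replace=False)
rows, cols = np.unravel_index(flat, (d1, d2))

coeffs = rng.standard_normal(n)                # the only values that would be trained
delta_w = fourier_delta(coeffs, rows, cols, (d1, d2))

# Trainable parameters for one d1 x d2 weight matrix.
lora_rank = 8
lora_params = lora_rank * (d1 + d2)            # LoRA stores a d1 x r and an r x d2 factor
fourier_params = n                             # one coefficient per selected position
print(delta_w.shape, lora_params, fourier_params)
```

Because the inverse transform is linear in the coefficients, gradients with respect to `coeffs` are straightforward to compute, which is why an automatic differentiation framework can update the vector directly.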
Experiment
1. Natural language understanding
The authors evaluated Fourier fine-tuning on the GLUE benchmark for natural language understanding. The baselines include Full Fine-tuning (FF), BitFit, Adapter Tuning, LoRA, DyLoRA and AdaLoRA. The paper's results table reports each method's performance on the GLUE tasks together with the number of trainable parameters it requires. The results show that Fourier fine-tuning matches or even exceeds the other fine-tuning methods while using the fewest parameters.
2. Natural language instruction fine-tuning
Natural language generation with large models is currently an important application of model fine-tuning. The authors fine-tuned models from the LLaMA family with Fourier fine-tuning and evaluated them on the MT-Bench and Vicuna tasks. The results show that Fourier fine-tuning achieves results similar to LoRA with far fewer trainable parameters, further supporting the generality and effectiveness of the method.
3. Image classification
The authors tested Fourier fine-tuning on Vision Transformer across 8 common image classification datasets. The results show that although the compression gain over LoRA is less pronounced on image classification than on natural language tasks, Fourier fine-tuning still surpasses LoRA while using far fewer parameters. This further demonstrates the effectiveness of Fourier fine-tuning across application areas.
4. Breaking through low rank
On the RTE dataset of the GLUE benchmark, the weight increments learned by FourierFT reach a significantly higher rank than those of LoRA (whose rank is usually 4 or 8).
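The rank gap can be illustrated with random values. The sketch below is an assumption-laden toy example, not a reproduction of the RTE experiment: it uses random rather than trained values, hypothetical sizes (`d`, `r`, `n`), and keeps only the real part of the inverse transform. It compares a LoRA-style increment, whose rank cannot exceed `r`, with an increment rebuilt from `n` randomly placed spectral coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 64, 4, 100                     # toy sizes, chosen for the example

# LoRA-style increment: product of a d x r and an r x d factor, so rank <= r.
lora_delta = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))

# Fourier-style increment: n random spectral entries, inverse 2D DFT, real part.
spectrum = np.zeros((d, d))
flat = rng.choice(d * d, size=n, replace=False)
rows, cols = np.unravel_index(flat, (d, d))
spectrum[rows, cols] = rng.standard_normal(n)
fourier_delta = np.fft.ifft2(spectrum).real

lora_rank = np.linalg.matrix_rank(lora_delta)
fourier_rank = np.linalg.matrix_rank(fourier_delta)

print("LoRA params:", 2 * d * r, "rank:", lora_rank)
print("Fourier params:", n, "rank:", fourier_rank)
```

In this toy setting the Fourier-style increment uses fewer parameters than the LoRA-style one, yet its rank is not capped by a small `r`, which is the property this experiment examines.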
5. GPU resource consumption
During fine-tuning, FourierFT consumes less GPU memory than LoRA. The paper's accompanying figure shows the peak memory consumption on the RoBERTa-Large model using a single 4090 graphics card.
Conclusion
The authors introduce an efficient fine-tuning method called Fourier fine-tuning, which uses the Fourier transform to reduce the number of trainable parameters when fine-tuning a large foundation model. By learning a small number of Fourier spectral coefficients to represent the weight changes, the method significantly reduces storage and compute requirements. Experimental results show that Fourier fine-tuning performs well on natural language understanding, natural language generation, instruction tuning, and image classification. Compared with existing low-rank adaptation methods such as LoRA, Fourier fine-tuning matches or exceeds their performance while requiring far fewer trainable parameters.