Table of Contents
Usage scenarios
Method
Application scenarios: forgetting harmful content, etc.
Conclusion

ByteDance releases unlearning technology: eliminating harmful LLM output with 2% of RLHF's computing power

Dec 14, 2023, 11:55 PM

With the development of large language models (LLMs), practitioners face mounting challenges. How can harmful replies from an LLM be avoided? How can copyright-protected content be quickly deleted from training data? How can LLM hallucinations (false facts) be reduced? How can an LLM be quickly iterated after a data-policy change? These issues are critical to the safe and trustworthy deployment of LLMs as legal and ethical compliance requirements for artificial intelligence mature.

The current mainstream industry solution is alignment: fine-tuning on comparison data (positive and negative samples) with reinforcement learning so that the LLM's output matches human expectations and values. However, this alignment process is often limited by data collection and computing resources.

ByteDance has proposed a method for aligning LLMs through unlearning. This article studies how to perform a "forgetting" operation on an LLM, that is, forgetting harmful behaviors, or machine unlearning. The authors demonstrate the clear effect of unlearning in three LLM alignment scenarios: (1) removing harmful output; (2) removing infringing (copyright-protected) content; (3) eliminating LLM hallucinations.

Unlearning has three advantages: (1) it needs only negative samples (harmful samples), which are much easier to collect (e.g., via red teaming or user reports) than the positive samples RLHF requires (high-quality, human-written output); (2) it has low computational cost; (3) it is particularly effective when it is known which training samples caused the LLM's harmful behavior.

The authors' argument is that practitioners with limited resources should prioritize stopping harmful outputs rather than pursuing overly idealized outputs, and unlearning is a convenient way to do so. Despite using only negative samples, the research shows that unlearning can still achieve alignment performance comparable to RLHF while using only 2% of its computation time.


  • Paper address: https://arxiv.org/abs/2310.10683
  • Code address: https://github.com/kevinyaobytedance/llm_unlearn

Usage scenarios

With limited resources, this approach maximizes the practitioner's advantage. When there is no budget to hire people to write high-quality samples, or computing resources are insufficient, the priority should be stopping the LLM from producing harmful output rather than trying to make it produce beneficial output.

The damage caused by harmful output cannot be compensated for by beneficial output. If a user asks an LLM 100 questions and even one of the answers is harmful, the user loses trust, no matter how many helpful answers the LLM provides later. The expected output for harmful questions can be spaces, special characters, meaningless strings, and so on; in short, it must be harmless text.

The figure below shows three successful cases of LLM unlearning: (1) stopping the generation of harmful replies: this is similar to the RLHF scenario, except that the goal here is to generate harmless replies rather than helpful ones, which is the best that can be expected when only negative samples are available; (2) deleting infringing data after the LLM was trained on it, when retraining the LLM is too costly; (3) the LLM successfully forgetting "hallucinations".

Figure 1

Method

At fine-tuning step t, the LLM is updated as follows:

$$\theta_{t+1} \leftarrow \theta_t - \epsilon_1 \nabla_{\theta_t} L_{\mathrm{fgt}} - \epsilon_2 \nabla_{\theta_t} L_{\mathrm{rdn}} - \epsilon_3 \nabla_{\theta_t} L_{\mathrm{nor}}$$
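Concretely, one step descends the weighted sum of the three loss gradients. A minimal sketch in plain Python (the function and parameter names are illustrative, and parameters are flat lists rather than model tensors):

```python
def unlearning_step(theta, grad_fgt, grad_rdn, grad_nor,
                    eps1=1.0, eps2=1.0, eps3=1.0):
    """One fine-tuning step: descend the weighted sum of the gradients of
    the forgetting, random-mismatch, and normal-performance losses."""
    return [t - (eps1 * gf + eps2 * gr + eps3 * gn)
            for t, gf, gr, gn in zip(theta, grad_fgt, grad_rdn, grad_nor)]
```

In a real training loop the three gradients come from backpropagating the respective losses, and the weights are hyperparameters.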

The first loss performs gradient ascent, whose purpose is to forget the harmful samples:

$$L_{\mathrm{fgt}} = -\sum_{(x_{\mathrm{fgt}},\, y_{\mathrm{fgt}}) \in D_{\mathrm{fgt}}} L(\theta_t;\, x_{\mathrm{fgt}},\, y_{\mathrm{fgt}})$$

Here $x_{\mathrm{fgt}}$ is a harmful prompt and $y_{\mathrm{fgt}}$ is the corresponding harmful reply. Negating the loss on harmful samples increases it during training, which makes the LLM "forget" those samples.
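Since the forgetting loss is just the ordinary prediction loss with its sign flipped, descending it is equivalent to ascending the original loss on the harmful pairs. A toy NumPy sketch (illustrative names, not the paper's code):

```python
import numpy as np

def token_cross_entropy(logits, targets):
    # standard next-token loss: mean of -log p(target) over positions
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -float(log_probs[np.arange(len(targets)), targets].mean())

def forgetting_loss(logits, harmful_targets):
    # negated loss on a harmful (prompt, reply) pair: gradient descent on
    # this value performs gradient ascent on the original loss
    return -token_cross_entropy(logits, harmful_targets)
```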

The second loss handles random mismatches: it asks the LLM to predict irrelevant replies to harmful prompts. This is similar to label smoothing [2] in classification; its purpose is to help the LLM better forget harmful output on harmful prompts. Experiments also show that it improves the LLM's output performance on normal prompts.

$$L_{\mathrm{rdn}} = \sum_{(x_{\mathrm{fgt}},\, \cdot\,) \in D_{\mathrm{fgt}}} \frac{1}{|Y_{\mathrm{rdn}}|} \sum_{y_{\mathrm{rdn}} \in Y_{\mathrm{rdn}}} L(\theta_t;\, x_{\mathrm{fgt}},\, y_{\mathrm{rdn}})$$
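One way to realize this loss is to pair each harmful prompt with randomly drawn irrelevant replies (e.g., sampled from the normal data) and apply the ordinary descent loss to those pairs; a minimal sketch with illustrative names:

```python
import random

def random_mismatch_pairs(harmful_prompts, irrelevant_replies, k=1, seed=0):
    """Pair each harmful prompt with k randomly drawn irrelevant replies.
    Training on these pairs with the ordinary (descent) loss pushes the
    model away from harmful completions, much like label smoothing."""
    rng = random.Random(seed)
    return [(p, rng.choice(irrelevant_replies))
            for p in harmful_prompts for _ in range(k)]
```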

The third loss is to maintain performance on normal tasks:

$$L_{\mathrm{nor}} = \sum_{(x_{\mathrm{nor}},\, y_{\mathrm{nor}}) \in D_{\mathrm{nor}}} \mathrm{KL}\!\left(h_{\theta_0}(x_{\mathrm{nor}},\, y_{\mathrm{nor}}) \,\big\|\, h_{\theta_t}(x_{\mathrm{nor}},\, y_{\mathrm{nor}})\right)$$

As in RLHF, computing the KL divergence against the pretrained LLM ($\theta_0$) helps preserve the LLM's performance on normal samples.
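The KL term compares the current model's token distributions on normal samples against the pretrained model's, penalizing drift. A toy sketch over explicit probability arrays (illustrative, not the paper's code):

```python
import numpy as np

def kl_to_pretrained(p_pretrained, p_current, eps=1e-12):
    """Mean KL(pretrained || current) over token positions, where both
    inputs are (positions, vocab) arrays of probabilities."""
    p = np.clip(p_pretrained, eps, 1.0)
    q = np.clip(p_current, eps, 1.0)
    return float((p * np.log(p / q)).sum(axis=-1).mean())
```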

Additionally, all gradient ascent and descent is performed only on the output part (y), not on the prompt-output pair (x, y) as in RLHF.
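In practice, restricting the loss to the output tokens is usually done with a label mask over the concatenated (prompt, reply) sequence; the sketch below uses the -100 ignore convention common in Hugging Face-style trainers (an assumption here, not something the paper specifies):

```python
def output_only_labels(prompt_ids, reply_ids, ignore_index=-100):
    # prompt positions receive ignore_index and contribute no loss;
    # only the reply (y) tokens are trained on
    return [ignore_index] * len(prompt_ids) + list(reply_ids)
```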

Application scenarios: forgetting harmful content, etc.

This article uses PKU-SafeRLHF as the forgotten data and TruthfulQA as the normal data. Figure 2 shows the harmful rate of LLM outputs on the unlearned harmful prompts after unlearning. The methods used are GA (gradient ascent) and GA+Mismatch (gradient ascent with random mismatch). The harmful rate after unlearning is close to zero.

Figure 2
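The harmful rate reported here is simply the fraction of generations flagged as harmful; a sketch assuming some external harmfulness classifier (`is_harmful` is a hypothetical callable, not part of the paper's code):

```python
def harmful_rate(generations, is_harmful):
    # fraction of model outputs flagged by a harmfulness classifier
    flags = [bool(is_harmful(g)) for g in generations]
    return sum(flags) / len(flags)
```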

Figure 3 shows the output on harmful prompts that were not unlearned, i.e., prompts the model had not seen before. Even for these, the LLM's harmful rate is close to zero, which shows that the LLM forgets not only the specific samples but also generalizes to content containing the harmful concepts.


Figure 3

Meanwhile, the LLM's performance on normal samples remains similar to that before unlearning.

Table 1 shows some generated samples. Under harmful prompts, the samples generated by the LLM are meaningless strings, i.e., harmless output.


Table 1

The original paper describes this method in detail for the other scenarios, such as forgetting infringing content and forgetting hallucinations.

RLHF comparison

Table 2 compares this method with RLHF. RLHF uses positive examples, while the unlearning method uses only negative examples, so the method starts at a disadvantage. Even so, unlearning achieves alignment performance similar to RLHF.

Table 2

Figure 4 compares computation time: this method requires only 2% of the computation time of RLHF.

Figure 4

Even with only negative samples, unlearning achieves a harmless rate comparable to RLHF while using only 2% of the computing power. Therefore, when the goal is to stop outputting harmful content, unlearning is more efficient than RLHF.

Conclusion

This study is among the first to explore unlearning on LLMs. The findings show that unlearning is a promising approach to alignment, especially when practitioners lack resources. The paper demonstrates three scenarios in which unlearning successfully deletes harmful replies, deletes infringing content, and eliminates hallucinations. It shows that even with only negative samples, unlearning can achieve alignment effects similar to RLHF using only 2% of RLHF's computation time.

