Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set-AI-php.cn

Table of Contents

Home

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

PHPz

Apr 16, 2023 pm 02:10 PM

image Model

Last year, image generation models became popular. After a mass art carnival, copyright issues followed.

The training of deep learning models such as DALL-E 2, Imagen and Stable Diffusion are all trained on hundreds of millions of data. There is no way to get rid of the influence of the training set. , but are some of the generated images completely derived from the training set? If the generated image is very similar to the original image, who owns the copyright?

Recently, researchers from Google, Deepmind, ETH Zurich and many other well-known universities and companies jointly published a paper. They found that diffusion model It is indeed possible to remember the samples in the training set and reproduce them during the generation process.

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

Paper link: https://arxiv.org/abs/2301.13188

In this work, the researchers show how a diffusion model can remember a single image in its training data and reproduce it as it is generated.

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

The article proposes a generate-and-filter(generate-and-filter) pipeline, starting from The state-of-the-art model extracts more than a thousand training examples, covering photos of people, trademarks, company logos, and more. We also trained hundreds of diffusion models in different environments to analyze how different modeling and data decisions affect privacy.

Overall, the experimental results show that the diffusion model provides much worse privacy protection for the training set than previous generative models (such as GANs).

I remember, but not much

The denoising diffusion model is a new generative neural network that has emerged recently. It uses an iterative denoising process. Generating images from the training distribution is better than the previously commonly used GAN or VAE models, and it is easier to expand the model and control image generation, so it has quickly become a mainstream method for generating various high-resolution images.

Especially after OpenAI released DALL-E 2, the diffusion model quickly became popular in the entire field of AI generation.

The appeal of generative diffusion models stems from their ability to synthesize new images that are ostensibly different from anything in the training set. In fact, past large-scale training efforts "have not "Discover the problem of overfitting", and researchers in the privacy sensitive domain even proposed that the diffusion model can "protect the privacy of real images" by synthesizing images.

However, these works all rely on an assumption: That is, the diffusion model will not remember and regenerate training data, otherwise it will violate the privacy guarantee and cause many concerns. Problems of model generalization and digital forgery.

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

But, is this the truth?

To determine whether the generated image comes from the training set, you first need to define what is "memorization".

Previous related work mainly focused on text language models. If the model can recover a verbatim recorded sequence verbatim from the training set, then this sequence is called "extraction" and "memory"; but because this work is based on high-resolution images, word-for-word matching definitions of memory are not suitable.

The following is a memory based on image similarity measures defined by the researchers.

If the distance between a generated image x and multiple samples in the training set is less than a given threshold, then the sample is considered to be from the training set What is obtained through concentration is Eidetic Memorization.

Then, the article designs a two-stage data extraction attack Method:

1 . Generate a large number of images

The first step is simple but computationally expensive: generate images in a black-box manner using the selected prompt as input .

The researchers generated 500 candidate images for each text prompt to increase the odds of discovering memories.

2. Carry out Membership Inference

generate those suspected to be based on the training set memory images are marked.

The member inference attack strategy designed by the researchers is based on the following idea: for two different random initial seeds, the probability of similarity between the two images generated by the diffusion model will be very high, and it is possible that The distance metric is considered to be generated from memory.

Extraction results

To evaluate the effectiveness of the attack, the researchers selected the 350,000 most repeated examples from the training data set and generated 500 images for each prompt Candidate images (175 million images generated in total).

First sort all these generated images by the average distance between images in the clique to identify those that were likely generated by memorizing the training data.

Then these generated images were compared with the training images, and each image was marked as "extracted" and "not extracted". Finally, 94 images were found that were suspected to be extracted from the training set. image.

Through visual analysis, the top 1000 images were manually labeled as "memorized" or "not memorized", and it was found that 13 images were generated by copying training samples.

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

From the P-R curve, this attack method is very accurate: in 175 million generated images , can recognize 50 memorized images with a false positive rate of 0; and all images generated based on memory can be extracted with an accuracy higher than 50%

To better understand how and why memory occurs, the researchers also trained hundreds of smaller diffusion models on CIFAR10 to analyze the privacy impact of model accuracy, hyperparameters, augmentation, and deduplication.

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

##Diffusion vs GAN

Unlike diffusion models, GANs are not explicitly trained to Memorize and reconstruct its training data set.

GANs consist of two competing neural networks: a generator and a discriminator. The generator also receives random noise as input, but unlike the diffusion model, it must convert this noise into a valid image in one forward pass.

In the process of training GAN, the discriminator needs to predict whether the image comes from the generator, and the generator needs to improve itself to deceive the discriminator.

Therefore, the difference between the two is that the generator of GAN is only trained using indirect information about the training data (that is, using the gradient from the discriminator) and does not directly receive training data as input.

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

1 million unconditionally generated training images extracted from different pre-training generation models, and then sorted by FID Place the GAN model (lower is better) at the top and the diffusion model at the bottom.

The results show that diffusion models remember more than GAN models, and better generative models (lower FID) tend to remember more data, that is, Diffusion models are the least private form of image models, leaking more than twice as much training data as GANs.

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

And from the above results, we can also find that the existing privacy enhancement technology does not provide an acceptable privacy-performance trade-off. To improve the quality of generation, you need to remember more data in the training set.

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

Overall, this paper highlights the tension between increasingly powerful generative models and data privacy, and raises questions about how diffusion models work and how they can be deployed responsibly.

Copyright Issue

Technically speaking, reconstruction is the advantage of the diffusion model; but from a copyright perspective, reconstruction is its weakness.

Artists have variously argued over their copyright issues due to the excessive similarity between the images generated by the diffusion model and the training data.

For example, AI is prohibited from using its own works for training, and a large number of watermarks are added to published works; and Stable Diffusion has also announced that it plans to only use training that contains authorized content in the next step. dataset, and provides an artist exit mechanism.

The same problem is faced in the NLP field. Some netizens said that millions of words of text have been published since 1993, and all AI including ChatGPT-3 are "being used". It is unethical to use AI-based generative models trained on stolen content.

Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set

Although a lot of articles are plagiarized in the world, for ordinary people, plagiarism is just a dispensable thing Shortcuts; but for the creators, the plagiarized content is their hard work.

Will the diffusion model still have advantages in the future?

The above is the detailed content of Not as good as GAN! Google, DeepMind and others issued articles: Diffusion models are 'copied” directly from the training set. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

1 months ago By DDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks ago By DDD

Where to find the Crane Control Keycard in Atomfall

1 months ago By DDD

How to fix KB5055523 fails to install in Windows 11?

2 weeks ago By DDD

InZoi: How To Apply To School And University

3 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7767

Java Tutorial

1644

CakePHP Tutorial

1399

Laravel Tutorial

1293

PHP Tutorial

1234

Related knowledge

The world's most powerful open source MoE model is here, with Chinese capabilities comparable to GPT-4, and the price is only nearly one percent of GPT-4-Turbo May 07, 2024 pm 04:13 PM

Imagine an artificial intelligence model that not only has the ability to surpass traditional computing, but also achieves more efficient performance at a lower cost. This is not science fiction, DeepSeek-V2[1], the world’s most powerful open source MoE model is here. DeepSeek-V2 is a powerful mixture of experts (MoE) language model with the characteristics of economical training and efficient inference. It consists of 236B parameters, 21B of which are used to activate each marker. Compared with DeepSeek67B, DeepSeek-V2 has stronger performance, while saving 42.5% of training costs, reducing KV cache by 93.3%, and increasing the maximum generation throughput to 5.76 times. DeepSeek is a company exploring general artificial intelligence

AI subverts mathematical research! Fields Medal winner and Chinese-American mathematician led 11 top-ranked papers | Liked by Terence Tao Apr 09, 2024 am 11:52 AM

AI is indeed changing mathematics. Recently, Tao Zhexuan, who has been paying close attention to this issue, forwarded the latest issue of "Bulletin of the American Mathematical Society" (Bulletin of the American Mathematical Society). Focusing on the topic "Will machines change mathematics?", many mathematicians expressed their opinions. The whole process was full of sparks, hardcore and exciting. The author has a strong lineup, including Fields Medal winner Akshay Venkatesh, Chinese mathematician Zheng Lejun, NYU computer scientist Ernest Davis and many other well-known scholars in the industry. The world of AI has changed dramatically. You know, many of these articles were submitted a year ago.

Google is ecstatic: JAX performance surpasses Pytorch and TensorFlow! It may become the fastest choice for GPU inference training Apr 01, 2024 pm 07:46 PM

The performance of JAX, promoted by Google, has surpassed that of Pytorch and TensorFlow in recent benchmark tests, ranking first in 7 indicators. And the test was not done on the TPU with the best JAX performance. Although among developers, Pytorch is still more popular than Tensorflow. But in the future, perhaps more large models will be trained and run based on the JAX platform. Models Recently, the Keras team benchmarked three backends (TensorFlow, JAX, PyTorch) with the native PyTorch implementation and Keras2 with TensorFlow. First, they select a set of mainstream

Hello, electric Atlas! Boston Dynamics robot comes back to life, 180-degree weird moves scare Musk Apr 18, 2024 pm 07:58 PM

Boston Dynamics Atlas officially enters the era of electric robots! Yesterday, the hydraulic Atlas just "tearfully" withdrew from the stage of history. Today, Boston Dynamics announced that the electric Atlas is on the job. It seems that in the field of commercial humanoid robots, Boston Dynamics is determined to compete with Tesla. After the new video was released, it had already been viewed by more than one million people in just ten hours. The old people leave and new roles appear. This is a historical necessity. There is no doubt that this year is the explosive year of humanoid robots. Netizens commented: The advancement of robots has made this year's opening ceremony look like a human, and the degree of freedom is far greater than that of humans. But is this really not a horror movie? At the beginning of the video, Atlas is lying calmly on the ground, seemingly on his back. What follows is jaw-dropping

KAN, which replaces MLP, has been extended to convolution by open source projects Jun 01, 2024 pm 10:03 PM

Earlier this month, researchers from MIT and other institutions proposed a very promising alternative to MLP - KAN. KAN outperforms MLP in terms of accuracy and interpretability. And it can outperform MLP running with a larger number of parameters with a very small number of parameters. For example, the authors stated that they used KAN to reproduce DeepMind's results with a smaller network and a higher degree of automation. Specifically, DeepMind's MLP has about 300,000 parameters, while KAN only has about 200 parameters. KAN has a strong mathematical foundation like MLP. MLP is based on the universal approximation theorem, while KAN is based on the Kolmogorov-Arnold representation theorem. As shown in the figure below, KAN has

Tesla robots work in factories, Musk: The degree of freedom of hands will reach 22 this year! May 06, 2024 pm 04:13 PM

The latest video of Tesla's robot Optimus is released, and it can already work in the factory. At normal speed, it sorts batteries (Tesla's 4680 batteries) like this: The official also released what it looks like at 20x speed - on a small "workstation", picking and picking and picking: This time it is released One of the highlights of the video is that Optimus completes this work in the factory, completely autonomously, without human intervention throughout the process. And from the perspective of Optimus, it can also pick up and place the crooked battery, focusing on automatic error correction: Regarding Optimus's hand, NVIDIA scientist Jim Fan gave a high evaluation: Optimus's hand is the world's five-fingered robot. One of the most dexterous. Its hands are not only tactile

FisheyeDetNet: the first target detection algorithm based on fisheye camera Apr 26, 2024 am 11:37 AM

Target detection is a relatively mature problem in autonomous driving systems, among which pedestrian detection is one of the earliest algorithms to be deployed. Very comprehensive research has been carried out in most papers. However, distance perception using fisheye cameras for surround view is relatively less studied. Due to large radial distortion, standard bounding box representation is difficult to implement in fisheye cameras. To alleviate the above description, we explore extended bounding box, ellipse, and general polygon designs into polar/angular representations and define an instance segmentation mIOU metric to analyze these representations. The proposed model fisheyeDetNet with polygonal shape outperforms other models and simultaneously achieves 49.5% mAP on the Valeo fisheye camera dataset for autonomous driving

DualBEV: significantly surpassing BEVFormer and BEVDet4D, open the book! Mar 21, 2024 pm 05:21 PM

This paper explores the problem of accurately detecting objects from different viewing angles (such as perspective and bird's-eye view) in autonomous driving, especially how to effectively transform features from perspective (PV) to bird's-eye view (BEV) space. Transformation is implemented via the Visual Transformation (VT) module. Existing methods are broadly divided into two strategies: 2D to 3D and 3D to 2D conversion. 2D-to-3D methods improve dense 2D features by predicting depth probabilities, but the inherent uncertainty of depth predictions, especially in distant regions, may introduce inaccuracies. While 3D to 2D methods usually use 3D queries to sample 2D features and learn the attention weights of the correspondence between 3D and 2D features through a Transformer, which increases the computational and deployment time.

See all articles