Will the diffusion models that have taken the field by storm be displaced?
Today's generative AI models, such as GANs, diffusion models, and consistency models, generate images by mapping inputs to outputs that follow a target data distribution.
Typically, such a model has to learn from a large number of real images before it can reproduce realistic features in the images it generates.
Recently, researchers from UC Berkeley and Google proposed a new generative model: the Idempotent Generative Network (IGN).
Paper address: https://arxiv.org/abs/2311.01462
IGN can take a variety of inputs, such as random noise or simple graphics, and generate a realistic image in a single step, without multi-step iteration.
The model aims to be a "global projector" that can map any input data onto the target data distribution.
In short, that is how the authors think a general-purpose image generation model should behave.
Interestingly, a memorable scene from "Seinfeld" was the authors' source of inspiration.
This scene neatly sums up the concept of an "idempotent operator": an operation that, when applied repeatedly to the same input, always gives the same result as applying it once, that is,
f(f(z)) = f(z)
As Jerry Seinfeld humorously points out, some real-life actions can also be considered idempotent.
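To make the definition concrete, here is a minimal Python sketch that checks f(f(z)) = f(z) on a handful of sample values. The functions `clamp01`, `halve`, and `is_idempotent_on` are illustrative names invented for this example and have nothing to do with the IGN code itself. Clamping a number to the interval [0, 1] is idempotent, while halving is not.

```python
def clamp01(v: float) -> float:
    """Project a number onto the interval [0, 1]; applying it twice changes nothing."""
    return min(1.0, max(0.0, v))


def halve(v: float) -> float:
    """Not idempotent: every further application keeps changing a nonzero value."""
    return v / 2.0


def is_idempotent_on(f, samples, tol: float = 1e-12) -> bool:
    """Empirically check f(f(z)) == f(z) on the given samples (a check, not a proof)."""
    return all(abs(f(f(z)) - f(z)) <= tol for z in samples)


samples = [-2.0, -0.5, 0.0, 0.3, 1.0, 4.2]
print(is_idempotent_on(clamp01, samples))  # True
print(is_idempotent_on(halve, samples))    # False
```

IGN aims for the same property, except that f is a neural network and the "interval" is the set of realistic images.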
IGN differs from GANs and diffusion models in two important ways:
- Unlike a GAN, IGN does not need separate generator and discriminator networks. It is a "self-adversarial" model that performs generation and discrimination at the same time.
- Unlike diffusion models, which proceed in incremental steps, IGN tries to map an input onto the data distribution in a single step.
Where does IGN (the Idempotent Generative Network) come from?
IGN is trained to generate samples from a target distribution P_x, given input samples from a source distribution P_z.
Given a dataset of examples {x_i}, each drawn from P_x, the researchers train a model f to map P_z to P_x.
The distributions P_z and P_x are assumed to lie in the same space, i.e. their instances have the same dimensions. This allows f to be applied to both types of instances, z ~ P_z and x ~ P_x.
The figure shows the basic idea behind IGN: real examples (x) are invariant to the model f, so f(x) = x. Other inputs (z) are mapped onto the manifold of instances that f maps to itself, by optimizing f(f(z)) = f(z).
[Figure: the basic idea behind IGN]
The paper gives a PyTorch code example of the IGN training routine:
[Figure: PyTorch code for the IGN training routine]
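The two conditions described above, f(x) = x for real examples and f(f(z)) = f(z) for other inputs, can each be written as a loss term. The following is a minimal pure-Python sketch that only evaluates these two terms for toy functions. It is not the authors' PyTorch routine: the names `reconstruction_loss`, `idempotence_loss`, `project`, and `shrink` are invented for illustration, and the paper's full objective may include further terms and weights that are not reproduced here.

```python
def mse(a, b) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def reconstruction_loss(f, real_batch) -> float:
    """Penalize f(x) != x on real examples (real data should be left unchanged)."""
    return sum(mse(f(x), x) for x in real_batch) / len(real_batch)


def idempotence_loss(f, noise_batch) -> float:
    """Penalize f(f(z)) != f(z) on arbitrary inputs z."""
    total = 0.0
    for z in noise_batch:
        fz = f(z)
        total += mse(f(fz), fz)
    return total / len(noise_batch)


# Toy stand-ins (assumptions): "real" data lives in the unit square [0, 1]^2.
def project(v):
    """Clamp each coordinate to [0, 1]; satisfies both conditions exactly."""
    return [min(1.0, max(0.0, c)) for c in v]


def shrink(v):
    """Halve each coordinate; violates both conditions."""
    return [0.5 * c for c in v]


real_batch = [[0.2, 0.9], [0.0, 1.0]]
noise_batch = [[-1.3, 2.4], [0.7, -0.2]]

for name, f in [("project", project), ("shrink", shrink)]:
    print(name, reconstruction_loss(f, real_batch), idempotence_loss(f, noise_batch))
```

In this toy setting `project` scores zero on both terms, while `shrink` is penalized by both. In actual training, f would be a network whose parameters are updated by gradient descent to reduce such terms.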
Experimental results
How well does IGN perform once it is trained?
The authors acknowledge that, at this stage, IGN's generated results cannot compete with state-of-the-art models.
The experiments used smaller models and lower-resolution datasets, and the exploration focused mainly on a simplified approach.
Of course, foundational generative modeling techniques such as GANs and diffusion models also took a long time to reach mature, large-scale performance.
The researchers evaluated IGN on MNIST (a dataset of grayscale handwritten digits) and CelebA (a dataset of face images), using image resolutions of 28×28 and 64×64 respectively.
The authors use a simple autoencoder architecture, in which the encoder is the simple five-layer discriminator backbone from DCGAN and the decoder is the generator. The training and network hyperparameters are shown in Table 1.
[Table 1: training and network hyperparameters]
Figure 4 shows qualitative results on the two datasets after applying the model once and twice in a row.
As shown, applying IGN once, f(z), produces coherent results. However, artifacts can occur, such as holes in MNIST digits or distorted pixels around the top of the head and the hair in face images.
Applying f again, f(f(z)), corrects these problems, filling the holes or reducing the total variation around noisy patches in faces.
[Figure 4: results of applying the model once and twice on MNIST and CelebA]
Figure 7 shows additional results, including the result of applying f three times.
[Figure 7: additional results, including three applications of f]
Comparing f(f(z)) and f(f(f(z))) shows that when an image is already close to the learned manifold, applying f again results in minimal changes, because the image is considered to be in-distribution.
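One way to picture this behavior is to measure how much each additional application of f changes its input. The sketch below uses a deliberately imperfect toy function, `toy_f`, which moves a number halfway toward the nearest integer. Both `toy_f` and `step_changes` are hypothetical names, and the integers merely stand in for the learned manifold. This illustrates the idea only and is not a measurement on IGN.

```python
def toy_f(v: float) -> float:
    """Hypothetical imperfect model: move v halfway toward the nearest integer."""
    return v + 0.5 * (round(v) - v)


def step_changes(f, z: float, n: int) -> list:
    """Return |f^k(z) - f^(k-1)(z)| for k = 1..n, i.e. the size of each update."""
    changes = []
    current = z
    for _ in range(n):
        nxt = f(current)
        changes.append(abs(nxt - current))
        current = nxt
    return changes


changes = step_changes(toy_f, 2.3, 3)
print(changes)  # each entry is smaller than the one before it
```

The closer the current value is to the "manifold", the less a further application of `toy_f` changes it, which mirrors the observation that a third application of f changes images only slightly.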
By performing manipulations, the authors show that IGN has a consistent latent space, similar to what has been shown for GANs. Figure 6 shows latent space arithmetic.
[Figure 6: latent space arithmetic]
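The article does not spell out the arithmetic, so the sketch below assumes the form commonly used for GAN-style latent manipulation: add to an input z the difference between an input whose output has some attribute (`z_positive`) and one whose output lacks it (`z_negative`), then pass the result through the model. All names here are hypothetical, and `toy_model` is a simple clamp standing in for a trained f.

```python
def latent_arithmetic(z, z_positive, z_negative):
    """Return z + (z_positive - z_negative), element by element."""
    if not (len(z) == len(z_positive) == len(z_negative)):
        raise ValueError("all latent vectors must have the same length")
    return [a + (p - n) for a, p, n in zip(z, z_positive, z_negative)]


def toy_model(z):
    """Hypothetical stand-in for a trained f: clamp each coordinate to [0, 1]."""
    return [min(1.0, max(0.0, c)) for c in z]


z = [0.5, 0.25, 0.0]
z_positive = [1.0, 0.5, 0.25]  # assumed: its output shows the attribute
z_negative = [0.5, 0.5, 0.25]  # assumed: its output lacks the attribute

edited = latent_arithmetic(z, z_positive, z_negative)
print(toy_model(edited))  # [1.0, 0.25, 0.0]
```

With a real model, the same three-vector combination would be computed on input tensors before a single forward pass through f.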
The authors also tested IGN's potential as a "global projector" by feeding images from a variety of distributions into the model to produce their "natural image" equivalents.
The researchers demonstrate this in Figure 5 by denoising noisy images x + n, colorizing grayscale images, and translating sketches into realistic images.
Recovering the original image x in these inverse tasks is ill-posed, yet IGN is able to create natural mappings that preserve the structure of the original image.
As shown, applying f repeatedly can improve image quality (for example, it removes dark and smoky artifacts in the projected sketches).
[Figure 5: denoising, colorization, and sketch-to-image projection]
What's next for Google?
The results above show that IGN is more efficient at inference: once trained, it generates results in a single step.
It can also produce more consistent outputs, which may extend to more applications, such as medical image repair.
The authors of the paper state:
We view this work as a first step toward models that learn to map arbitrary inputs to a target distribution, a new paradigm for generative modeling.
Next, the research team plans to scale IGN up with more data, hoping to tap the full potential of this new kind of generative AI model.
The latest research code will be released on GitHub in the future.