Table of Contents
Method
Experiment

Learning a diffusion model from a single natural image is better than GAN, SinDiffusion achieves new SOTA

Apr 14, 2023, 06:10 PM

Generating images from a single natural image has a wide range of applications and has therefore attracted increasing attention. The goal is to learn an unconditional generative model from one natural image that captures the internal statistics of its patches and produces diverse samples with similar visual content. Once trained, such a model can not only generate high-quality images at arbitrary resolutions, but can also be easily adapted to a variety of applications, such as image editing, image harmonization, and image-to-image translation.

SinGAN meets these requirements. It builds a multi-scale pyramid of the natural image and trains a series of GANs to learn the internal patch statistics of that single image; the core idea is to train a sequence of models at progressively increasing scales. However, the images generated this way can be unsatisfactory: small-scale errors in detail accumulate across scales and lead to obvious artifacts in the output (see Figure 2).


In this article, researchers from the University of Science and Technology of China, Microsoft Research Asia, and other institutions propose a new framework, Single-image Diffusion (SinDiffusion), for learning from a single natural image. It is built on the Denoising Diffusion Probabilistic Model (DDPM). Although the diffusion model generates images in many steps, it does not suffer from error accumulation: the diffusion process rests on a principled mathematical formulation, so errors made in intermediate steps behave like additional noise and are corrected in the remaining denoising steps.
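To make this concrete, the following is a minimal sketch of a single DDPM reverse step (standard DDPM sampling, not the paper's released code); the noise-prediction network eps_model, the beta schedule, and the tensor shapes are placeholder assumptions.

    import torch

    def ddpm_reverse_step(eps_model, x_t, t, betas):
        # One DDPM denoising step: predict the noise in x_t and sample x_{t-1}.
        # Any error the network makes here behaves like extra noise and is
        # corrected by the remaining steps, unlike errors passed up a GAN pyramid.
        alphas = 1.0 - betas
        alphas_bar = torch.cumprod(alphas, dim=0)
        beta_t, alpha_t, alpha_bar_t = betas[t], alphas[t], alphas_bar[t]

        eps = eps_model(x_t, t)  # placeholder noise-prediction network
        # Posterior mean of x_{t-1} given x_t (standard DDPM formula).
        mean = (x_t - beta_t / torch.sqrt(1.0 - alpha_bar_t) * eps) / torch.sqrt(alpha_t)
        if t == 0:
            return mean
        return mean + torch.sqrt(beta_t) * torch.randn_like(x_t)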

The other core design of SinDiffusion is to limit the receptive field of the diffusion model. The study reviewed the network structure commonly used in previous diffusion models [7], which is deep and performs strongly, but whose receptive field is large enough to cover the entire image. This causes the model to memorize the training image and reproduce it exactly. To encourage the model to learn patch statistics instead of memorizing the whole image, the researchers carefully redesigned the network and introduced a patch-wise denoising network. Compared with previous diffusion architectures, SinDiffusion reduces the number of downsampling operations and the number of ResBlocks in the original denoising network. In this way, SinDiffusion can learn from a single natural image and generate high-quality, diverse images (see Figure 2).


  • Paper address: https://arxiv.org/pdf/2211.12445.pdf
  • Project address: https://github.com/WeilunWang/SinDiffusion

An advantage of SinDiffusion is that it can be flexibly applied in various scenarios (see Figure 1) without retraining the model. In SinGAN, downstream applications are implemented mainly by feeding conditions into the pre-trained GANs at different scales, so its applications are limited to those with spatially aligned conditions. In contrast, SinDiffusion supports a wider range of applications by designing the sampling procedure. Through unconditional training, SinDiffusion learns to predict the gradient of the data distribution. Given a scoring function that measures how well a generated image matches the condition (e.g., an L_p distance, or a pre-trained network such as CLIP), the gradient of this matching score is used to guide the sampling process. In this way, SinDiffusion generates images that fit both the data distribution and the given condition.
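As an illustration of such score-guided sampling, here is a hedged sketch in the style of classifier guidance; score_fn, guidance_scale, and the way the intermediate clean image is estimated are illustrative assumptions rather than the paper's exact procedure.

    import torch

    def guided_reverse_step(eps_model, x_t, t, betas, score_fn, guidance_scale=1.0):
        # One guided denoising step: nudge the unconditional mean along the
        # gradient of a matching score (e.g. a negative L_p distance to a
        # reference image, or a CLIP similarity).
        alphas = 1.0 - betas
        alphas_bar = torch.cumprod(alphas, dim=0)
        beta_t, alpha_t, alpha_bar_t = betas[t], alphas[t], alphas_bar[t]

        with torch.enable_grad():
            x_in = x_t.detach().requires_grad_(True)
            eps = eps_model(x_in, t)
            # Rough estimate of the clean image, used only for scoring.
            x0_hat = (x_in - torch.sqrt(1.0 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)
            grad = torch.autograd.grad(score_fn(x0_hat).sum(), x_in)[0]

        mean = (x_t - beta_t / torch.sqrt(1.0 - alpha_bar_t) * eps.detach()) / torch.sqrt(alpha_t)
        mean = mean + guidance_scale * beta_t * grad  # guidance term
        if t == 0:
            return mean
        return mean + torch.sqrt(beta_t) * torch.randn_like(x_t)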


The study conducted experiments on a variety of natural images, including landscapes and famous artworks, to demonstrate the advantages of the proposed framework. Both quantitative and qualitative results confirm that SinDiffusion produces high-fidelity and diverse results, and the downstream applications further demonstrate its utility and flexibility.

Method

Different from the progressive-growing design of previous studies, SinDiffusion trains a single denoising model at a single scale, which prevents the accumulation of errors. In addition, the study found that the patch-level receptive field of the diffusion network plays an important role in capturing the internal patch distribution, and accordingly designed a new denoising network structure. Based on these two core designs, SinDiffusion generates high-quality and diverse images from a single natural image.

The rest of this section is organized as follows: we first review SinGAN and motivate SinDiffusion, and then introduce the structural design of SinDiffusion.

First, let's briefly review SinGAN. Figure 3(a) shows the generation process of SinGAN. To generate different images from a single image, a key design of SinGAN is to build an image pyramid and progressively increase the resolution of the generated images.

Figure 3(b) shows the SinDiffusion framework. Unlike SinGAN, SinDiffusion performs its multi-step generation process with a single denoising network at a single scale. Although SinDiffusion also generates images over multiple steps, the results are of high quality, because the diffusion model rests on a systematic mathematical derivation: errors produced in intermediate steps are treated as noise and are repeatedly refined away during the diffusion process.


SinDiffusion

This article studies the relationship between generation diversity and the receptive field of the denoising network. Modifying the network structure changes the receptive field, so four network structures with different receptive fields but comparable capacity were designed and trained on a single natural image. Figure 4 shows the results generated by models with different receptive fields. The smaller the receptive field, the more diverse the results produced by SinDiffusion, and vice versa. However, the study found that a model with an extremely small receptive field cannot maintain a reasonable overall image structure. A suitable receptive field is therefore important and necessary for capturing reasonable patch statistics.
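The trade-off can be reasoned about directly from receptive-field arithmetic; the snippet below uses the standard recursion for stacked convolutions, and the two layer configurations are illustrative, not the four networks used in the paper.

    def receptive_field(layers):
        # layers: list of (kernel_size, stride) for a sequential conv stack.
        # Standard recursion: rf += (kernel - 1) * jump; jump *= stride.
        rf, jump = 1, 1
        for kernel, stride in layers:
            rf += (kernel - 1) * jump
            jump *= stride
        return rf

    # Illustrative configurations: fewer downsampling stages and fewer blocks
    # give a smaller receptive field, hence more diverse (but less globally
    # coherent) samples.
    shallow = [(3, 1)] * 6                          # no downsampling
    deep = [(3, 1), (3, 2)] * 4 + [(3, 1)] * 6      # several stride-2 stages
    print(receptive_field(shallow), receptive_field(deep))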


This research redesigns the commonly used diffusion architecture and introduces a patch-wise denoising network for single-image generation. Figure 5 gives an overview of the patch-wise denoising network in SinDiffusion and highlights the main differences from previous denoising networks. First, the depth of the denoising network is reduced by using fewer downsampling and upsampling operations, which greatly reduces the receptive field. At the same time, the attention layers used in the deeper layers of the original denoising network are naturally removed, making SinDiffusion a fully convolutional network that can generate images at any resolution. Second, the receptive field of SinDiffusion is further limited by reducing the number of time-embedded ResBlocks at each resolution. These changes yield a patch-wise denoising network with an appropriate receptive field, producing realistic and diverse results.
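The following is a minimal, hypothetical PyTorch sketch of a denoiser in this spirit: fully convolutional in its spatial path, a single downsampling stage, no attention, and only a couple of time-conditioned blocks. The layer widths and the simple time embedding are assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PatchDenoiser(nn.Module):
        # A shallow noise predictor with a deliberately small receptive field,
        # an illustrative stand-in for a patch-wise denoising network.
        def __init__(self, channels=64, time_dim=128):
            super().__init__()
            self.time_mlp = nn.Sequential(nn.Linear(1, time_dim), nn.SiLU(),
                                          nn.Linear(time_dim, channels))
            self.in_conv = nn.Conv2d(3, channels, 3, padding=1)
            self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)   # single downsample
            self.mid = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
                                     nn.Conv2d(channels, channels, 3, padding=1))
            self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
            self.out_conv = nn.Conv2d(channels, 3, 3, padding=1)

        def forward(self, x, t):
            # Accept an int timestep or a tensor of timesteps.
            t = torch.as_tensor(t, dtype=torch.float32).view(-1, 1)
            emb = self.time_mlp(t)[:, :, None, None]
            h = self.in_conv(x) + emb                # inject the time embedding
            h = self.down(F.silu(h))
            h = h + self.mid(h + emb)                # simplified time-conditioned ResBlock
            h = self.up(F.silu(h))
            return self.out_conv(h)                  # predicted noise, same shape as x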


Experiment

The qualitative results of SinDiffusion’s randomly generated images are shown in Figure 6.

At different resolutions, SinDiffusion generates realistic images whose patterns resemble those of the training images.

In addition, the article studies SinDiffusion for generating high-resolution images from a single image. Figure 13 shows the training image and the generated result. The training image is a 486 × 741 landscape containing rich components such as clouds, mountains, grass, flowers, and a lake. To accommodate high-resolution image generation, SinDiffusion is upgraded to an enhanced version with a larger receptive field and greater network capacity. The enhanced SinDiffusion generates a high-resolution long-scroll image at 486 × 2048 resolution, which preserves the internal layout of the training image while synthesizing new content, as shown in Figure 13.
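Because such a patch-wise denoiser is fully convolutional, sampling on a larger canvas only requires starting from a larger noise tensor. The loop below is a hypothetical usage sketch that reuses the placeholder ddpm_reverse_step and PatchDenoiser from the sketches above (with an untrained model, purely to show the shapes involved).

    import torch

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    model = PatchDenoiser()                      # would be the trained denoiser in practice

    x = torch.randn(1, 3, 486, 2048)             # wide canvas: larger than the training image
    for t in reversed(range(T)):
        with torch.no_grad():
            x = ddpm_reverse_step(model, x, t, betas)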


Comparison with previous methods

Table 1 compares the quantitative results of SinDiffusion with several challenging methods (i.e., SinGAN, ExSinGAN, ConSinGAN, and GPNN). Compared with these progressively improved GAN-based methods, SinDiffusion achieves new state-of-the-art performance. Notably, the proposed method greatly improves the diversity of the generated images: averaged over 50 models trained on the Places50 dataset, it surpasses the strongest competing method with an LPIPS score of 0.082.
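For reference, LPIPS-based diversity is typically computed as an average perceptual distance between generated samples; the sketch below uses the open-source lpips package and may differ in detail from the paper's exact evaluation protocol.

    import itertools
    import torch
    import lpips  # pip install lpips

    # Average pairwise LPIPS between generated samples as a diversity proxy.
    # samples: tensor of shape (N, 3, H, W), values scaled to [-1, 1].
    loss_fn = lpips.LPIPS(net='alex')

    def lpips_diversity(samples):
        pairs = itertools.combinations(range(samples.shape[0]), 2)
        dists = [loss_fn(samples[i:i + 1], samples[j:j + 1]).item() for i, j in pairs]
        return sum(dists) / len(dists)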



In addition to the quantitative results, Figure 8 also shows the qualitative results on the Places50 dataset.


Figure 15 shows the text-guided image generation results of SinDiffusion and previous methods.


Please see the original paper for more information.
