


ICLR 2024 Spotlight | NoiseDiffusion: Correct diffusion model noise and improve interpolation image quality
Author | Pengfei Zheng
Affiliation | USTC, HKBU TMLR Group
In recent years, the rapid development of generative AI has given strong momentum to high-profile fields such as text-to-image generation and video generation. At the core of these techniques is the diffusion model. A diffusion model defines a forward process that gradually adds noise until an image becomes Gaussian noise, and then uses a reverse process to gradually denoise Gaussian noise back into a clear image, yielding realistic samples. Interpolating between images with the diffusion model's ordinary differential equation has great potential in applications such as video generation and advertising creatives. However, we noticed that when this method is applied to natural images, the interpolated results are often unsatisfactory.
In general, a diffusion model samples Gaussian noise and then gradually denoises it to generate a high-quality image. A low-quality interpolated image indicates that its latent variable no longer follows the Gaussian distribution we expect. To improve the quality of the interpolated image, we need to bring the latent variables closer to samples from a Gaussian distribution. Directly scaling and shifting the latent variables severely damages the resulting image, and to preserve the information of the original image we cannot modify the latent variables too much. The difficulty is therefore to improve the quality of interpolated images while changing the latent variables as little as possible.
We first vary the noise level of the latent variables to analyze which latent variables the diffusion model can restore into high-quality images. Building on the SDEdit method, we then introduce Gaussian noise to improve the quality of the interpolated images, although the added noise also brings in additional information. We further analyze the near-orthogonality of latent variables in high-dimensional spaces, which provides the basis for our approach. Finally, we combine spherical linear interpolation with direct noise injection into a new interpolation method: it constrains the extreme values of the latent variables, adds a small amount of Gaussian noise to bring them closer to the expected distribution, and introduces the original images to alleviate information loss. With this method, we can significantly improve the interpolation results on natural images while retaining the information of the original images.
Next, I will briefly share our research results with you.
Paper title: NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation
Paper link: https://www.php.cn/link/68310dc294a1c38c7ba636380151daca
Code link: https://www.php.cn/link/fc9e5c39356354a60d33ca59499913ca
Introduction
Figure 1: Application of spherical linear interpolation method on face images
The most commonly used image interpolation method for diffusion models is spherical linear interpolation [1,2], applied to the latent variables of the two images.
We apply this method to natural images. As Figure 2 shows, the interpolation quality drops significantly when spherical linear interpolation is applied to natural images.
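As a minimal sketch, the standard spherical linear interpolation formula can be written as below. The function name `slerp`, the weight `lam` in [0, 1], and the use of flat Python lists in place of image-shaped latent tensors are assumptions made for illustration; this is not the authors' code.

```python
import math

def slerp(x0, x1, lam):
    """Spherical linear interpolation between two vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(x0, x1))
    norm0 = math.sqrt(sum(a * a for a in x0))
    norm1 = math.sqrt(sum(b * b for b in x1))
    # Clamp so rounding error cannot push the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    theta = math.acos(cos_theta)
    if math.sin(theta) < 1e-8:
        # Degenerate angle: fall back to ordinary linear interpolation.
        return [(1.0 - lam) * a + lam * b for a, b in zip(x0, x1)]
    w0 = math.sin((1.0 - lam) * theta) / math.sin(theta)
    w1 = math.sin(lam * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]

# Halfway between two orthogonal unit vectors.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
print(midpoint)
```

For two vectors of equal norm, the interpolated point keeps that norm, which is why this scheme is preferred over plain linear interpolation for Gaussian latent variables.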
Figure 2: Comparison of interpolation effects between natural pictures and generated pictures
Analysis
Figure 3: Effect of Gaussian noise denoising with different noise levels
We first study the impact of noise level on generated images. It is observed that only when the level of Gaussian noise matches the level of denoising (middle image), a higher quality image is obtained. If the noise level is lower than the denoising level (right image), or higher than the denoising level (left image), the quality of the generated image will be reduced. We use Theorem 1 to explain this phenomenon:
Theorem 1 describes how standard Gaussian noise is distributed in high-dimensional space: it is concentrated mainly on a hypersphere. Inside this hypersphere the probability density is relatively high, but the region occupies so little volume that its overall contribution is insignificant. Outside the hypersphere the volume is larger, but the probability density decays rapidly with distance, so the contribution of those points is also negligible. As a result, the latent variables observed when training a diffusion model are concentrated on the hypersphere, and the model has difficulty denoising latent variables that lie inside or outside it.
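The concentration effect can be checked numerically. The sketch below is an illustration only, not the statement of Theorem 1: it draws standard Gaussian vectors and compares their norms with the square root of the dimension. The dimension 4096 and the sample count 200 are arbitrary choices.

```python
import math
import random

def gaussian_norm(dim, rng):
    """Euclidean norm of one standard Gaussian sample of dimension `dim`."""
    return math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))

rng = random.Random(0)
dim = 4096  # arbitrary example dimension
norms = [gaussian_norm(dim, rng) for _ in range(200)]

mean_norm = sum(norms) / len(norms)
spread = max(norms) - min(norms)
print(f"sqrt(dim) = {math.sqrt(dim):.1f}")
print(f"mean norm = {mean_norm:.2f}, max - min = {spread:.2f}")
```

The norms cluster tightly around the square root of the dimension, so almost all samples lie in a thin shell rather than near the origin.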
Figure 4: Reasons why natural picture interpolation fails
Natural images often have complex features that the diffusion model has not seen during training, which makes it difficult for the model to convert them into standard Gaussian noise. Specifically, the latent variables of these images may contain Gaussian noise above or below the level the model is able to denoise. The model's ability is mainly limited to restoring Gaussian noise on the hypersphere described in Theorem 1, and it often cannot handle noise outside this range effectively. As a result, image interpolation often produces lower-quality images.
Introducing noise
Figure 5: Directly introducing noise interpolation
To improve image quality and bring the latent variables closer to the hypersphere, we adopt an approach based on SDEdit [3]. Specifically, we directly add standard Gaussian noise to the image, then perform interpolation, and finally perform denoising. Figure 5 shows that this method significantly improves the quality of the interpolated images. However, as the figure also shows, this approach introduces some additional information.
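The noising step can be sketched as follows, assuming the common variance-preserving forward formula, in which the image is scaled by sqrt(alpha_bar) and mixed with Gaussian noise scaled by sqrt(1 - alpha_bar). The name `add_noise`, the value of `alpha_bar`, and the random stand-in image are hypothetical. The interpolation and the model-based denoising that follow are omitted.

```python
import math
import random

def add_noise(x0, alpha_bar, rng):
    """Forward-diffuse x0 to the noise level given by alpha_bar in [0, 1]."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
# Stand-in for an image with pixel values scaled to [-1, 1].
image = [rng.uniform(-1.0, 1.0) for _ in range(4096)]
noised = add_noise(image, alpha_bar=0.1, rng=rng)

mean = sum(noised) / len(noised)
variance = sum((v - mean) ** 2 for v in noised) / len(noised)
print(f"per-component variance after noising: {variance:.3f}")
```

A smaller `alpha_bar` moves the latent closer to pure Gaussian noise but keeps less of the original image, which matches the trade-off described above.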
Method
Figure 6: Overall design of NoiseDiffusion
To improve image quality while losing as little information as possible, we combine spherical linear interpolation with the interpolation method that directly introduces noise, and propose a new method, NoiseDiffusion. As shown in Figure 6, the overall design of NoiseDiffusion both retains information during interpolation and improves image quality by introducing noise, striking an effective balance between the two. Next, we describe the design ideas behind NoiseDiffusion.
Design 1:
Figure 7: Constraining the extreme values of latent variables
Statistically, noise components beyond a certain range can be regarded as outliers. Referring to Figure 3, we found that Gaussian noise above the denoising level produces obvious noise points that closely resemble the abnormal color patches in the interpolation results of natural images. We therefore have reason to believe that the extreme values of the latent variables are responsible for these abnormal color patches. Based on this analysis, we constrain the extreme values of the latent variables to limit the impact of the abnormal noise. As Figure 7 shows, constraining the extreme values of the latent variables greatly improves image quality.
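One simple way to constrain extreme values is element-wise clipping, sketched below. The helper name `clip_latent` and the threshold `bound = 3.0` are hypothetical choices for this example. The source does not state the bound used by the authors.

```python
import random

def clip_latent(latent, bound):
    """Limit every component of the latent to the interval [-bound, bound]."""
    return [max(-bound, min(bound, v)) for v in latent]

rng = random.Random(0)
# Stand-in latent: Gaussian noise plus a few artificially large components.
latent = [rng.gauss(0.0, 1.0) for _ in range(1000)] + [6.0, -7.5, 9.0]

bound = 3.0  # hypothetical threshold, chosen only for this example
clipped = clip_latent(latent, bound)

print("largest magnitude before:", max(abs(v) for v in latent))
print("largest magnitude after: ", max(abs(v) for v in clipped))
```

Components already inside the interval are left untouched, so only the outliers are modified.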
Design 2:
Figure 8: Introducing original image information
When imposing constraints on the latent variables, we may inadvertently affect some normal components, resulting in a loss of information. To compensate for this potential loss, we introduce the original image information as a supplement. As shown in Figure 8, introducing the original image information significantly improves the quality of the interpolated image, which shows that it plays an important role in compensating for information loss. By combining the constraints on the latent variables with the original image information, we can reduce information loss while maintaining image quality and achieve a more accurate and natural interpolation.
Design 3:
Spherical linear interpolation is an interpolation method that relies on calculating the angle between two latent variables. However, in practical applications, we observe that these latent variables often exhibit a nearly orthogonal state. In order to explain this phenomenon, we introduce Theorem 2 as theoretical support.
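The near-orthogonality can be observed empirically. The sketch below is an illustration only, not the statement of Theorem 2: it measures the angle between pairs of independent standard Gaussian vectors at a few arbitrarily chosen dimensions.

```python
import math
import random

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def random_latent(dim, rng):
    """One standard Gaussian vector of dimension `dim`."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(0)
angles = {}
for dim in (2, 64, 4096):  # arbitrary example dimensions
    c = cosine(random_latent(dim, rng), random_latent(dim, rng))
    # Clamp before acos to guard against rounding error.
    angles[dim] = math.degrees(math.acos(max(-1.0, min(1.0, c))))
    print(f"dim={dim:5d}  angle={angles[dim]:.1f} degrees")
```

As the dimension grows, the angle settles close to 90 degrees, so the angle used by spherical linear interpolation is nearly the same for any pair of such latent variables.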
Figure 9: Introducing Gaussian noise of different sizes
Figure 10: Combined with Design 1 to reduce the amount of introduced Gaussian noise
As Figure 9 shows, gradually increasing the amount of Gaussian noise introduced significantly improves the quality of the interpolated images. This improvement comes at a cost, however: as the amount of noise increases, so does the amount of additional information introduced. In practice, to minimize the additional information while meeting quality requirements, we combine the strategies described above to reduce the amount of Gaussian noise that needs to be introduced (Figure 10), thereby better retaining the information of the original image.
Experiment
Figure 11: Comparison with spherical linear interpolation method
We compare the proposed method with spherical linear interpolation (Figure 11). The interpolation results show that our method significantly improves the quality of the interpolated images while losing almost no information, which demonstrates its advantage in both preserving information and improving image quality.
We also conducted experiments on Stable Diffusion [4]. Due to the highly unstructured latent space of Stable Diffusion, it is difficult to obtain smooth interpolation (Figure 12). Therefore, we consider interpolation () at a smaller time step, which can retain more features of the original image and make the interpolation result smoother, but it results in a reduction in image quality (Figure 13). To solve this problem, we applied our method NoiseDiffusion to correct the latent variables (Figure 14). It can be seen from the experimental results that our method significantly improves the quality of images while changing less information.
Figure 12: Using spherical linear interpolation when
Figure 13: Using spherical linear interpolation when
Figure 14: Using NoiseDiffusion interpolation when
References
[1] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.
[2] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.
[3] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022.
[4] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. High resolution image synthesis with latent diffusion models. In CVPR, 2022.
[5] Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, and Ming-Hsuan Yang. GAN inversion: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
Introduction to the research group
The Trustworthy Machine Learning and Reasoning Research Group (TMLR Group) of Hong Kong Baptist University consists of a number of young professors, postdoctoral researchers, doctoral students, visiting doctoral students, and research assistants, and is affiliated with the Department of Computer Science, School of Science. The group specializes in trustworthy representation learning, trustworthy learning based on causal reasoning, trustworthy foundation models, and related algorithms, theory, and system design, as well as applications in the natural sciences. Specific research directions and related results can be found on the group's GitHub (https://github.com/tmlr-group). The group is funded by government and industrial research funds, such as the Hong Kong Research Grants Council Outstanding Young Scholars Program, National Natural Science Foundation of China general projects and youth projects, and research funds from Microsoft, NVIDIA, Baidu, Alibaba, Tencent, and other companies. Young professors and senior researchers work side by side, and GPU computing resources are sufficient. The group recruits postdoctoral researchers, doctoral students, research assistants, and research interns on a long-term basis. It also welcomes applications from self-funded visiting postdoctoral fellows, doctoral students, and research assistants for visits of at least 3-6 months, and remote access is supported. Interested applicants may send a resume and a preliminary research plan to bhanml@comp.hkbu.edu.hk.