


Without any additional training, this new method enables free choice of generated image size and resolution.
Recently, diffusion models have surpassed GANs and autoregressive models to become the mainstream choice for generative modeling thanks to their excellent performance. Diffusion-based text-to-image models such as SD, SDXL, Midjourney, and Imagen have demonstrated an astonishing ability to generate high-quality images. Typically, these models are trained at a specific resolution to ensure efficient processing and stable training on existing hardware.
Figure 1: Comparison of different methods used to generate 2048×2048 images under SDXL 1.0. [1]
These diffusion models often exhibit pattern repetition and severe artifacts, as shown on the far left of Figure 1, and these problems become particularly acute beyond the training resolution.
In a recent paper, researchers from the Chinese University of Hong Kong–SenseTime Joint Laboratory and other institutions conducted an in-depth study of the convolutional layers of the U-Net structure commonly used in diffusion models, analyzed them from a frequency-domain perspective, and proposed FouriScale, as shown in Figure 2.
Figure 2: Schematic diagram of FouriScale's process (orange line) for ensuring consistency across resolutions.
By replacing the original convolutional layers of the pre-trained diffusion model with dilated convolution and low-pass filtering operations, structural and scale consistency across resolutions can be achieved. Combined with a "padding-then-cropping" strategy, the method can flexibly generate images of different sizes and aspect ratios. Furthermore, using FouriScale as guidance, the method guarantees complete image structure and excellent image quality when generating high-resolution images of arbitrary size. FouriScale requires no offline pre-computation and offers good compatibility and scalability.
Quantitative and qualitative experimental results demonstrate that FouriScale achieves significant improvements in generating high-resolution images using pre-trained diffusion models.
- Paper address: https://arxiv.org/abs/2403.12963
- Open source code: https://github.com/LeonHLJ/FouriScale
- Paper title: FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Method introduction
1. Dilated convolution ensures structural consistency across resolutions
The denoising network of a diffusion model is usually trained on images or latent representations at a specific resolution, and typically adopts a U-Net structure. The authors aim to reuse the denoising network's parameters at inference time to generate higher-resolution images without retraining. To avoid structural distortion at the inference resolution, they seek to establish structural consistency between the default resolution and higher resolutions. For a convolutional layer in the U-Net, this structural consistency can be expressed as:
where k is the original convolution kernel and k' is a new convolution kernel customized for the larger resolution. Using the frequency-domain representation of spatial downsampling, this becomes:
Formula (3) can be written as:
This formula shows that the Fourier spectrum of the ideal convolution kernel k' should be tiled from s×s copies of the Fourier spectrum of k. In other words, the Fourier spectrum of k' should repeat periodically, and the repeating pattern is the Fourier spectrum of k.
The widely used dilated convolution meets exactly this requirement. The frequency-domain periodicity of dilated convolution can be expressed by the following formula:
When using a pre-trained diffusion model (with training resolution (h, w)) to generate a high-resolution image of size (H, W), the dilated convolution uses the original convolution kernel with dilation factor (H/h, W/w); this yields the ideal convolution kernel k'.
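As a sanity check on this periodicity property, here is a minimal NumPy sketch (1-D for simplicity; the kernel values and dilation factor are illustrative, not taken from the paper) showing that dilating a kernel by factor s makes its Fourier spectrum equal to the original spectrum repeated s times:

```python
import numpy as np

def dilate_kernel(k, s):
    """Insert s-1 zeros between the taps of a 1-D kernel (dilated convolution)."""
    kd = np.zeros(s * (len(k) - 1) + 1)
    kd[::s] = k
    return kd

k = np.array([1.0, 2.0, 1.0])   # toy original kernel
s = 2                           # dilation factor
kd = dilate_kernel(k, s)        # [1, 0, 2, 0, 1]

n = 8                           # FFT length for the original kernel
K = np.fft.fft(k, n)            # spectrum of k
Kd = np.fft.fft(kd, s * n)      # spectrum of the dilated kernel at s-times the length

# The dilated kernel's spectrum is the original spectrum repeated s times:
# Kd[m] = K[m mod n]
assert np.allclose(Kd[:n], K)
assert np.allclose(Kd[n:2 * n], K)
```

Zero-insertion in the spatial domain compresses the spectrum and replicates it periodically, which is exactly the tiled-spectrum structure the ideal kernel k' requires.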
2. Low-pass filtering ensures scale consistency across resolutions
However, dilated convolution alone cannot solve the problem perfectly. As shown in the upper-left corner of Figure 3, using only dilated convolution still leaves pattern repetition in fine details. The authors attribute this to frequency aliasing introduced by spatial downsampling, which changes the frequency-domain components and causes the frequency distributions to differ across resolutions. To ensure scale consistency across resolutions, they introduce low-pass filtering to remove the high-frequency components that cause aliasing after spatial downsampling. The comparison curves on the right side of Figure 3 show that with low-pass filtering, the frequency distributions at high and low resolutions become much closer, ensuring scale consistency. As the lower-left corner of Figure 3 shows, low-pass filtering significantly alleviates the pattern repetition in details.
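The aliasing argument can be illustrated with a 1-D NumPy toy example (illustrative only; the signal, cutoff choice, and filter are not the paper's actual design): a frequency above the new Nyquist limit folds into a spurious low frequency under naive downsampling, but disappears once it is filtered out first.

```python
import numpy as np

def lowpass_then_downsample(x, s):
    """Zero out frequencies above the new Nyquist rate, then keep every s-th sample."""
    n = len(x)
    X = np.fft.fft(x)
    cutoff = n // (2 * s)                   # Nyquist bin after s-times downsampling
    X_filtered = X.copy()
    X_filtered[cutoff:n - cutoff + 1] = 0   # remove aliasing-prone high frequencies
    x_filtered = np.real(np.fft.ifft(X_filtered))
    return x_filtered[::s]

# A high-frequency sine that aliases badly under naive 2x downsampling
n, s = 64, 2
t = np.arange(n)
x = np.sin(2 * np.pi * 20 * t / n)   # frequency bin 20 > new Nyquist bin 16

naive = x[::s]                        # aliased: bin 20 folds down to bin 12
safe = lowpass_then_downsample(x, s)  # filtered: the offending frequency is gone

print(np.abs(np.fft.fft(naive)).max())  # large spurious peak at the folded frequency
print(np.abs(np.fft.fft(safe)).max())   # essentially zero
```

The same logic motivates FouriScale's low-pass step: removing components above the lower resolution's Nyquist limit keeps the frequency distribution consistent between the two scales.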
Figure 3: (a) Visual comparison with and without low-pass filtering. (b) Fourier relative log-amplitude curves without low-pass filtering. (c) Fourier relative log-amplitude curves with low-pass filtering.
3. Adapting to image generation at arbitrary sizes
The above method only applies when the aspect ratio of the generation resolution matches that of the default inference resolution. To adapt FouriScale to image generation of arbitrary size, the authors adopt a "padding-then-cropping" strategy. Algorithm 1 shows the pseudocode combining this strategy with FouriScale.
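The pseudocode itself is in the paper, but the padding-then-cropping idea can be sketched roughly as follows (a simplified NumPy illustration; the `denoise_fn` interface, latent shapes, and the choice of a single integer scale factor are assumptions for this sketch, while the real implementation operates inside the diffusion U-Net's sampling loop):

```python
import numpy as np

def pad_then_crop_denoise(z, denoise_fn, base_hw, target_hw):
    """Pad the latent up to an integer multiple of the training resolution,
    denoise at the padded size with the matching dilation factor, then crop."""
    h, w = base_hw                   # training (default) latent resolution
    H, W = target_hw                 # desired latent resolution
    # Smallest integer multiple of (h, w) that covers (H, W); -(-a // b) is ceil(a/b)
    r = max(-(-H // h), -(-W // w))
    Hp, Wp = r * h, r * w
    # Pad the latent with zeros on the bottom and right
    z_pad = np.zeros((z.shape[0], Hp, Wp), dtype=z.dtype)
    z_pad[:, :H, :W] = z
    # Denoise at the padded size, which has the default aspect ratio
    out = denoise_fn(z_pad, dilation=(r, r))
    # Crop back to the target resolution
    return out[:, :H, :W]

# Usage with a dummy denoiser that simply echoes its input
dummy = lambda z, dilation: z
z = np.random.randn(4, 96, 80)   # e.g. latent for a 768x640 image with an 8x VAE
out = pad_then_crop_denoise(z, dummy, base_hw=(64, 64), target_hw=(96, 80))
assert out.shape == (4, 96, 80)
```

Padding first restores the aspect ratio that the dilated-convolution construction expects, and cropping afterwards recovers the requested size.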
4. FouriScale guidance
The frequency-domain operations in FouriScale inevitably cause loss of detail and undesirable artifacts in the generated images. To address this, as shown in Figure 4, the authors propose using FouriScale as guidance. Specifically, on top of the original conditional and unconditional estimates, they introduce an additional conditional estimate. This additional estimate is also generated with dilated convolution, but with a gentler low-pass filter so that details are not lost. Meanwhile, they replace the attention scores in this additional conditional estimate with the attention scores from the FouriScale conditional estimate. Since the attention scores carry the structural information of the generated image, this operation injects the correct image structure while preserving image quality.
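The attention-score swap and the guidance combination can be sketched with a toy NumPy example (all tensor shapes, the `scores_override` interface, and the guidance scale are hypothetical illustrations; the actual method operates on the U-Net's attention layers during sampling):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, scores_override=None):
    """Scaled dot-product attention; optionally replace the attention scores
    with scores computed in another branch."""
    d = q.shape[-1]
    scores = softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d))
    if scores_override is not None:
        # Structure comes from the FouriScale branch; values (details) stay local
        scores = scores_override
    return scores @ v, scores

def guided_noise_estimate(eps_uncond, eps_guide_cond, scale=7.5):
    """Classifier-free-guidance-style combination using the additional
    conditional estimate whose attention scores were swapped in."""
    return eps_uncond + scale * (eps_guide_cond - eps_uncond)

# Toy shapes: 4 tokens, feature dim 8
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
_, fouriscale_scores = attention(q, k, v)  # scores from the FouriScale branch
out, _ = attention(q + 0.1, k, v, scores_override=fouriscale_scores)
assert out.shape == (4, 8)
```

The key design choice is the division of labor: the mildly filtered branch supplies the values (fine detail), while the FouriScale branch supplies the attention scores (global structure).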
Figure 4: (a) Diagram of FouriScale guidance. (b) Images generated without FouriScale guidance show obvious artifacts and detail errors. (c) Images generated with FouriScale guidance.
Experiment
1. Quantitative test results
Following the protocol of [1], the authors tested three text-to-image models (SD 1.5, SD 2.1, and SDXL 1.0) at four higher generation resolutions, with 4×, 6.25×, 8×, and 16× the number of pixels of their respective training resolutions. Results on 30,000/10,000 image-text pairs randomly sampled from LAION-5B are shown in Table 1:
Table 1: Quantitative comparison of different training-free methods
Their method achieved the best results for every pre-trained model and at every resolution.
2. Qualitative test results
As shown in Figure 5, their method maintains image generation quality and consistent structure across pre-trained models and resolutions.
Figure 5: Comparison of images generated by different training-free methods
Conclusion
This paper proposes FouriScale to enhance the ability of pre-trained diffusion models to generate high-resolution images. Analyzed from the frequency domain, FouriScale improves structural and scale consistency across resolutions through dilated convolution and low-pass filtering, solving key challenges such as pattern repetition and structural distortion. The "padding-then-cropping" strategy and FouriScale guidance enhance the flexibility and quality of text-to-image generation while adapting to different aspect ratios. Quantitative and qualitative comparisons show that FouriScale delivers higher image generation quality across different pre-trained models and resolutions.