Table of Contents
Method introduction
Experiment
Conclusion
Without training, this new method achieves freedom in generating image sizes and resolutions.

Apr 08, 2024 04:52 PM

Recently, diffusion models have surpassed GANs and autoregressive models to become the mainstream choice for generative modeling, owing to their excellent performance. Diffusion-based text-to-image models such as SD, SDXL, Midjourney, and Imagen have demonstrated a remarkable ability to generate high-quality images. These models are typically trained at a specific resolution to keep processing efficient and training accurate on existing hardware.


Figure 1: Comparison of different methods used to generate 2048×2048 images under SDXL 1.0. [1]

These diffusion models often suffer from pattern repetition and severe artifacts, as shown on the far left of Figure 1. The problems are particularly acute at resolutions beyond the training resolution.

In a paper, researchers from the Chinese University of Hong Kong-SenseTime Joint Laboratory and other institutions conducted an in-depth study of the convolutional layers in the U-Net architecture commonly used in diffusion models and, based on frequency-domain analysis, proposed FouriScale, as shown in Figure 2.


Figure 2: Schematic of the FouriScale process (orange line), which ensures consistency across resolutions.

FouriScale replaces the original convolutional layers of the pre-trained diffusion model with dilated convolutions combined with low-pass filtering, which achieves structural and scale consistency across resolutions. Combined with a "pad then crop" strategy, the method can flexibly generate images of different sizes and aspect ratios. Furthermore, with FouriScale guidance, it preserves complete image structure and high image quality when generating high-resolution images of arbitrary size. FouriScale requires no offline pre-computation and offers good compatibility and scalability.

Quantitative and qualitative experimental results demonstrate that FouriScale achieves significant improvements in generating high-resolution images using pre-trained diffusion models.



  • Paper address: https://arxiv.org/abs/2403.12963
  • Open source code: https://github.com/LeonHLJ/FouriScale
  • Paper title: FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Method introduction

1. Dilated (atrous) convolution ensures structural consistency across resolutions

The denoising network of a diffusion model is usually trained on images or latent representations at a specific resolution and typically adopts a U-Net architecture. The authors aim to reuse the parameters of this denoising network at inference time to generate higher-resolution images without retraining. To avoid structural distortion at the inference resolution, they establish structural consistency between the default and high resolutions. For a convolutional layer in the U-Net, structural consistency can be expressed as:

$$\mathrm{Down}_s(Z) \circledast k = \mathrm{Down}_s(Z \circledast k')$$

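where Z is a feature map at the higher resolution (H×W), Down_s denotes spatial downsampling by an integer factor s (so the lower resolution is h = H/s, w = W/s), ⊛ denotes convolution, k is the original convolution kernel, and k' is a new kernel customized for the larger resolution. In the frequency domain, spatial downsampling can be written as: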

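$$\mathcal{F}\big(\mathrm{Down}_s(Z)\big)(u, v) = \frac{1}{s^2} \sum_{i=0}^{s-1} \sum_{j=0}^{s-1} \mathcal{F}(Z)(u + ih,\ v + jw)$$

where F denotes the discrete Fourier transform and 0 ≤ u < h, 0 ≤ v < w.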
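The downsampling identity can be checked numerically. The following is a minimal NumPy sketch (the array sizes, the random input, and the helper names `downsample` and `aliased_spectrum` are illustrative choices, not taken from the paper): it compares the DFT of the strided, downsampled map with the average of the s×s aliased blocks of the high-resolution spectrum.

```python
import numpy as np

def downsample(z, s):
    """Spatial downsampling Down_s: keep every s-th row and column."""
    return z[::s, ::s]

def aliased_spectrum(z, s):
    """(1/s^2) * sum over i, j of F(Z)(u + i*h, v + j*w), for 0 <= u < h, 0 <= v < w."""
    H, W = z.shape
    h, w = H // s, W // s
    Z = np.fft.fft2(z)
    out = np.zeros((h, w), dtype=complex)
    for i in range(s):
        for j in range(s):
            # Block (i, j) holds the alias F(Z)(u + i*h, v + j*w).
            out += Z[i * h:(i + 1) * h, j * w:(j + 1) * w]
    return out / s**2

rng = np.random.default_rng(0)
z = rng.standard_normal((24, 36))  # H = 24, W = 36
s = 3                              # so h = 8, w = 12
lhs = np.fft.fft2(downsample(z, s))
rhs = aliased_spectrum(z, s)
print(np.allclose(lhs, rhs))  # True
```

Every low-resolution frequency bin receives contributions from s×s high-resolution bins; this folding is the frequency aliasing that the low-pass filtering step later targets.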

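Applying the convolution theorem to both sides of the structural-consistency condition, it can be written in the frequency domain as: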

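$$\mathcal{F}\big(\mathrm{Down}_s(Z)\big)(u, v)\,\mathcal{F}(k)(u, v) = \frac{1}{s^2} \sum_{i=0}^{s-1} \sum_{j=0}^{s-1} \mathcal{F}(Z)(u + ih,\ v + jw)\,\mathcal{F}(k')(u + ih,\ v + jw)$$

which holds for every Z if F(k')(u + ih, v + jw) = F(k)(u, v) for all i, j in {0, ..., s-1}.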

This shows that the Fourier spectrum of the ideal kernel k' should be tiled from s×s copies of the Fourier spectrum of k. In other words, the spectrum of k' should repeat periodically, with the spectrum of k as the repeating unit.

The widely used dilated (atrous) convolution meets exactly this requirement. Its frequency-domain periodicity can be expressed as:

$$\mathcal{F}(k_d)(u, v) = \mathcal{F}(k)(u \bmod h,\ v \bmod w)$$

where k_d is k dilated by a factor of s, with F(k_d) computed on the H×W grid and F(k) on the h×w grid.
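A quick numerical check of this periodicity, as a NumPy sketch (the random 3×3 kernel, the grid sizes, and the helper `dilate_kernel` are assumptions for illustration): the spectrum of the dilated kernel on the H×W grid should equal the s×s tiling of the original kernel's spectrum on the h×w grid.

```python
import numpy as np

def dilate_kernel(k, s):
    """Dilated (atrous) kernel: insert s - 1 zeros between neighbouring taps."""
    kh, kw = k.shape
    kd = np.zeros(((kh - 1) * s + 1, (kw - 1) * s + 1), dtype=k.dtype)
    kd[::s, ::s] = k
    return kd

rng = np.random.default_rng(0)
k = rng.standard_normal((3, 3))
h, w, s = 8, 8, 2
H, W = s * h, s * w

# Spectrum of the original kernel, zero-padded to the low-resolution h x w grid.
K = np.fft.fft2(k, s=(h, w))
# Spectrum of the dilated kernel, zero-padded to the high-resolution H x W grid.
K_dilated = np.fft.fft2(dilate_kernel(k, s), s=(H, W))

# F(k_d)(u, v) == F(k)(u mod h, v mod w), i.e. an s x s tiling of F(k).
print(np.allclose(K_dilated, np.tile(K, (s, s))))  # True
```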

Therefore, when a pre-trained diffusion model with training resolution (h, w) is used to generate a high-resolution image of size (H, W), a dilated convolution that reuses the original kernel parameters with dilation factor (H/h, W/w) is exactly the ideal kernel k'.
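The same conclusion can be checked directly in the spatial domain. The NumPy sketch below assumes circular (periodic) boundaries, which make the identity exact; real U-Net layers use zero padding, so there it holds only away from the borders. Sizes and helper names are hypothetical. It compares Down_s(Z) ⊛ k with Down_s(Z ⊛ k') for the dilated kernel and for the unmodified kernel.

```python
import numpy as np

def dilate_kernel(k, sh, sw):
    """Dilated kernel: reuse k's weights, spaced sh rows and sw columns apart."""
    kh, kw = k.shape
    kd = np.zeros(((kh - 1) * sh + 1, (kw - 1) * sw + 1), dtype=k.dtype)
    kd[::sh, ::sw] = k
    return kd

def circular_conv(z, k):
    """Circular 2D convolution; k is zero-padded to z's shape inside the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(z) * np.fft.fft2(k, s=z.shape)))

rng = np.random.default_rng(0)
h, w = 8, 12             # stand-in for the training resolution
H, W = 24, 36            # target resolution
sh, sw = H // h, W // w  # dilation factor (H/h, W/w) = (3, 3)

z = rng.standard_normal((H, W))  # high-resolution feature map Z
k = rng.standard_normal((3, 3))  # pre-trained kernel k

lhs = circular_conv(z[::sh, ::sw], k)                                 # Down_s(Z) * k
rhs_dilated = circular_conv(z, dilate_kernel(k, sh, sw))[::sh, ::sw]  # Down_s(Z * k'), k' dilated
rhs_plain = circular_conv(z, k)[::sh, ::sw]                           # Down_s(Z * k), k reused as-is

print(np.allclose(lhs, rhs_dilated))  # True
print(np.allclose(lhs, rhs_plain))    # False
```

Reusing the kernel unchanged at the higher resolution shrinks its receptive field relative to the image content, which is one way to see why structure breaks without dilation.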

2. Low-pass filtering ensures scale consistency across resolutions

However, dilated convolution alone does not solve the problem completely. As shown in the upper left of Figure 3, using only dilated convolution still leaves repeated patterns in fine details. The authors attribute this to frequency aliasing during spatial downsampling, which alters the frequency components and makes the frequency distributions differ across resolutions. To ensure scale consistency across resolutions, they introduce low-pass filtering to remove the high-frequency components that would otherwise alias after spatial downsampling. The comparison curves on the right of Figure 3 show that, with low-pass filtering, the frequency distributions at high and low resolutions are much closer, which keeps the scale consistent. The lower left of Figure 3 shows that low-pass filtering also significantly reduces the repeated patterns in details.


Figure 3: (a) Visual comparison with and without low-pass filtering. (b) Relative log-amplitude curves of the Fourier spectrum without low-pass filtering. (c) Relative log-amplitude curves of the Fourier spectrum with low-pass filtering.
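To illustrate why low-pass filtering removes the aliasing, here is a NumPy sketch that uses an ideal (brick-wall) low-pass filter with its cutoff at the Nyquist limit of the downsampled grid. This filter shape is a simplifying assumption; the filter FouriScale actually uses may differ. The hypothetical helper `aliasing_error` measures how far the spectrum of the downsampled map deviates from the (scaled) low-frequency part of the original spectrum.

```python
import numpy as np

def lowpass(z, s):
    """Ideal low-pass filter: zero every frequency at or above the Nyquist
    limit of a grid downsampled by s (assumed filter shape)."""
    H, W = z.shape
    cutoff = 0.5 / s  # cycles per sample
    fu = np.abs(np.fft.fftfreq(H))[:, None]
    fv = np.abs(np.fft.fftfreq(W))[None, :]
    mask = (fu < cutoff) & (fv < cutoff)
    return np.real(np.fft.ifft2(np.fft.fft2(z) * mask))

def in_band(Z, h, w):
    """For each bin of the h x w grid, pick the alias of the H x W spectrum Z
    with signed frequency index in [-h/2, h/2) x [-w/2, w/2)."""
    H, W = Z.shape
    rows = (np.fft.fftfreq(h) * h).round().astype(int) % H
    cols = (np.fft.fftfreq(w) * w).round().astype(int) % W
    return Z[np.ix_(rows, cols)]

def aliasing_error(z, s):
    """Max deviation of F(Down_s(z)) from the scaled in-band part of F(z)."""
    H, W = z.shape
    h, w = H // s, W // s
    down_spec = np.fft.fft2(z[::s, ::s])
    return np.max(np.abs(down_spec - in_band(np.fft.fft2(z), h, w) / s**2))

rng = np.random.default_rng(0)
z = rng.standard_normal((32, 32))
s = 2
print(aliasing_error(z, s) > 1e-3)              # True: high frequencies fold in
print(aliasing_error(lowpass(z, s), s) < 1e-9)  # True: nothing is left to alias
```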

3. Generating images of arbitrary size

The method above works only when the aspect ratio of the target resolution matches that of the default inference resolution. To adapt FouriScale to image generation of arbitrary size, the authors adopt a "pad then crop" strategy. Algorithm 1 gives the pseudocode of FouriScale combined with this strategy.
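Algorithm 1 is not reproduced here. As a rough illustration only, the NumPy sketch below shows one possible reading of "pad then crop" for a single convolutional layer: pad the feature map until it has the training aspect ratio, apply the kernel dilated by a single integer factor, then crop back to the requested size. The zero padding, its bottom/right placement, the `ceil` rounding of the factor, and the omission of the low-pass filter are all assumptions, and the helper names are hypothetical.

```python
import math
import numpy as np

def dilate_kernel(k, s):
    """Dilated (atrous) kernel: insert s - 1 zeros between neighbouring taps."""
    kh, kw = k.shape
    kd = np.zeros(((kh - 1) * s + 1, (kw - 1) * s + 1), dtype=k.dtype)
    kd[::s, ::s] = k
    return kd

def conv2d_same(x, k):
    """Zero-padded 'same' 2D cross-correlation (CNN convention), odd-sized k."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * xp[a:a + x.shape[0], b:b + x.shape[1]]
    return out

def pad_then_crop_conv(x, k, train_hw):
    """Hypothetical helper: dilated conv on an arbitrary-size map via pad then crop."""
    h, w = train_hw
    H, W = x.shape
    # One integer dilation factor large enough for both axes (assumption).
    s = math.ceil(max(H / h, W / w))
    # Pad bottom/right with zeros up to (s*h, s*w), i.e. the training aspect ratio.
    xp = np.pad(x, ((0, s * h - H), (0, s * w - W)))
    y = conv2d_same(xp, dilate_kernel(k, s))
    # Crop back to the requested size.
    return y[:H, :W], s

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))  # e.g. a wide latent
k = rng.standard_normal((3, 3))
y, s = pad_then_crop_conv(x, k, train_hw=(16, 16))
print(y.shape, s)  # (32, 64) 4
```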


4. FouriScale guidance

Because the frequency-domain operations in FouriScale inevitably cause loss of detail and undesirable artifacts in the generated images, the authors propose FouriScale guidance, as shown in Figure 4. Specifically, in addition to the original conditional and unconditional estimates, they introduce an additional conditional estimate. This additional estimate is also computed with dilated convolution, but with a gentler low-pass filter so that details are not lost. At the same time, they replace the attention scores in this additional conditional estimate with the attention scores from FouriScale's conditional estimate. Because attention scores carry the structural information of the generated image, this operation injects the correct image structure while preserving image quality.


Figure 4: (a) Schematic of FouriScale guidance. (b) An image generated without FouriScale guidance, showing obvious artifacts and detail errors. (c) An image generated with FouriScale guidance.
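The attention-score replacement can be sketched with plain scaled dot-product attention. In this NumPy sketch, the query/key/value arrays are random stand-ins for the two conditional branches at one attention layer, and the helper names are hypothetical. How the resulting estimates are combined with the unconditional estimate is not specified here and is left out.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(q, k):
    """Attention probabilities softmax(Q K^T / sqrt(d))."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def attend(scores, v):
    """Apply (possibly borrowed) attention probabilities to a branch's values."""
    return scores @ v

rng = np.random.default_rng(0)
n, d = 6, 4
# Hypothetical stand-ins for one attention layer in each conditional branch.
q_fs, k_fs = rng.standard_normal((n, d)), rng.standard_normal((n, d))  # FouriScale branch
q_g, k_g, v_g = (rng.standard_normal((n, d)) for _ in range(3))          # gently filtered branch

scores_fs = attention_scores(q_fs, k_fs)  # carries the image structure
out_guided = attend(scores_fs, v_g)       # structure from FouriScale, values from the guidance branch
out_plain = attend(attention_scores(q_g, k_g), v_g)  # the guidance branch on its own
```

Only the attention probabilities are borrowed; the values, and therefore the detail-preserving features of the gently filtered branch, stay in place.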

Experiment

1. Quantitative test results

Following the protocol of [1], the authors tested three text-to-image models (SD 1.5, SD 2.1, and SDXL 1.0) at four higher resolutions: 4×, 6.25×, 8×, and 16× the pixel count of each model's training resolution. Table 1 shows the results on 30000/10000 image-text pairs randomly sampled from LAION-5B:


Table 1: Quantitative comparison of different training-free methods.

FouriScale achieved the best results for every pre-trained model at every tested resolution.
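As a quick arithmetic check of the resolution ratios, assume a 512×512 training resolution (as for SD 1.5). The target sizes below are illustrative choices that produce the four pixel-count ratios, not necessarily the exact settings in the paper. The per-axis factors correspond to (H/h, W/w) from the method section. The non-square target has a different aspect ratio from the training resolution, which is the case that Section 3 of the method introduction addresses.

```python
# Assumed training resolution (SD 1.5 is trained at 512 x 512).
train_h, train_w = 512, 512
# Illustrative (H, W) targets chosen to give 4x, 6.25x, 8x and 16x the pixels.
targets = [(1024, 1024), (1280, 1280), (1024, 2048), (2048, 2048)]

ratios = []
for H, W in targets:
    pixel_ratio = (H * W) / (train_h * train_w)
    per_axis = (H / train_h, W / train_w)  # (H/h, W/w)
    same_aspect = per_axis[0] == per_axis[1]
    ratios.append(pixel_ratio)
    print(f"{H}x{W}: {pixel_ratio:g}x pixels, per-axis {per_axis}, "
          f"same aspect ratio: {same_aspect}")
```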

2. Qualitative test results

As shown in Figure 5, FouriScale maintains image quality and consistent structure across resolutions for each pre-trained model.


Figure 5: Comparison of images generated by different training-free methods.

Conclusion

This paper proposes FouriScale to enhance the ability of pre-trained diffusion models to generate high-resolution images. Taking a frequency-domain perspective, FouriScale improves structural and scale consistency across resolutions through dilated convolution and low-pass filtering, addressing key challenges such as repeated patterns and structural distortion. A "pad then crop" strategy lets it adapt to different aspect ratios, and FouriScale guidance improves the flexibility and quality of text-to-image generation. Quantitative and qualitative comparisons show that FouriScale delivers higher image quality across different pre-trained models and resolutions.
