Accelerating diffusion models: ByteDance open-sources Hyper-SD, generating SOTA-level images in as few as 1 step

Apr 25, 2024 pm 05:25 PM

Recently, diffusion models have made significant progress in image generation, bringing unprecedented opportunities to image and video generation tasks. Despite the impressive results, the multi-step iterative denoising inherent in diffusion model inference results in high computational cost. A series of diffusion model distillation algorithms have emerged to accelerate inference. These methods can be roughly divided into two categories: i) trajectory-preserving distillation; ii) trajectory reconstruction distillation. However, both types are limited, either by a restricted performance ceiling or by changes in the output domain.

To solve these problems, the ByteDance technical team proposed a trajectory-segmented consistency model called Hyper-SD. The open-source release of Hyper-SD has also been recognized by Hugging Face CEO Clem Delangue.

Hyper-SD is a novel diffusion model distillation framework that combines the advantages of trajectory-preserving distillation and trajectory reconstruction distillation, compressing the number of denoising steps while maintaining near-lossless performance. Compared with existing diffusion model acceleration algorithms, it achieves excellent acceleration results. Extensive experiments and user studies verify that Hyper-SD achieves SOTA-level image generation in 1 to 8 steps on both the SDXL and SD1.5 architectures.

  • Project homepage: https://hyper-sd.github.io/

  • Paper link: https://arxiv.org/abs/2404.13686

  • Hugging Face link: https://huggingface.co/ByteDance/Hyper-SD

  • Single-step generation demo link: https://huggingface.co/spaces/ByteDance/Hyper-SDXL-1Step-T2I

  • Real-time drawing board demo link: https://huggingface.co/spaces/ByteDance/Hyper-SD15-Scribble

Introduction

Existing distillation methods for diffusion model acceleration can be roughly divided into two categories: trajectory-preserving distillation and trajectory reconstruction distillation. Trajectory-preserving distillation aims to maintain the original trajectory of the ordinary differential equation (ODE) corresponding to the diffusion process. It reduces inference steps by forcing the distilled model and the original model to produce similar outputs. Although this achieves acceleration, such methods may degrade generation quality because of limited model capacity and inevitable errors in training and fitting. In contrast, trajectory reconstruction methods directly use the endpoints of the trajectory or real images as the main source of supervision and ignore the intermediate steps. By reconstructing a more effective trajectory, they reduce the number of inference steps and explore the model's potential within a limited number of steps, freeing it from the constraints of the original trajectory. However, this often makes the output domain of the accelerated model inconsistent with that of the original model, leading to suboptimal results.

This paper proposes a trajectory-segmented consistency model (Hyper-SD for short) that combines the advantages of the trajectory-preserving and trajectory reconstruction strategies. Specifically, the algorithm first introduces trajectory-segmented consistency distillation, which enforces consistency within each segment and gradually reduces the number of segments until consistency holds over the full time range. This strategy addresses the suboptimal performance of consistency models caused by insufficient model fitting capacity and accumulated inference errors. Next, the algorithm uses human feedback learning (RLHF) to improve generation quality, compensating for the quality lost during acceleration and adapting the model better to low-step inference. Finally, the algorithm uses score distillation to enhance one-step generation and obtains an idealized diffusion model that is consistent across all time steps through a unified LoRA, achieving excellent generation results.

Method

1. Trajectory segmentation consistency distillation

Consistency Distillation (CD) [24] and the Consistency Trajectory Model (CTM) [4] both aim to transform the diffusion model into a consistency model over the entire time step range [0, T] through a single stage of distillation. However, these distilled models often fall short of optimal because of limited model fitting capacity. Inspired by the soft consistency objective introduced in CTM, we refine the training process by dividing the entire time step range [0, T] into k segments and performing segment-wise consistency model distillation progressively.

In the first stage, we set $k=8$ and use the original diffusion model to initialize $f_\theta$ and $f_{\theta^-}$. The starting time step $t_n$ is sampled uniformly at random from $\{t_1, t_2, \ldots, t_{N-1}\}$. We then sample the end time step $t_{end} \in [t_b, t_{n-1}]$, where $t_b$ is calculated as follows:

$$t_b = \left\lfloor \frac{t_n}{\lfloor T/k \rfloor} \right\rfloor \times \left\lfloor \frac{T}{k} \right\rfloor$$

The training loss is calculated as follows:

$$L_{TSCD} = d\left(f_\theta(x_{t_n}, t_n, t_{end}),\; f_{\theta^-}(\hat{x}^{\phi}_{t_{n-1}}, t_{n-1}, t_{end})\right)$$

where $\hat{x}^{\phi}_{t_{n-1}}$ is calculated by formula 3 of the paper, and $f_{\theta^-}$ denotes the exponential moving average (EMA) of the student model.
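The segment-wise sampling of time steps can be illustrated with a short sketch. It assumes integer time steps and that the segment boundary is the starting time step rounded down to a multiple of the segment length floor(T/k), as in the Hyper-SD paper. The function names and the handling of a start step that lies exactly on a boundary are assumptions of this sketch, not part of the released code.

```python
import random


def segment_start(t_n: int, T: int, k: int) -> int:
    """Lower boundary t_b of the segment that contains t_n.

    Implements t_b = floor(t_n / floor(T / k)) * floor(T / k).
    """
    seg_len = T // k
    return (t_n // seg_len) * seg_len


def sample_end_timestep(t_n: int, t_prev: int, T: int, k: int,
                        rng: random.Random) -> int:
    """Sample t_end uniformly from the integers in [t_b, t_prev].

    t_prev plays the role of t_{n-1}. If t_n sits exactly on a segment
    boundary, t_b can exceed t_prev; this sketch then falls back to
    t_prev (an assumption, the article does not cover this case).
    """
    t_b = segment_start(t_n, T, k)
    low = min(t_b, t_prev)
    return rng.randint(low, t_prev)


if __name__ == "__main__":
    rng = random.Random(0)
    T = 1000
    for k in (8, 4, 2, 1):
        t_n, t_prev = 300, 280
        t_end = sample_end_timestep(t_n, t_prev, T, k, rng)
        print(f"k={k}: t_b={segment_start(t_n, T, k)}, t_end={t_end}")
```

With T = 1000 and k = 8 each segment spans 125 steps, so a start step of 300 falls in the segment beginning at 250. With k = 1 the boundary is 0 and the end step may lie anywhere below the start step, which matches the single-segment case.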

Subsequently, we restore the model weights from the previous stage and continue training $f_\theta$, gradually reducing k to [4, 2, 1]. It is worth noting that k=1 corresponds to the standard CTM training scheme. For the distance metric d, we employ a mixture of adversarial loss and mean squared error (MSE) loss. In experiments, we observed that the MSE loss is more effective when the predicted and target values are close (e.g., for k=8, 4), while the adversarial loss becomes more precise as the difference between the predicted and target values increases (e.g., for k=2, 1). Therefore, we dynamically increase the weight of the adversarial loss and decrease the weight of the MSE loss over the course of training. In addition, we integrate a noise perturbation mechanism to enhance training stability. Take the two-stage Trajectory Segmented Consistency Distillation (TSCD) process as an example. As shown in the figure below, we perform independent consistency distillation in the $[0, T/2]$ and $[T/2, T]$ time periods in the first stage, and then perform global consistency trajectory distillation based on the consistency distillation results of the previous two periods.

[Figure: the two-stage TSCD process]

The complete algorithm process is as follows:

[Figure: the complete TSCD algorithm]
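The progressive schedule and the shifting loss mixture described above can be sketched as follows. The article states only that the adversarial weight increases and the MSE weight decreases during training. The linear per-stage schedule, the function names, and the scalar `adv_loss` standing in for a discriminator output are all assumptions made for illustration.

```python
# Segment counts used in the successive training stages (from the article).
STAGES = [8, 4, 2, 1]


def loss_weights(stage_index, num_stages=len(STAGES)):
    """Hypothetical linear schedule: returns (w_mse, w_adv).

    The MSE weight falls from 1 to 0 and the adversarial weight rises
    from 0 to 1 as training moves from k=8 to k=1.
    """
    frac = stage_index / (num_stages - 1)
    return 1.0 - frac, frac


def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def mixed_distance(pred, target, adv_loss, stage_index):
    """Distance d as a weighted sum of the MSE and an adversarial loss value."""
    w_mse, w_adv = loss_weights(stage_index)
    return w_mse * mse(pred, target) + w_adv * adv_loss


if __name__ == "__main__":
    for i, k in enumerate(STAGES):
        w_mse, w_adv = loss_weights(i)
        print(f"stage {i}: k={k}, w_mse={w_mse:.2f}, w_adv={w_adv:.2f}")
```

In a real training loop each stage would resume from the weights of the previous one, and the adversarial term would come from a trained discriminator instead of a fixed number.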

2. Human feedback learning

In addition to distillation, we further incorporate feedback learning to improve the performance of the accelerated diffusion model. Specifically, we improve the generation quality of accelerated models by leveraging human aesthetic preferences and feedback from existing visual perception models. For aesthetic feedback, we use the LAION aesthetic predictor and the aesthetic preference reward model provided in ImageReward to guide the model toward more aesthetically pleasing images, as follows:

$$\mathcal{L}(\theta)_{aes} = \sum_{d} \mathbb{E}_{x_0 \sim \mathcal{D},\, x'_0 \sim G(x_{t_a})}\left[\mathrm{ReLU}\left(\alpha_d - r_d(x'_0, c)\right)\right]$$

where $r_d$ is the aesthetic reward model, including the aesthetic predictor of the LAION dataset and the ImageReward model, $c$ is the text prompt, and $\alpha_d$ together with the ReLU function acts as a hinge loss. In addition to feedback from aesthetic preferences, we note that existing visual perception models, which embed rich prior knowledge about images, can also serve as good feedback providers. Empirically, we find that instance segmentation models can guide the model to generate well-structured objects. Specifically, we first diffuse noise onto an image $x_0$ in the latent space to obtain $x_t$. Then, similar to ImageReward, we perform iterative denoising until a specific time step $t_a$ and predict $x'_0$ directly. We then use a perceptual instance segmentation model to evaluate structure generation by examining the difference between the instance segmentation annotations of the real images and the instance segmentation predictions for the denoised images, as follows:

$$\mathcal{L}(\theta)_{percep} = \mathbb{E}_{x_0 \sim \mathcal{D},\, x'_0 \sim G(x_{t_a})}\, \mathcal{L}_{instance}\left(m_I(x'_0),\, GT(x_0)\right)$$

where $m_I$ is the instance segmentation model (such as SOLO). Instance segmentation models can capture the structural defects of generated images more accurately and provide more targeted feedback signals. It is worth noting that other perceptual models are also applicable. These perceptual models complement subjective aesthetic feedback by focusing more on objective generation quality. Therefore, the feedback objective used to optimize the diffusion model can be defined as:

$$\mathcal{L}_{feedback} = \mathcal{L}(\theta)_{aes} + \mathcal{L}(\theta)_{percep}$$
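The feedback objective can be made concrete with a small numeric sketch. It assumes the aesthetic term is a hinge of the form ReLU(margin minus reward), averaged over a batch and summed over reward models, and that the perceptual term is already available as a scalar. The reward values, margins, and function names below are hypothetical placeholders, not outputs of the actual LAION predictor, ImageReward, or SOLO.

```python
def relu(x):
    """Rectified linear unit for a single float."""
    return max(0.0, x)


def aesthetic_hinge_loss(rewards_by_model, margins):
    """Sum over reward models of the batch mean of ReLU(alpha_d - r_d).

    rewards_by_model: dict mapping a reward model name to the list of
        rewards it assigned to the predicted images in a batch.
    margins: dict mapping the same names to the hinge margin alpha_d.
    """
    total = 0.0
    for name, rewards in rewards_by_model.items():
        alpha = margins[name]
        total += sum(relu(alpha - r) for r in rewards) / len(rewards)
    return total


def feedback_loss(rewards_by_model, margins, perceptual_loss):
    """Total feedback loss: aesthetic hinge term plus perceptual term."""
    return aesthetic_hinge_loss(rewards_by_model, margins) + perceptual_loss


if __name__ == "__main__":
    # Hypothetical rewards for a batch of two images.
    rewards = {"laion": [5.0, 7.0], "imagereward": [0.5, 2.0]}
    margins = {"laion": 6.0, "imagereward": 1.0}
    print(feedback_loss(rewards, margins, perceptual_loss=0.25))
```

Only samples whose reward falls below the margin contribute to the hinge term, so images that already score well are left alone while weaker ones are pushed upward.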

3. One-step generation enhancement

Due to the inherent limitations of the consistency loss, one-step generation within the consistency model framework is not ideal. As analyzed in CM, consistency-distilled models are highly accurate at guiding the trajectory endpoint $x_0$ from position $x_t$. Therefore, score distillation is a suitable and effective way to further improve the one-step generation of our TSCD model. Specifically, we improve generation further through an optimized distribution matching distillation (DMD) technique. DMD enhances the model's output by using two different score functions: the distribution $p_{real}(x)$ from the teacher model and $p_{fake}(x)$ from the fake model. We combine mean squared error (MSE) loss with score-based distillation to improve training stability. In this process, the human feedback learning techniques described above are also integrated to fine-tune our model so that it generates high-fidelity images.
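The idea of matching distributions through two score functions can be illustrated with a toy one-dimensional example. This is a sketch of the general distribution-matching gradient only, not the optimized DMD variant used in Hyper-SD: closed-form Gaussian scores stand in for the teacher and fake diffusion models, the generator is a single shift parameter, and the MSE and human feedback terms are left out. All names and constants are assumptions.

```python
import random


def gaussian_score(x, mean, var=1.0):
    """Score (gradient of the log-density) of N(mean, var) at x."""
    return -(x - mean) / var


def dmd_toy(real_mean=3.0, steps=200, batch=256, lr=0.1, seed=0):
    """Move a one-step toy generator x = theta + z toward N(real_mean, 1).

    Each step refits a unit-variance Gaussian "fake model" to the current
    generator samples, then follows the gradient estimate
    mean((s_fake(x) - s_real(x)) * dx/dtheta), where dx/dtheta = 1.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        xs = [theta + rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake_mean = sum(xs) / batch  # fake model tracks the generator
        grad = sum(
            gaussian_score(x, fake_mean) - gaussian_score(x, real_mean)
            for x in xs
        ) / batch
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    print(f"learned shift: {dmd_toy():.3f} (target 3.0)")
```

The update vanishes once the fake and real scores agree on the generated samples, which in this toy setting happens when the generator's mean reaches the real mean.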

By integrating these strategies, our method not only achieves excellent low-step inference results on both SD1.5 and SDXL (without classifier guidance), but also yields an ideal global consistency model: there is no need to train a separate UNet or LoRA for each specific number of steps, so a single unified model handles low-step inference.

Experiment

A quantitative comparison with existing acceleration algorithms on SD1.5 and SDXL shows that Hyper-SD is significantly better than the current state-of-the-art methods.

In addition, Hyper-SD can use a single model for inference at various low step counts. The quantitative metrics also show the effect of our method when using unified-model inference.

Visualizations of the acceleration results on SD1.5 and SDXL intuitively demonstrate the superiority of Hyper-SD in accelerating diffusion model inference.

Extensive user studies also show the superiority of Hyper-SD over various existing acceleration algorithms.

The acceleration LoRA trained by Hyper-SD is compatible with text-to-image base models of different styles.

At the same time, Hyper-SD's LoRA can also be used with existing ControlNets to achieve high-quality controllable image generation at a low number of steps.

Summary

The paper proposes Hyper-SD, a unified diffusion model acceleration framework that significantly improves the generation ability of diffusion models at low step counts and achieves new SOTA performance on SDXL and SD1.5. The method uses trajectory-segmented consistency distillation to strengthen trajectory preservation during distillation and to reach generation quality close to that of the original model. The model's potential at extremely low step counts is then improved further through human feedback learning and variational score distillation, resulting in better and more efficient generation. The paper also open-sources LoRA plug-ins for SDXL and SD1.5 covering 1 to 8 inference steps, as well as a dedicated one-step SDXL model, aiming to further promote the development of the generative AI community.
