
Google optimizes diffusion models: Samsung phones run Stable Diffusion and produce images in under 12 seconds

Apr 28, 2023 am 08:19 AM

Stable Diffusion is as well known in image generation as ChatGPT is among large conversational models. It can create realistic images from any input text in tens of seconds. Because Stable Diffusion has more than 1 billion parameters, and on-device compute and memory are limited, the model is primarily run in the cloud.

Without careful design and implementation, running these models on a device can lead to high latency, because of the iterative denoising process, and to excessive memory consumption.

How to run Stable Diffusion on devices has attracted wide research interest. Previously, some researchers developed an application that uses Stable Diffusion to generate images on the iPhone 14 Pro; generating an image takes about one minute and uses approximately 2 GiB of application memory.

Apple has also made optimizations in this direction: its implementation can generate a 512 × 512 image in about half a minute on iPhone, iPad, Mac and other devices. Qualcomm followed closely, running Stable Diffusion v1.5 on an Android phone and generating 512 × 512 images in under 15 seconds.

Recently, in the paper "Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations", Google researchers implemented a GPU-accelerated version of Stable Diffusion 1.4 that runs on the device and achieves state-of-the-art (SOTA) inference latency: on a Samsung S23 Ultra, it takes only 11.5 seconds to generate a 512 × 512 image with 20 iterations. Moreover, the work is not specific to one device; it is a general approach applicable to improving all latent diffusion models.

This research opens up many possibilities for running generative AI locally on a phone, without a data connection or a cloud server. Stable Diffusion was released only last fall, yet it can already run on devices today, which shows how fast the field is moving.


Paper address: https://arxiv.org/pdf/2304.11267.pdf

To achieve this generation speed, Google proposes a set of optimizations. Let's look at how they work.

Method overview

This research proposes optimizations that speed up text-to-image generation with large diffusion models. The optimizations are described for Stable Diffusion, but they also apply to other large diffusion models.

First, let's look at the main components of Stable Diffusion: the text embedder, noise generation, the denoising neural network, and the image decoder, as shown in Figure 1 below.
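A minimal sketch of how these four components fit together. Every function here is a hypothetical toy stand-in, not a real model, and the update rule is a simple relaxation rather than a real sampler such as DDIM. The point is structural: the text embedder, noise generation, and image decoder each run once, while the denoising network runs once per iteration, which is why it tends to dominate latency at 20 steps.

```python
import random

# Toy latent length; for a 512x512 image the real Stable Diffusion latent is 4x64x64.
LATENT_SIZE = 16


def text_embedder(prompt: str) -> list[float]:
    # Hypothetical "embedding": character codes folded into [0, 1).
    padded = prompt[:LATENT_SIZE].ljust(LATENT_SIZE)
    return [(ord(c) % 32) / 32.0 for c in padded]


def noise_generation(seed: int) -> list[float]:
    # Gaussian starting latent from a seeded generator.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_SIZE)]


def denoiser(latent: list[float], embedding: list[float], step: int) -> list[float]:
    # Toy "noise prediction": the gap between latent and conditioning (step is unused here).
    return [x - e for x, e in zip(latent, embedding)]


def image_decoder(latent: list[float]) -> list[float]:
    # Toy decoder: clamp latent values into [0, 1] "pixels".
    return [min(1.0, max(0.0, x)) for x in latent]


def generate(prompt: str, seed: int = 0, num_steps: int = 20, step_size: float = 0.2):
    embedding = text_embedder(prompt)   # runs once
    latent = noise_generation(seed)     # runs once
    denoiser_calls = 0
    for step in range(num_steps):       # the denoiser runs once per iteration
        predicted = denoiser(latent, embedding, step)
        latent = [x - step_size * p for x, p in zip(latent, predicted)]
        denoiser_calls += 1
    return image_decoder(latent), denoiser_calls  # decoder runs once
```

Calling `generate("a photo of a cat")` returns a toy 16-value "image" and the number of denoiser invocations, which equals `num_steps`.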


Next, let's take a closer look at the three optimization methods proposed in this study.

Specialized kernels: Group Norm and GELU

Group normalization (GN) works by dividing the channels of a feature map into smaller groups and normalizing each group independently, which makes GN less dependent on batch size and suitable for a wide range of batch sizes and network architectures. Instead of performing the reshape, mean, variance, and normalization operations in sequence, the study designs a dedicated kernel, in the form of a GPU shader, that performs all of these operations in a single GPU command without any intermediate tensors.
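Plain Python has no equivalent of "one GPU command", so the sketch below only contrasts the two data-flow shapes. The reference version materializes a regrouped copy and per-group statistic lists, as a sequence of separate operations would. The "fused" version keeps only scalar accumulators and writes each output element once. This is an illustration of the fusion idea, not the paper's shader; the layout (a list of channels, each a flat list of H × W values) and the function names are assumptions.

```python
import math


def group_norm_reference(x, num_groups, gamma, beta, eps=1e-5):
    """Step-by-step GroupNorm that materializes intermediate data."""
    channels = len(x)
    per_group = channels // num_groups
    # "Reshape": explicit concatenated copy of each group's values.
    groups = [sum(x[g * per_group:(g + 1) * per_group], []) for g in range(num_groups)]
    # Separate mean and variance steps, each producing a list.
    means = [sum(grp) / len(grp) for grp in groups]
    variances = [sum((v - m) ** 2 for v in grp) / len(grp) for grp, m in zip(groups, means)]
    # Normalize, then apply the per-channel affine parameters.
    out = []
    for c in range(channels):
        g = c // per_group
        inv_std = 1.0 / math.sqrt(variances[g] + eps)
        out.append([(v - means[g]) * inv_std * gamma[c] + beta[c] for v in x[c]])
    return out


def group_norm_fused(x, num_groups, gamma, beta, eps=1e-5):
    """Same result using scalar accumulators only; outputs are written directly."""
    channels = len(x)
    per_group = channels // num_groups
    out = [[0.0] * len(ch) for ch in x]
    for g in range(num_groups):
        lo, hi = g * per_group, (g + 1) * per_group
        # Pass 1: sum and sum of squares for this group (no copies made).
        total = total_sq = 0.0
        count = 0
        for c in range(lo, hi):
            for v in x[c]:
                total += v
                total_sq += v * v
                count += 1
        mean = total / count
        # E[x^2] - E[x]^2 is less robust than a two-pass variance; clamp tiny negatives.
        var = max(total_sq / count - mean * mean, 0.0)
        inv_std = 1.0 / math.sqrt(var + eps)
        # Pass 2: fold normalization and gamma/beta into one scale and shift per channel.
        for c in range(lo, hi):
            scale = gamma[c] * inv_std
            shift = beta[c] - mean * scale
            row = out[c]
            for i, v in enumerate(x[c]):
                row[i] = v * scale + shift
    return out
```

Both functions return the same values up to floating-point rounding; the difference is only in how much intermediate data is created along the way.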

The Gaussian error linear unit (GELU), a commonly used activation function, involves many numerical computations, such as multiplication, addition, and the Gaussian error function. The study uses a dedicated shader to consolidate these computations, together with the accompanying split and multiplication operations, so that they execute in a single draw call.
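The "accompanying split and multiplication" most likely refers to the gated GELU (GEGLU) feed-forward pattern used in Stable Diffusion's transformer blocks: a projection is split in half, GELU is applied to one half, and the result multiplies the other. Assuming that pattern, and assuming the value half comes first and the gate half second (one common convention), the sketch contrasts a step-by-step version with a single-loop version. Function names are hypothetical.

```python
import math


def gelu(v: float) -> float:
    # Exact GELU: v * Phi(v), where Phi is the standard normal CDF written via erf.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def geglu_unfused(h: list[float]) -> list[float]:
    """Split, GELU, and multiply as separate steps, each creating a new list."""
    half = len(h) // 2
    value, gate = h[:half], h[half:]                  # split: two intermediates
    activated = [gelu(g) for g in gate]               # GELU: another intermediate
    return [a * b for a, b in zip(value, activated)]  # multiply


def geglu_fused(h: list[float]) -> list[float]:
    """Same math in one loop: read both halves, write each output once."""
    half = len(h) // 2
    return [h[i] * gelu(h[half + i]) for i in range(half)]
```

On a GPU the benefit comes from avoiding the intermediate buffers and extra launches; in Python the two versions simply produce identical results.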

Improving the efficiency of the attention module

The text-to-image transformer in Stable Diffusion helps model the conditional distribution, which is crucial for text-to-image generation. However, self-attention and cross-attention struggle with long sequences because their time and memory costs grow quadratically with sequence length. To relieve this computational bottleneck, the study proposes two optimizations.

First, to avoid performing the entire softmax computation on a large matrix, the study uses a GPU shader for the reduction operations, which greatly reduces the memory footprint of intermediate tensors and the overall latency. The specific method is shown in Figure 2 below.
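A minimal sketch of this idea, assuming the attention scores (QKᵀ scaled) have already been computed. The reduction step produces only a per-row maximum and sum, and the exponentiation and normalization are folded into the multiplication by V, so the full softmax output matrix is never stored. This is a plain-Python illustration with hypothetical names, not the paper's shader code.

```python
import math


def attention_reference(scores, values):
    """softmax(scores) @ values, materializing the full probability matrix."""
    probs = []
    for row in scores:
        m = max(row)
        exps = [math.exp(a - m) for a in row]
        s = sum(exps)
        probs.append([e / s for e in exps])  # full N x N intermediate
    dim = len(values[0])
    return [[sum(p[j] * values[j][d] for j in range(len(values))) for d in range(dim)]
            for p in probs]


def attention_partially_fused(scores, values):
    """Reductions first; exp and normalization fused into the matmul with values."""
    # Reduction step: one max and one sum per row (two small vectors).
    row_max = [max(row) for row in scores]
    row_sum = [sum(math.exp(a - m) for a in row) for row, m in zip(scores, row_max)]
    dim = len(values[0])
    out = []
    for row, m, s in zip(scores, row_max, row_sum):
        acc = [0.0] * dim
        for j, a in enumerate(row):
            p = math.exp(a - m) / s  # probability computed on the fly, never stored
            for d in range(dim):
                acc[d] += p * values[j][d]
        out.append(acc)
    return out
```

Subtracting the row maximum before exponentiating keeps the computation numerically stable in both versions.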


Second, the study uses FlashAttention [7], an IO-aware exact attention algorithm that requires fewer accesses to high-bandwidth memory (HBM) than standard attention, improving overall efficiency.
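The key ingredient of FlashAttention is an "online" softmax: keys and values are processed block by block while only a running maximum, a running sum, and a partial output are kept, so the full attention matrix never has to be written to HBM. The sketch below shows that recurrence for a single query vector in plain Python; it omits GPU tiling, multiple queries, and the backward pass, and the function names and `block_size` parameter are hypothetical.

```python
import math


def attention_row_reference(q, keys, values):
    """Standard attention for one query: full score row, then softmax and weighting."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def attention_row_tiled(q, keys, values, block_size=2):
    """Online-softmax attention over key/value blocks; the score row is never stored whole."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (exp(-inf) == 0.0 on block 1).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max
    return [a / running_sum for a in acc]
```

The result is exact (not an approximation) for any block size; smaller blocks trade more rescaling work for less working memory.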

Winograd Convolution

Winograd convolution converts a convolution into a series of matrix multiplications over small transformed tiles. This eliminates many multiplications and improves computational efficiency, but it also increases memory consumption and numerical error, especially with larger tiles.
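To make the reduction in multiplications concrete, here is a sketch of the smallest common 2D variant, F(2×2, 3×3): a 4 × 4 input tile and a 3 × 3 kernel are transformed with small fixed matrices, multiplied element-wise (16 products), and transformed back into a 2 × 2 output, where a direct convolution needs 36 products. It is a plain-Python illustration with hypothetical helper names and is not necessarily the configuration the paper selected.

```python
# Standard transform matrices for Winograd F(2x2, 3x3).
B_T = [[1, 0, -1, 0],
       [0, 1, 1, 0],
       [0, -1, 1, 0],
       [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
A_T = [[1, 1, 1, 0],
       [0, 1, -1, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def winograd_f2x2_3x3(tile, kernel):
    """2x2 output from a 4x4 input tile and a 3x3 kernel via 16 element-wise products."""
    U = matmul(matmul(G, kernel), transpose(G))    # 4x4 transformed kernel (precomputable)
    V = matmul(matmul(B_T, tile), transpose(B_T))  # 4x4 transformed input (entries 0/+-1: adds only)
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]  # the 16 products
    return matmul(matmul(A_T, M), transpose(A_T))  # inverse transform to 2x2


def direct_conv_2x2(tile, kernel):
    """Direct 3x3 cross-correlation over the same tile: 36 products for 2x2 outputs."""
    return [[sum(tile[i + u][j + v] * kernel[u][v] for u in range(3) for v in range(3))
             for j in range(2)] for i in range(2)]
```

In a real kernel the transformed filter `U` is computed once per filter and reused across all tiles, which is where the savings accumulate.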

The backbone of Stable Diffusion relies heavily on 3 × 3 convolutional layers, especially in the image decoder, where they make up 90% of the layers. The study analyzes in depth the potential benefits of applying Winograd with different tile sizes to these 3 × 3 convolutions, and finds that a 4 × 4 tile size is optimal, offering the best balance between computational efficiency and memory usage.
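The trade-off can be quantified with the standard multiplication count for Winograd F(m×m, r×r), which produces an m × m output tile from an (m + r − 1) × (m + r − 1) input tile. Counting element-wise products only and ignoring the cost of the transforms:

```latex
\text{reduction}(m, r) = \frac{m^2 r^2}{(m + r - 1)^2},
\qquad
r = 3:\quad
\frac{36}{16} = 2.25\;(m = 2),\qquad
\frac{144}{36} = 4\;(m = 4),\qquad
\frac{324}{64} \approx 5.06\;(m = 6)
```

The savings grow with tile size, but so do the transform matrices and the magnitude of their coefficients, which is the source of the extra memory use and numerical error noted earlier. The article does not say whether its "4 × 4 tile" refers to the input tile or the output tile, so this count is a general illustration rather than a reconstruction of the paper's analysis.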


Experiments

The study was benchmarked on two devices: a Samsung S23 Ultra (Adreno 740 GPU) and an iPhone 14 Pro Max (A16). The benchmark results are shown in Table 1 below:


Latency decreases step by step as each optimization is enabled; in other words, images are generated faster. Compared with the baseline, latency drops by 52.2% on the Samsung S23 Ultra and by 32.9% on the iPhone 14 Pro Max. The study also measures end-to-end latency on the Samsung S23 Ultra: generating a 512 × 512 image with 20 denoising iterations takes less than 12 seconds, a state-of-the-art result.
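A latency reduction r corresponds to a speedup of 1/(1 − r). Applying that to the two reported reductions gives the following (derived arithmetic, not figures reported by the study):

```latex
\text{speedup} = \frac{1}{1 - r}:\qquad
\frac{1}{1 - 0.522} = \frac{1}{0.478} \approx 2.09\times\;(\text{S23 Ultra}),\qquad
\frac{1}{1 - 0.329} = \frac{1}{0.671} \approx 1.49\times\;(\text{iPhone 14 Pro Max})
```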

Small devices can now run generative AI models of their own. What does this mean for the future? There is plenty to look forward to.
