After the AI application Stable Diffusion appeared, it became popular very quickly and is now widely shared among players as a tool for generating portraits of beautiful women. However, at launch both the deployment of the web UI and the training and generation of models relied on NVIDIA CUDA acceleration, so AMD graphics cards (A-cards) were not officially supported at first. Fortunately, Stable Diffusion is open source, and with community support many branches have emerged that support GPU acceleration on AMD cards. The DirectML-based deployment we test today, for example, enables hardware-accelerated Stable Diffusion AI computing on AMD Radeon graphics cards.
Deploying DirectML-based Stable Diffusion locally is more complicated than the convenient CUDA-based web UI deployment. However, fairly mature integration packages are already available online. Players only need to download the corresponding package to install and deploy locally with one click, which saves a lot of time.
▲Once installed, the integration package we chose for testing automatically enters AMD GPU accelerated computing mode. Open http://127.0.0.1:7860 in a local browser to reach the Stable Diffusion AI drawing interface.
▲You can freely set the image generation parameters in the local web UI and click "Generate" to start drawing. Please refer to online tutorials for details; we will not go into them here.
So, can the DirectML-based Stable Diffusion branch deliver hardware acceleration on AMD graphics cards, and how efficient is it? In the past, AMD card users could only run Stable Diffusion under Linux, relying on ROCm (Radeon Open Compute) in place of CUDA acceleration. Can AI acceleration on AMD cards now be achieved directly under Windows, and does it meet expectations? To find out, we selected several AMD Radeon RX 5000, RX 6000 and RX 7000 series graphics cards and tested them in detail.
Test Platform
Graphics card: AMD Radeon RX 5500XT (8GB), RX 5700 (8GB), RX 6500XT (4GB), RX 6600 (8GB), RX 6700XT (12GB), RX 6750XT (12GB), RX 6800 (16GB), RX 6900XT (16GB), RX 7900 XT (20GB), RX 7900 XTX (24GB)
CPU: Intel Core i9-13900K
Motherboard: Intel Z790
Memory: DDR5 6000 16GB×2
SSD: AORUS NVMe PCIe SSD 2TB
Operating system: Windows 11 Pro 22H2
Driver: AMD Software Adrenalin Edition 23.4.3
Through testing, we want to find out:
How large is the difference in Stable Diffusion AI drawing performance between the AMD Radeon RX 5000, RX 6000 and RX 7000 series?
Compared with running the AI computation on the CPU, how much acceleration does an AMD GPU provide?
▲The open-source deployment solution shared online that we adopted correctly enables hardware-accelerated computing on AMD graphics cards. GPU utilization stays at 100% throughout image generation.
The model is NovelAI final-pruned (CKPT)
In the first part of the test, we used keywords to generate a fashionable, big-eyed young woman with a photographic look. The keyword settings are as follows (some are taken from openly shared keywords on the Internet):
<lora:koreanDollLikeness_v15:0.6>, best quality, ultra high res, (photorealistic:1.4), 1woman, sleeveless white button shirt, black skirt, black choker, cute, (Kpop idol), (aegyo sal:1), (platinum blonde hair:1), ((puffy eyes)), looking at viewer, full body, facing front,fashion,premium
Resolution setting: 512×512
Sampling steps: 20
Prompt guidance scale (CFG scale): 7
Batch count - images per batch: 1-1, 4-1
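For readers who prefer scripting to clicking, the settings listed above can be expressed as a request payload. This is only a minimal sketch: it assumes the integration package behaves like an AUTOMATIC1111-style web UI started with its API enabled, which the article does not confirm. The route `/sdapi/v1/txt2img` and the helper name `build_txt2img_payload` are assumptions for illustration, and the prompt is shortened.

```python
import json

# Address taken from the article; the route is an assumption (it is the
# txt2img endpoint of AUTOMATIC1111-style web UIs started with --api).
WEBUI_URL = "http://127.0.0.1:7860"
TXT2IMG_ROUTE = "/sdapi/v1/txt2img"


def build_txt2img_payload(prompt, steps=20, cfg_scale=7.0, size=512,
                          batch_count=1, batch_size=1):
    """Hypothetical helper: return a JSON-serializable dict of the test settings."""
    return {
        "prompt": prompt,
        "steps": steps,            # sampling steps
        "cfg_scale": cfg_scale,    # prompt guidance coefficient
        "width": size,             # square image, e.g. 512x512
        "height": size,
        "n_iter": batch_count,     # number of batches
        "batch_size": batch_size,  # images per batch
    }


# The "4-1" setting from the list above: four batches of one image each.
payload = build_txt2img_payload(
    "best quality, ultra high res, (photorealistic:1.4)", batch_count=4
)
print(json.dumps(payload, indent=2))
```

If the API is available, the payload could be sent as a JSON POST to `WEBUI_URL + TXT2IMG_ROUTE`; whether a given integration package exposes it has to be checked against that package's own documentation.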
Since most of the graphics cards have 8GB or more of video memory, we ran them in the default normal (high video memory) mode. Only the 4GB RX 6500XT was run with the --lowvram flag for low video memory mode (otherwise it would not work). Overall, AMD graphics cards from the Radeon RX 5000 series to the RX 7000 series almost all delivered accelerated AI computing in Stable Diffusion. The Radeon RX 7000 series in particular shows a huge improvement over the RX 6000 series. For example, with these settings and this model the RX 6900XT generates about 8.87 images/second, while the RX 7900 XT reaches 15.76 images/second, an improvement of roughly 78%.
Compared with the CPU, all AMD graphics cards have a very clear advantage. The RX 7900 XT is about 30 times as fast as the Core i9-13900K, and even the RX 5500XT, an entry-level card from two generations ago, is almost 5 times as fast.
The only exception is the RX 6500XT. By its core specifications it should be stronger than the RX 5500XT. However, because it has only 4GB of video memory, it had to run in low video memory mode, which greatly reduced its image generation speed. It falls far below normal GPU-accelerated performance and is only slightly faster than CPU computing.
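As a quick check of the arithmetic behind the RX 6900XT and RX 7900 XT comparison above, the relative improvement can be computed directly from the two quoted rates. The function name `relative_improvement` is just an illustrative helper.

```python
def relative_improvement(baseline, new):
    """Percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100.0


# Image generation rates quoted in the article for the first test.
rx_6900xt = 8.87
rx_7900xt = 15.76

speedup = rx_7900xt / rx_6900xt
gain = relative_improvement(rx_6900xt, rx_7900xt)
print(f"speedup: {speedup:.2f}x, improvement: {gain:.1f}%")
# prints "speedup: 1.78x, improvement: 77.7%"
```

On these two figures the newer card is a little under 1.8 times as fast, so a doubling would require a rate of about 17.7 on the same scale.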
In the next test, we used a series of relatively complex keywords to generate a villa located by the water, along with requirements for effects such as sunlight, ripples, and reflections. The keywords are as follows:
'Beautiful render of a Tudor style house near the water at sunset, fantasy forest. photorealistic, cinematic composition, cinematic high detail, ultra realistic, cinematic lighting, Depth of Field, hyper-detailed, beautifully color-coded, 8k, '
Resolution setting: 512×512
Sampling steps: 50
Prompt guidance scale (CFG scale): 7.5
Batch count - images per batch: 1-1, 2-1, 4-1
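The article does not describe how its generation rates were measured. One plausible way, sketched below under that caveat, is to time a run for each batch setting and divide the number of images by the elapsed time. The names `parse_batch_setting`, `images_per_second`, `time_run` and `fake_generate` are hypothetical, and `fake_generate` only sleeps as a stand-in for a real image generation call.

```python
import time


def parse_batch_setting(setting):
    """Split a 'batches-per_batch' string such as '4-1' into two ints."""
    batch_count, batch_size = (int(part) for part in setting.split("-"))
    return batch_count, batch_size


def images_per_second(setting, elapsed_seconds):
    """Throughput for one run of the given batch setting."""
    batch_count, batch_size = parse_batch_setting(setting)
    return batch_count * batch_size / elapsed_seconds


def time_run(generate, setting):
    """Time one call of generate(batch_count, batch_size); return images/s."""
    batch_count, batch_size = parse_batch_setting(setting)
    start = time.perf_counter()
    generate(batch_count, batch_size)
    elapsed = time.perf_counter() - start
    return images_per_second(setting, elapsed)


def fake_generate(batch_count, batch_size):
    """Placeholder workload: pretend each image takes about 10 ms."""
    time.sleep(0.01 * batch_count * batch_size)


for setting in ("1-1", "2-1", "4-1"):
    print(setting, f"{time_run(fake_generate, setting):.1f} images/s")
```

Replacing `fake_generate` with a call into the actual web UI would give comparable numbers across cards, provided the model, resolution and step count are held constant.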
The results of this part are basically consistent with the previous test. The RX 7000 series graphics cards still dominate, with a performance improvement of about 100% over the RX 6000 series. Compared with CPU computing, GPU acceleration remains very significant: the RX 7900 XT is far ahead of the Core i9-13900K, and even the entry-level RX 5500XT is almost 5 times as fast.
The 4GB RX 6500XT can only run in low video memory mode, so its image generation speed is again greatly affected. It is far below normal GPU-accelerated performance and roughly equivalent to the Core i9-13900K.
This is a simple but interesting test. From this experience, we can summarize a few points for players' reference:
1. AMD graphics cards can now accelerate Stable Diffusion AI computing under Windows through open-source deployment solutions, and many one-click integration packages are available online. Interested players can give it a try;
2. Judging from the test results, AMD graphics cards deliver far better performance than the CPU in Stable Diffusion AI image generation. GPU acceleration achieves much more in much less time;
3. Judging from the test, when the rendering resolution is set above 512×512 (for example 768×768), video memory runs out. This is partly related to the deployment solution and model, but it also shows that in normal mode 8GB of video memory is almost a hard entry requirement for Stable Diffusion. With less than 8GB, video memory is insufficient even at 512×512, and you have to run with --lowvram, which greatly slows computation, as the RX 6500XT 4GB showed in our test. So if you want to run Stable Diffusion smoothly, we recommend a graphics card with 8GB of video memory or more;
4. Judging from the overall results, we believe there is still huge room for algorithm optimization on AMD GPUs. Based on our rough, non-rigorous experience, the performance gap between the RX 7900 XTX and the Core i9-13900K is smaller than it should be. This is related to the deployment solution and models we used. We also hope that community developers will create more and better acceleration solutions for AMD graphics cards.
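The video memory guidance in point 3 can be written down as a small rule. This is a sketch of the article's recommendation only: the 8GB threshold and the --lowvram flag come from the text, while the helper name `launch_args` and the idea of choosing flags in a script are illustrative assumptions.

```python
# Threshold taken from the article's recommendation of 8GB or more.
MIN_VRAM_GB_FOR_DEFAULT_MODE = 8


def launch_args(vram_gb):
    """Hypothetical helper: extra launch flags for a card with vram_gb GB."""
    if vram_gb < MIN_VRAM_GB_FOR_DEFAULT_MODE:
        # Below 8GB the article had to fall back to low video memory mode.
        return ["--lowvram"]
    return []  # default (normal) mode


# Video memory sizes as listed in the test platform.
cards = {"RX 6500XT": 4, "RX 6600": 8, "RX 7900 XTX": 24}
for name, vram in cards.items():
    print(name, launch_args(vram) or "default mode")
```

The rule covers 512×512 generation as tested here; the article notes that higher resolutions such as 768×768 exceeded video memory in its setup.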
In any case, AMD graphics cards have demonstrated hardware-accelerated computing for Stable Diffusion, and the effect is clear. This is undoubtedly good news for AMD graphics card users. What remains is for players and AMD to keep working on optimization.