How to Use the Stable Diffusion 3 API

Stability AI announced an early preview of Stable Diffusion 3 in February 2024. The model is still in preview, but in April 2024 the team announced that Stable Diffusion 3 and Stable Diffusion 3 Turbo would be available on the Stability AI Developer Platform API, following a partnership with Fireworks AI, which Stability AI describes as the fastest and most reliable API platform in the market.

Note that Stable Diffusion 3 is simply a series of text-to-image generative AI models. According to the team at Stability AI, the model is “equal to or outperforms” other text-to-image generators, such as OpenAI’s DALL-E 3 and Midjourney v6, in “typography and prompt adherence.”

In this tutorial, you will learn practical steps to get started with the API so you can start generating your own images.

Why Stable Diffusion 3?

Stable Diffusion 3 introduces several advancements and features that set it apart from its predecessors and make it highly competitive in the text-to-image generation space – particularly in terms of improved text generation and prompt-following capabilities.

Let's explore these advancements:

Enhanced prompt following

Contextual understanding: Stable Diffusion 3 incorporates state-of-the-art natural language processing (NLP) techniques, allowing it to better understand and interpret user prompts. This yields images that match the prompt more accurately and in context.
Prompt continuity: Unlike previous versions, Stable Diffusion 3 follows prompts more consistently, so the generated image stays coherent and aligned with all the elements of the user's input.

Improved text generation

Fine-tuned language models: Stable Diffusion 3 relies on language models trained on large datasets to encode prompts, which improves how it renders text inside images. Words and short phrases in the output are spelled and laid out more coherently.
Reduced output variability: Through improved training methodologies and model architectures, Stable Diffusion 3 reduces variability, generating more consistent and high-quality images across different prompts and contexts.

Advanced prompt expansion

Multi-turn dialogue support: Stable Diffusion 3 can handle multi-turn dialogues more effectively, maintaining coherence and context across multiple exchanges between the user and the AI model.
Prompt expansion techniques: The model employs advanced prompt expansion techniques to generate more informative and contextually relevant responses, enriching the dialogue and providing users with comprehensive answers to their queries.

Fine-tuned control mechanisms

Parameter tuning: The API exposes parameters such as aspect ratio, seed, negative prompt, and output format, giving users more control over the generated image.
Bias mitigation: The model incorporates measures to mitigate biases in image generation, promoting fairness and inclusivity in its outputs.

Getting Started With Stable Diffusion 3 API

This section walks through the steps to get started with the Stability AI API.

Step 1: Create your account. You'll need to create an account before you can use Stability AI’s API. You can sign up using a username and password, but new users get 25 free credits for signing up using their Google account.

Step 2: Claim your API key. Once you’ve created your account, you’ll need an API key. This can be found on the API Keys page. In the documentation, Stability AI states that “All APIs documented on this site use the same authentication mechanism: passing the API key in via the Authorization header.”

Step 3: Top up your credits. You must have credits to make API requests. Credits are the unit of currency consumed when calling the API; the amount consumed varies across models and modalities. After using up all your credits, you can purchase more through your Billing dashboard at $1 USD per 100 credits.

In this tutorial, we will use Google Colab and ComfyUI to demonstrate how to generate images using the Stable Diffusion 3 API. In the next section, we will cover the steps to get started using each tool.

Using the Stable Diffusion 3 API with Google Colab

To get started with Google Colab, you need a Google account; create one if you don't already have it.

If you already have a Google account, open a new notebook and follow the steps below.

Note: The code used in this example is taken from the SD3_API tutorial by Stability AI.

Step 1: Import the required libraries.

from io import BytesIO
import IPython
import json
import os
from PIL import Image
import requests
import time
from google.colab import output


Step 2: Connect to the Stability API.

import getpass
# To get your API key, visit https://platform.stability.ai/account/keys
STABILITY_KEY = getpass.getpass('Enter your API Key')
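If you rerun the notebook often, typing the key each time gets tedious. Here is a minimal sketch that first looks for the key in an environment variable and only then falls back to the getpass prompt used above. The variable name STABILITY_API_KEY and the function name load_stability_key are assumptions made for this example; the Stability AI documentation does not prescribe them.

```python
import getpass
import os


def load_stability_key(env_var="STABILITY_API_KEY", prompt=getpass.getpass):
    """Return the API key from the environment, or prompt for it.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to the interactive prompt from the step above
        key = prompt("Enter your API Key").strip()
    if not key:
        raise ValueError("No API key provided")
    return key
```

You could then write STABILITY_KEY = load_stability_key() in place of the getpass line.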


Step 3: Define functions.

def send_generation_request(
    host,
    params,
):
    headers = {
        "Accept": "image/*",
        "Authorization": f"Bearer {STABILITY_KEY}"
    }

    # Encode parameters
    files = {}
    image = params.pop("image", None)
    mask = params.pop("mask", None)
    if image is not None and image != '':
        files["image"] = open(image, 'rb')
    if mask is not None and mask != '':
        files["mask"] = open(mask, 'rb')
    if len(files)==0:
        files["none"] = ''

    # Send request
    print(f"Sending REST request to {host}...")
    response = requests.post(
        host,
        headers=headers,
        files=files,
        data=params
    )
    if not response.ok:
        raise Exception(f"HTTP {response.status_code}: {response.text}")

    return response
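The function above opens the image and mask files but never closes them. One possible way to tidy this up, sketched here with the standard library's contextlib.ExitStack, is to open the files in a context that closes them once the request has been sent. The helper name open_attachments is hypothetical; the "none" placeholder is kept from the original code.

```python
from contextlib import ExitStack


def open_attachments(params, stack):
    """Pop "image"/"mask" paths from params and open them on `stack`.

    Hypothetical helper: the files are closed when the ExitStack exits.
    """
    files = {}
    for field in ("image", "mask"):
        path = params.pop(field, None)
        if path:  # skips both None and ''
            files[field] = stack.enter_context(open(path, "rb"))
    if not files:
        # Same placeholder entry as in the original function
        files["none"] = ""
    return files
```

Inside send_generation_request, this could be used by wrapping the call in "with ExitStack() as stack:", building files with open_attachments(params, stack), and sending requests.post within that block.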


Step 4: Generate images.

According to the documentation, the Stable Image service currently offers two SD3 models, each with its own credit cost per generated image:

SD3: uses 6.5 credits
SD3 Turbo: uses 4 credits
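To put these numbers in context, the short sketch below converts credit costs to dollars and estimates how many images a credit balance covers. It uses only the figures quoted in this tutorial (6.5 and 4 credits per image, $1 per 100 credits, 25 free credits); the function names are made up for the example, and current prices should be checked on the pricing page.

```python
from decimal import Decimal

# Figures quoted in this tutorial; check the pricing page for current values
USD_PER_CREDIT = Decimal("1") / Decimal("100")  # $1 per 100 credits
CREDITS_PER_IMAGE = {"sd3": Decimal("6.5"), "sd3-turbo": Decimal("4")}


def cost_in_usd(model, images=1):
    """Dollar cost of generating `images` images with `model`."""
    return CREDITS_PER_IMAGE[model] * images * USD_PER_CREDIT


def images_for_credits(model, credits):
    """Whole number of images that a credit balance covers."""
    return int(Decimal(str(credits)) // CREDITS_PER_IMAGE[model])


print(cost_in_usd("sd3"))                   # 0.065
print(images_for_credits("sd3", 25))        # 3
print(images_for_credits("sd3-turbo", 25))  # 6
```

In other words, the 25 free credits cover three SD3 images or six SD3 Turbo images.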

Let’s test them out.

In this example, we will create an image of a Toucan bird in a lowland tropic area.

# SD3

prompt = "This dreamlike digital art captures a vibrant, Toucan bird in a lowland tropic area" #@param {type:"string"}
negative_prompt = "" #@param {type:"string"}
aspect_ratio = "1:1" #@param ["21:9", "16:9", "3:2", "5:4", "1:1", "4:5", "2:3", "9:16", "9:21"]
seed = 0 #@param {type:"integer"}
output_format = "jpeg" #@param ["jpeg", "png"]

host = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

params = {
    "prompt" : prompt,
    "negative_prompt" : negative_prompt,
    "aspect_ratio" : aspect_ratio,
    "seed" : seed,
    "output_format" : output_format,
    "model" : "sd3",
    "mode" : "text-to-image"
}

response = send_generation_request(
    host,
    params
)

# Decode response
output_image = response.content
finish_reason = response.headers.get("finish-reason")
seed = response.headers.get("seed")

# Check for NSFW classification
if finish_reason == 'CONTENT_FILTERED':
    raise Warning("Generation failed NSFW classifier")

# Save and display result
generated = f"generated_{seed}.{output_format}"
with open(generated, "wb") as f:
    f.write(output_image)
print(f"Saved image {generated}")

output.no_vertical_scroll()
print("Result image:")
IPython.display.display(Image.open(generated))
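The decode-and-save steps are repeated for every request, so they could be pulled into one function. The following is a sketch only: save_generated_image is a hypothetical name, and it raises RuntimeError rather than Warning because Warning is normally reserved for Python's warnings machinery. It reads the same "finish-reason" and "seed" headers as the code above.

```python
from pathlib import Path


def save_generated_image(content, headers, output_format, out_dir="."):
    """Write image bytes to generated_<seed>.<format> and return the path.

    `headers` is any mapping exposing the "finish-reason" and "seed"
    keys read in the code above.
    """
    if headers.get("finish-reason") == "CONTENT_FILTERED":
        raise RuntimeError("Generation failed NSFW classifier")
    seed = headers.get("seed", "unknown")
    path = Path(out_dir) / f"generated_{seed}.{output_format}"
    path.write_bytes(content)
    return path
```

With the response from send_generation_request, a call might look like save_generated_image(response.content, response.headers, output_format).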


Here’s what it created:


Image created by author using Stable Diffusion 3

Now, let’s create an image of a car made out of fruits using SD3 Turbo:

#SD3 Turbo

prompt = "A car made out of fruits." #@param {type:"string"}
aspect_ratio = "1:1" #@param ["21:9", "16:9", "3:2", "5:4", "1:1", "4:5", "2:3", "9:16", "9:21"]
seed = 0 #@param {type:"integer"}
output_format = "jpeg" #@param ["jpeg", "png"]

host = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

params = {
    "prompt" : prompt,
    "aspect_ratio" : aspect_ratio,
    "seed" : seed,
    "output_format" : output_format,
    "model" : "sd3-turbo"
}

response = send_generation_request(
    host,
    params
)

# Decode response
output_image = response.content
finish_reason = response.headers.get("finish-reason")
seed = response.headers.get("seed")

# Check for NSFW classification
if finish_reason == 'CONTENT_FILTERED':
    raise Warning("Generation failed NSFW classifier")

# Save and display result
generated = f"generated_{seed}.{output_format}"
with open(generated, "wb") as f:
    f.write(output_image)
print(f"Saved image {generated}")

output.no_vertical_scroll()
print("Result image:")
IPython.display.display(Image.open(generated))
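The two requests differ only in the model name and in the fact that the SD3 Turbo example leaves out negative_prompt and mode. A small sketch that captures that difference is shown below. It mirrors the two examples rather than the full API reference: build_sd3_params is a hypothetical helper, and the allowed values are copied from the #@param lists above.

```python
ASPECT_RATIOS = ("21:9", "16:9", "3:2", "5:4", "1:1", "4:5", "2:3", "9:16", "9:21")
OUTPUT_FORMATS = ("jpeg", "png")


def build_sd3_params(prompt, model="sd3", negative_prompt="",
                     aspect_ratio="1:1", seed=0, output_format="jpeg"):
    """Assemble the request parameters used in the two examples above."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output format: {output_format}")
    params = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "seed": seed,
        "output_format": output_format,
        "model": model,
    }
    if model == "sd3":
        # Only the SD3 example sends these two fields
        params["negative_prompt"] = negative_prompt
        params["mode"] = "text-to-image"
    return params
```

The result can be passed straight to send_generation_request(host, params), and a mistyped aspect ratio fails locally before any credits are spent.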


Running this code produced the following image:

Image created by author using Stable Diffusion 3 Turbo

Using the API with ComfyUI

ComfyUI is a robust and flexible graphical user interface (GUI) for Stable Diffusion. It features a graph-based, flowchart-style interface that lets users create and run sophisticated Stable Diffusion workflows.

System requirements:

Graphics Processing Unit (GPU): An adequate NVIDIA GPU with a minimum of 8GB of VRAM, such as the RTX 3060 Ti or better.
Central Processing Unit (CPU): A contemporary processor, including Intel Xeon E5, i5, Ryzen 5, or higher.
Random Access Memory (RAM): 16GB or greater.
Operating System: Windows 10/11 or Linux.
Adequate storage space on your computer for models and generated images.

Step 1: Install ComfyUI

The simplest method for installing ComfyUI on Windows involves utilizing the standalone installer found on the releases page. This installer includes essential dependencies such as PyTorch and Hugging Face Transformers, eliminating the need for separate installations.

It provides a comprehensive package, enabling a swift setup of ComfyUI on Windows without requiring intricate configurations.

Simply download, extract, add models, and launch!

Step 1.1: Download the standalone version of ComfyUI from the releases page of its GitHub repository.

Step 1.2: Once you've downloaded the most recent comfyui-windows.zip file, extract it using a utility such as 7-Zip or WinRAR.

Step 1.3: A checkpoint model is required to start using ComfyUI. You can download a checkpoint model from Stable Diffusion or Hugging Face. Put the model in ComfyUI's checkpoints folder, which in the standalone Windows build is typically ComfyUI_windows_portable\ComfyUI\models\checkpoints.

Step 1.4: Now, run run_nvidia_gpu.bat (recommended) or run_cpu.bat. This should automatically start ComfyUI in your browser.

The command window will print a URL, http://127.0.0.1:8188/, which you can open in your browser if it does not open automatically.
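If the page does not load, it helps to confirm whether anything is listening at that address. The sketch below is a generic reachability check built on the standard library, not part of ComfyUI; the function name comfyui_is_up is an assumption, and the default URL is the one printed by the launcher above.

```python
import http.client
import urllib.error
import urllib.request

# Ignore any system proxy settings, since the server runs on this machine
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def comfyui_is_up(url="http://127.0.0.1:8188/", timeout=2.0):
    """Return True if an HTTP server answers at `url`."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or malformed response
        return False
```

A result of False usually means the launcher window is still starting up or has exited with an error.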

Step 2: Install ComfyUI Manager

Within the File Explorer application, locate the directory you just installed. Given you’re using Windows, it should be named “ComfyUI_windows_portable.” From here, navigate to ComfyUI, and then custom_nodes. From this location, type cmd in the address bar and press Enter.

This should open a command prompt in the custom_nodes folder. There, use git clone to download the ComfyUI Manager repository from GitHub into this folder.

Once it’s complete, restart ComfyUI. The new “Manager” button should appear on the floating panel.

Step 3: Install the Stability AI API node

Select the Manager button and navigate to “Install Custom Nodes.” From here, search for “stability API.”

Locate the "Stability API nodes for ComfyUI" node, then click the Install button situated on the right side to initiate the installation process. Following this, a “Restart” button will become visible. Click on “Restart” to reboot ComfyUI.

Step 4: Define the system-wide API key

This step is optional but recommended. You can set a single, system-wide Stability AI API key that every node in the Stability AI custom node pack will use. This removes the need to enter the API key in every workflow and reduces the risk of inadvertently sharing your key when you share a workflow JSON file.

To do so, navigate to the Stability API node's folder inside the custom_nodes directory.

Create a new file named sai_platform_key.txt. Paste your API Key into the file, save the document, and then restart ComfyUI.
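If you prefer to script this step, the sketch below writes the key file with pathlib. The file name sai_platform_key.txt comes from the step above; the function name write_platform_key and the node_dir argument are placeholders, and the sketch deliberately does not assume the exact name of the node's folder.

```python
from pathlib import Path


def write_platform_key(node_dir, api_key, filename="sai_platform_key.txt"):
    """Save the API key into the custom node's directory.

    `node_dir` is a placeholder for the Stability API node folder
    under custom_nodes; its exact name is not assumed here.
    """
    node_path = Path(node_dir)
    if not node_path.is_dir():
        raise FileNotFoundError(f"Custom node directory not found: {node_path}")
    key_file = node_path / filename
    # Strip stray whitespace so the file contains only the key
    key_file.write_text(api_key.strip(), encoding="utf-8")
    return key_file
```

Keep this file out of anything you share or commit, for the same reason you would not share a workflow with the key embedded.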

Step 5: Load and run the workflow

Download the Stable Diffusion 3 text-to-image workflow and drop it into ComfyUI.

You’re now good to go!

Troubleshooting and Tips

As with any tool, there’s always a chance you’ll encounter a few issues along the way. Here are the most common challenges and troubleshooting steps for users facing issues with the API or the setup process.

API Key and authentication issues

Challenge: Users may face authentication errors when accessing the API due to an incorrect API key or wrong authentication credentials.

Troubleshooting: Double-check the API key and ensure it is copied and pasted correctly. Verify that there are no extra spaces or characters in the key. Confirm that the key is being sent in the Authorization header, as the documentation requires.
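The copy-and-paste checks above are easy to automate. This is a rough sketch with a made-up function name, find_key_problems; it looks only for whitespace and unusual characters and makes no assumption about what a valid Stability AI key looks like.

```python
def find_key_problems(key):
    """Return a list of human-readable problems with a pasted API key."""
    if not key or not key.strip():
        return ["key is empty"]
    problems = []
    core = key.strip()
    if key != core:
        problems.append("leading or trailing whitespace")
    if any(ch.isspace() for ch in core):
        problems.append("whitespace inside the key")
    if not core.isascii():
        # For example, a curly quote picked up from a formatted document
        problems.append("non-ASCII characters")
    return problems
```

An empty list means the key passed these basic checks; it does not mean the key is valid on the server side.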

Credit management problems

Challenge: Users may encounter issues related to credit management, such as insufficient credits or billing errors.

Troubleshooting: Check your credit balance in your Stability AI account's Billing dashboard to ensure that you have sufficient credits. Verify your billing information and raise any billing errors or discrepancies with the support team.

Connectivity and network problems

Challenge: Users may experience connectivity issues or network interruptions that prevent them from accessing the API.

Troubleshooting: Ensure that you have a stable internet connection and that there are no network disruptions. To isolate the issue, try accessing the API from a different network or device. Contact your internet service provider if you continue to experience connectivity problems.
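Brief network interruptions can often be handled by retrying the request after a short wait. The sketch below shows a generic retry loop with exponential backoff; it is not something the Stability AI documentation prescribes, and call_with_retries is a hypothetical name. The default retry_on uses Python's built-in exception types.

```python
import time


def call_with_retries(call, attempts=3, base_delay=1.0,
                      retry_on=(ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Run `call()` and retry on transient network errors.

    Waits base_delay, then 2x, 4x, ... seconds between attempts.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the requests library, you would likely pass retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout), because requests raises its own exception classes, and wrap the request as call_with_retries(lambda: send_generation_request(host, dict(params))). The copy of params matters because send_generation_request pops keys from the dictionary it receives.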

Compatibility and dependency errors

Challenge: Users may encounter compatibility issues or dependency errors when installing or using the required tools and libraries.

Troubleshooting: Check the compatibility requirements of the Stable Diffusion 3 API and ensure that you are using compatible versions of tools and libraries. Update or reinstall any dependencies that are causing errors. Refer to the documentation and community forums for troubleshooting guidance.
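When asking for help with a dependency error, it is useful to report which versions you have installed. Here is a minimal sketch using the standard library's importlib.metadata; the default package list reflects the imports used in the Colab section (PIL is distributed as Pillow), and the function name installed_versions is an assumption.

```python
from importlib import metadata


def installed_versions(packages=("requests", "Pillow", "ipython")):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return versions
```

Calling print(installed_versions()) in the notebook gives you a short summary to paste into a forum post or support request.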

Performance and response time

Challenge: Users may experience slow response times or performance issues when interacting with the API, particularly during peak usage times.

Troubleshooting: Monitor the API's performance and track response times to identify patterns or trends. Consider upgrading to a higher-tier subscription plan for better performance and priority access. Contact the support team if you consistently experience slow response times.
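One simple way to track response times is to time each call yourself and keep the measurements. The sketch below is a generic illustration with hypothetical names (timed_call, summarize); it records how long any function takes and reports the median and slowest durations.

```python
import statistics
import time


def timed_call(call, durations, clock=time.perf_counter):
    """Run `call()` and append its duration in seconds to `durations`."""
    start = clock()
    try:
        return call()
    finally:
        # Record the time even if the call raised an exception
        durations.append(clock() - start)


def summarize(durations):
    """Return (median, slowest) of the recorded durations."""
    return statistics.median(durations), max(durations)
```

For example, you could wrap each request as timed_call(lambda: send_generation_request(host, dict(params)), durations) and compare summarize(durations) across different times of day.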

Documentation and support

Challenge: Users may encounter difficulties understanding the API documentation or require assistance troubleshooting specific issues.

Troubleshooting: For guidance on API usage, troubleshooting, and best practices, refer to the Stability AI API documentation. If you have any unresolved issues or questions, contact the support team or ask in the community forums.

Conclusion

Stable Diffusion 3 is a series of text-to-image generative AI models. This article covered practical steps to start using the API with Google Colab and ComfyUI. Now, you have the skills to create your own images; be sure to apply what you learned as soon as possible so you do not forget.

Thanks for reading!

Further learning

Stable Diffusion Web UI: A Comprehensive User Guide for Beginners
Fine-tuning Stable Diffusion XL with DreamBooth and LoRA
How to Run Stable Diffusion
Generating Photorealistic Images using AI with Diffusers in Python

FAQs

What are some best practices for using Stable Diffusion 3 API effectively?

Best practices for using the Stable Diffusion 3 API include providing clear and specific prompts, experimenting with different parameters to achieve desired results, monitoring credit usage to avoid depletion, and staying updated with the latest documentation and features.

What is Stable Diffusion 3?

Stable Diffusion 3 comprises a collection of AI models focused on generating images from textual prompts. Users provide descriptions of desired images, and the model generates corresponding visual representations based on these prompts.

How does Stable Diffusion 3 work?

Stable Diffusion 3 employs a diffusion transformer architecture akin to Sora, diverging from prior versions that utilized a diffusion model akin to most existing image generation AIs. This innovation merges the transformer architecture commonly used in large language models such as GPT with diffusion models, offering the potential to leverage the strengths of both architectures.
