Creating an AI-powered Image Generation API Service with FLUX, Python, and Diffusers

Barbara Streisand

Nov 29, 2024 pm 05:36 PM

FLUX (by Black Forest Labs) has taken the world of AI image generation by storm in the last few months. Not only has it beaten Stable Diffusion (the prior open-source king) on many benchmarks, it has also surpassed proprietary models like DALL-E and Midjourney on some metrics.

But how would you go about using FLUX in one of your apps? One option is a serverless host such as Replicate, but these can get very expensive very quickly and may not provide the flexibility you need. That's where creating your own custom FLUX server comes in handy.

In this article, we'll walk you through creating your own FLUX server using Python. This server will allow you to generate images based on text prompts via a simple API. Whether you're running this server for personal use or deploying it as part of a production application, this guide will help you get started.

Prerequisites

Before diving into the code, let's ensure you have the necessary tools and libraries set up:

Python: You'll need Python 3 installed on your machine, preferably version 3.10.
torch: The deep learning framework we'll use to run FLUX.
diffusers: Provides access to the FLUX model.
transformers: Required dependency of diffusers.
sentencepiece: Required to run the FLUX tokenizer.
protobuf: Required to run FLUX.
accelerate: Helps load the FLUX model more efficiently in some cases.
fastapi: Framework to create a web server that can accept image generation requests.
uvicorn: Required to run the FastAPI server.
psutil: Allows us to check how much RAM there is on our machine.

You can install all the libraries by running the following command:

pip install torch diffusers transformers sentencepiece protobuf accelerate fastapi uvicorn psutil

If you're using a Mac with an M1 or M2 chip, you should set up PyTorch with Metal for optimal performance. Follow the official PyTorch with Metal guide before proceeding.

If you're planning to run FLUX on a GPU, make sure you have at least 12 GB of VRAM. For running on CPU or MPS (which will be slower), you'll need at least 12 GB of RAM.

Step 1: Setting Up the Environment

Let's start the script by picking the right device to run inference based on the hardware we're using.

device = 'cuda'  # can also be 'cpu' or 'mps'

import os

# MPS support in PyTorch is not yet fully implemented, so allow
# unsupported operations to fall back to the CPU. This must be set
# before torch is imported.
if device == 'mps':
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

if device == 'mps' and not torch.backends.mps.is_available():
    raise Exception("Device set to MPS, but MPS is not available")
elif device == 'cuda' and not torch.cuda.is_available():
    raise Exception("Device set to CUDA, but CUDA is not available")

You can specify cpu, cuda (for NVIDIA GPUs), or mps (for Apple's Metal Performance Shaders). The script then checks if the selected device is available and raises an exception if it's not.

Step 2: Loading the FLUX Model

Next, we load the FLUX model. We'll load the model in fp16 precision, which saves memory without much loss in quality.

At this point, you may be asked to authenticate with Hugging Face, as the FLUX model is gated. To authenticate successfully, you'll need to create a Hugging Face account, go to the model page, accept the terms, create an access token in your account settings, and add it to your machine as the HF_TOKEN environment variable.
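Because a missing token only surfaces as a download error once the model starts loading, it can help to fail early. The following is a minimal sketch, not part of the original script: `require_hf_token` is a hypothetical helper name, and it assumes the token is supplied through the HF_TOKEN environment variable as described above. If you authenticate another way (for example with a cached login), this check would be too strict.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token from HF_TOKEN, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set: accept the model terms on the model page, "
            "create an access token, and export it before starting the server"
        )
    return token
```

Calling `require_hf_token()` near the top of the script, before the model is loaded, would turn a confusing download failure into an immediate, readable error.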

from diffusers import FlowMatchEulerDiscreteScheduler, FluxPipeline
import psutil

model_name = "black-forest-labs/FLUX.1-dev"

print(f"Loading {model_name} on {device}")

pipeline = FluxPipeline.from_pretrained(
    model_name,

    # Diffusion models are generally trained on fp32, but fp16
    # gets us 99% there in terms of quality, with just half the (V)RAM
    torch_dtype=torch.float16,

    # Ensure we don't load any dangerous binary code
    use_safetensors=True,

    # We are using Euler here, but you can also use other samplers
    scheduler=FlowMatchEulerDiscreteScheduler(),
).to(device)

Here, we're loading the FLUX model using the diffusers library. The model we're using is black-forest-labs/FLUX.1-dev, loaded in fp16 precision.

There is also a timestep-distilled model named FLUX Schnell, which has faster inference but outputs less detailed images, as well as a FLUX Pro model, which is closed-source.
We'll use the Euler scheduler here, but you may experiment with this. You can read more about schedulers in the diffusers documentation.
Since image generation can be resource-intensive, it's crucial to optimize memory usage, especially when running on a CPU or a device with limited memory.

# Recommended if running on MPS or CPU with < 64 GB of RAM
total_memory = psutil.virtual_memory().total
total_memory_gb = total_memory / (1024 ** 3)
if (device == 'cpu' or device == 'mps') and total_memory_gb < 64:
    print("Enabling attention slicing")
    pipeline.enable_attention_slicing()

This code checks the total system memory and, when running on CPU or MPS, enables attention slicing if the machine has less than 64 GB of RAM. Attention slicing reduces memory usage during image generation, which is essential for devices with limited resources.

Step 3: Creating the API with FastAPI

Next, we'll set up the FastAPI server, which will provide an API to generate images.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field, conint, confloat
from fastapi.middleware.gzip import GZipMiddleware
from io import BytesIO
import base64

app = FastAPI()

# We will be returning the image as a base64 encoded string
# which we will want compressed
app.add_middleware(GZipMiddleware, minimum_size=1000, compresslevel=7)

FastAPI is a popular framework for building web APIs with Python. In this case, we're using it to create a server that can accept requests for image generation. We're also using GZip middleware to compress the response, which is particularly useful when sending images back in base64 format.

In a production environment, you might want to store the generated images in an S3 bucket or other cloud storage and return the URLs instead of the base64-encoded strings, to take advantage of a CDN and other optimizations.

Step 4: Defining the Request Model

We now need to define a model for the requests that our API will accept.

This GenerateRequest model defines the parameters required to generate an image. The prompt field is the text description of the image you want to create. Other fields include the image dimensions, the number of inference steps, and the batch size.
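A request model along these lines could be sketched as follows. The prompt, dimension, step-count, and batch-size fields come from the description above; the exact field names, defaults, and bounds are assumptions, and `guidance_scale` is an assumed extra field. The sketch uses `Field` constraints rather than `conint`/`confloat`, which express the same kind of validation.

```python
from pydantic import BaseModel, Field, ValidationError


class GenerateRequest(BaseModel):
    # Text description of the image to generate
    prompt: str = Field(..., min_length=1)

    # Image dimensions in pixels (defaults and upper bounds are assumptions)
    height: int = Field(default=1024, gt=0, le=2048)
    width: int = Field(default=1024, gt=0, le=2048)

    # More steps usually means more detail but slower generation
    num_inference_steps: int = Field(default=28, ge=1, le=100)

    # How closely the image should follow the prompt (assumed field)
    guidance_scale: float = Field(default=3.5, ge=0.0, le=20.0)

    # Number of images to generate per request
    batch_size: int = Field(default=1, ge=1, le=4)
```

With this model, a request body containing only a prompt is accepted with default settings, while out-of-range values such as a batch size of 0 are rejected with a validation error before any image generation starts.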

Step 5: Creating the Image Generation Endpoint

Now, let's create the endpoint that will handle image generation requests.

This endpoint handles the image generation process. It first validates that the height and width are multiples of 8, as required by FLUX. It then generates images based on the provided prompt and returns them as base64-encoded strings.
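The core of such a handler can be sketched with plain functions, which keeps the logic testable without a GPU. This is an illustrative sketch under stated assumptions: `validate_dimensions`, `image_to_base64`, and `generate_images` are hypothetical helper names, and `pipeline` is assumed to be the FluxPipeline object loaded earlier, whose result exposes an `images` list of PIL images. In a FastAPI route, the `ValueError` raised here would typically be converted into an `HTTPException` with status code 400.

```python
import base64
from io import BytesIO


def validate_dimensions(height, width, multiple=8):
    """Raise ValueError unless both dimensions are positive multiples of `multiple`."""
    for name, value in (("height", height), ("width", width)):
        if value <= 0 or value % multiple != 0:
            raise ValueError(
                f"{name} must be a positive multiple of {multiple}, got {value}"
            )


def image_to_base64(image, fmt="PNG"):
    """Serialize a PIL-style image (anything with .save(fp, format=...)) to base64 text."""
    buffer = BytesIO()
    image.save(buffer, format=fmt)
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def generate_images(pipeline, prompt, height=1024, width=1024,
                    num_inference_steps=28, batch_size=1):
    """Validate the request, run the pipeline, and return base64-encoded images."""
    validate_dimensions(height, width)
    result = pipeline(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=num_inference_steps,
        num_images_per_prompt=batch_size,
    )
    return [image_to_base64(image) for image in result.images]
```

Validating before calling the pipeline means a bad request fails in milliseconds instead of after the model has started working on it.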

Step 6: Starting the Server

Finally, let's add some code to start the server when the script is run.

This code starts the FastAPI server on port 8000, making it accessible not only from http://localhost:8000 but also from other devices on the same network using the host machine’s IP address, thanks to the 0.0.0.0 binding.
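A startup routine matching that description could look like the sketch below. The `serve` function name is hypothetical; the host and port are the ones named above, and `app` is assumed to be the FastAPI instance created in Step 3.

```python
def serve(app, host="0.0.0.0", port=8000):
    """Run the ASGI app with uvicorn; this call blocks until the process is stopped."""
    import uvicorn  # imported lazily; only needed when the server is actually started

    # Binding to 0.0.0.0 listens on all network interfaces, not just localhost
    uvicorn.run(app, host=host, port=port)


# At the bottom of the script (left commented out here because it blocks):
# if __name__ == "__main__":
#     serve(app)
```

If the server should only be reachable from the machine it runs on, passing `host="127.0.0.1"` instead restricts it to local connections.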

Step 7: Testing Your Server Locally

Now that your FLUX server is up and running, it's time to test it. You can use curl, a command-line tool for making HTTP requests, to interact with your server:

This command will only work on UNIX-based systems with the curl, jq and base64 utilities installed. It may also take up to a few minutes to complete depending on the hardware hosting the FLUX server.
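On systems without those utilities, the same request can be made from Python using only the standard library. This is a sketch under assumptions that are not confirmed by the text above: the endpoint is taken to be a POST route at `/generate`, and the response is taken to be JSON with an `images` list of base64-encoded PNGs. Adjust both to match your server.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_request(prompt, url="http://localhost:8000/generate", **params):
    """Build a JSON POST request; the /generate path is an assumed route."""
    payload = json.dumps({"prompt": prompt, **params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_images(response_body, out_dir="."):
    """Decode the assumed {"images": [...]} response and write one PNG per image."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, encoded in enumerate(response_body["images"]):
        target = out_path / f"image_{index}.png"
        target.write_bytes(base64.b64decode(encoded))
        saved.append(target)
    return saved


def generate(prompt, timeout=600, **params):
    """Send the request and save the returned images; generation can take minutes."""
    request = build_request(prompt, **params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return save_images(json.loads(response.read()))
```

For example, `generate("a lighthouse at dusk", width=512, height=512)` would write `image_0.png` to the current directory once the server responds. The long default timeout reflects the generation times mentioned above.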

Conclusion

Congratulations! You've successfully created your own FLUX server using Python. This setup allows you to generate images based on text prompts via a simple API. If you're not satisfied with the results of the base FLUX model, you might consider fine-tuning the model for even better performance on specific use cases.
