
Creating an AI-powered Image Generation API Service with FLUX, Python, and Diffusers

Barbara Streisand
Release: 2024-11-29 17:36:10


FLUX (by Black Forest Labs) has taken the world of AI image generation by storm in the last few months. Not only has it beaten Stable Diffusion (the prior open-source king) on many benchmarks, it has also surpassed proprietary models like DALL-E and Midjourney on some metrics.

But how would you go about using FLUX in one of your apps? You might consider serverless hosts like Replicate, but these can get expensive very quickly and may not provide the flexibility you need. That's where running your own custom FLUX server comes in handy.

In this article, we'll walk you through creating your own FLUX server using Python. This server will allow you to generate images based on text prompts via a simple API. Whether you're running this server for personal use or deploying it as part of a production application, this guide will help you get started.

Prerequisites

Before diving into the code, let's ensure you have the necessary tools and libraries set up:

  • Python: You'll need Python 3 installed on your machine, preferably version 3.10.
  • torch: The deep learning framework we'll use to run FLUX.
  • diffusers: Provides access to the FLUX model.
  • transformers: Provides the CLIP and T5 text encoders that FLUX uses.
  • sentencepiece: Required to run the FLUX tokenizer.
  • protobuf: Required to run FLUX.
  • accelerate: Helps load the FLUX model more efficiently in some cases.
  • fastapi: Framework to create a web server that can accept image generation requests.
  • uvicorn: Required to run the FastAPI server.
  • psutil: Allows us to check how much RAM there is on our machine.

You can install all the libraries by running the following command: pip install torch diffusers transformers sentencepiece protobuf accelerate fastapi uvicorn psutil
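
Before moving on, it can help to confirm that every package from the command above is actually importable. The sketch below uses only the standard library. Some import names differ from the pip package names (protobuf is imported as google.protobuf), so the mapping is written by hand for this guide. The helper name find_missing is hypothetical.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED_MODULES = {
    "torch": "torch",
    "diffusers": "diffusers",
    "transformers": "transformers",
    "sentencepiece": "sentencepiece",
    "protobuf": "google.protobuf",
    "accelerate": "accelerate",
    "fastapi": "fastapi",
    "uvicorn": "uvicorn",
    "psutil": "psutil",
}


def find_missing(modules: dict[str, str]) -> list[str]:
    """Return the pip names of packages whose module cannot be found."""
    missing = []
    for package, module in modules.items():
        try:
            spec = importlib.util.find_spec(module)
        except ModuleNotFoundError:
            # Raised for dotted names when the parent (e.g. "google") is absent
            spec = None
        if spec is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All packages are installed")
```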

If you're using a Mac with an M1 or M2 chip, you should set up PyTorch with Metal for optimal performance. Follow the official PyTorch with Metal guide before proceeding.

You'll also need at least 12 GB of VRAM if you're planning on running FLUX on a GPU, or at least 12 GB of RAM if you're running on CPU or MPS (which will be slower).

Step 1: Setting Up the Environment

Let's start the script by picking the right device to run inference based on the hardware we're using.

device = 'cuda'  # can also be 'cpu' or 'mps'

import os

# MPS support in PyTorch is not yet fully implemented,
# so let unsupported operations fall back to the CPU
if device == 'mps':
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

if device == 'mps' and not torch.backends.mps.is_available():
    raise Exception("Device set to MPS, but MPS is not available")
elif device == 'cuda' and not torch.cuda.is_available():
    raise Exception("Device set to CUDA, but CUDA is not available")

You can specify cpu, cuda (for NVIDIA GPUs), or mps (for Apple's Metal Performance Shaders). The script then checks if the selected device is available and raises an exception if it's not.
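
If you'd rather not edit the device string by hand, the choice can be made automatically. This is a minimal sketch that assumes a preference order of CUDA, then MPS, then CPU. The function name pick_device is hypothetical. It takes plain booleans so the logic can be checked without a GPU.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred PyTorch device string: CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# In the actual script you would pass PyTorch's own checks:
#   import torch
#   device = pick_device(torch.cuda.is_available(),
#                        torch.backends.mps.is_available())
print(pick_device(cuda_available=False, mps_available=True))  # mps
```

The script above sets PYTORCH_ENABLE_MPS_FALLBACK before importing torch. With automatic detection, the device is only known after torch is imported. In that case you may prefer to set the variable unconditionally, since it only affects the MPS backend.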

Step 2: Loading the FLUX Model

Next, we load the FLUX model. We'll load it in fp16 precision, which saves memory without much loss in quality.

At this point, you may be asked to authenticate with Hugging Face, as the FLUX model is gated. To authenticate, create a Hugging Face account, go to the model page and accept the terms, then create an access token in your account settings and set it on your machine as the HF_TOKEN environment variable.
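
A missing token otherwise only shows up as a failed download, so it can be worth checking for it before loading the model. This is a minimal sketch that assumes the token is supplied through HF_TOKEN as described above. The helper name require_hf_token is hypothetical.

```python
import os
from collections.abc import Mapping


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the Hugging Face token from HF_TOKEN, or fail with a clear message."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Accept the FLUX.1-dev terms on Hugging Face, "
            "create an access token, and export it as HF_TOKEN."
        )
    return token
```

Call it before FluxPipeline.from_pretrained. The huggingface_hub library reads HF_TOKEN from the environment on its own, so the return value does not need to be passed along.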

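from diffusers import FlowMatchEulerDiscreteScheduler, FluxPipeline
import psutil

model_name = "black-forest-labs/FLUX.1-dev"

print(f"Loading {model_name} on {device}")

pipeline = FluxPipeline.from_pretrained(
    model_name,

    # Diffusion models are generally trained on fp32, but fp16
    # gets us 99% there in terms of quality, with just half the (V)RAM
    torch_dtype=torch.float16,

    # Ensure we don't load any dangerous binary code
    use_safetensors=True,

    # We are using Euler here, but you can also use other samplers.
    # Loading its config from the model repo keeps the FLUX-specific
    # timestep shifting settings instead of the scheduler defaults.
    scheduler=FlowMatchEulerDiscreteScheduler.from_pretrained(
        model_name, subfolder="scheduler"
    ),
).to(device)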

Here, we're loading the FLUX model using the diffusers library. The model we're using is black-forest-labs/FLUX.1-dev, loaded in fp16 precision.
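
The fp16 saving is easy to estimate. Weight memory is roughly the parameter count times the bytes per parameter, and fp16 uses 2 bytes where fp32 uses 4. The sketch below only counts the weights and ignores activations and other runtime overhead. The parameter count in the example is a made-up round number, not FLUX's.

```python
# Bytes used to store one parameter in each precision
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 3


# Hypothetical 1-billion-parameter model
params = 1_000_000_000
print(f"fp32: {weight_memory_gib(params, 'float32'):.2f} GiB")  # fp32: 3.73 GiB
print(f"fp16: {weight_memory_gib(params, 'float16'):.2f} GiB")  # fp16: 1.86 GiB
```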

There is also a timestep-distilled model, FLUX.1 [schnell], which runs inference faster but produces less detailed images, as well as FLUX.1 [pro], which is closed-source.
We'll use the Euler scheduler here, but you can experiment with others; the diffusers documentation describes the available schedulers in more detail.
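
Here is some intuition for what the Euler scheduler does. FLUX is a flow-matching model: at each step the transformer predicts a velocity, and the scheduler takes a plain Euler step along it from one noise level (sigma) to the next. The toy sketch below applies that update to a single number with a hypothetical velocity function. It illustrates the update rule only and does not reproduce diffusers' FlowMatchEulerDiscreteScheduler, which also applies timestep shifting.

```python
from typing import Callable


def euler_flow_sample(
    x: float,
    velocity: Callable[[float, float], float],
    num_steps: int,
) -> float:
    """Integrate dx/dsigma = velocity(x, sigma) from sigma=1 (noise) to sigma=0."""
    # Evenly spaced noise levels: 1.0, ..., 0.0
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Euler step: move along the predicted velocity by the sigma difference
        x = x + (sigma_next - sigma) * velocity(x, sigma)
    return x


# Toy velocity field: a constant velocity of 1 moves x from 5.0 to 4.0
print(euler_flow_sample(5.0, lambda x, sigma: 1.0, num_steps=4))  # 4.0
```
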
Since image generation can be resource-intensive, it's crucial to optimize memory usage, especially when running on a CPU or a device with limited memory.

# Recommended if running on MPS or CPU with < 64 GB of RAM
total_memory = psutil.virtual_memory().total
total_memory_gb = total_memory / (1024 ** 3)
if device in ('cpu', 'mps') and total_memory_gb < 64:
    print("Enabling attention slicing")
    pipeline.enable_attention_slicing()

This code checks the machine's total RAM and enables attention slicing when running on CPU or MPS with less than 64 GB. Attention slicing reduces memory usage during image generation, which is essential for devices with limited resources.
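
The decision rule is small enough to pull out and test on its own. The sketch below uses a hypothetical helper name. The 64 GB threshold and the CPU/MPS condition come from the memory check in this guide, and the total would come from psutil.virtual_memory().total.

```python
def should_enable_attention_slicing(
    device: str, total_memory_bytes: int, threshold_gb: float = 64
) -> bool:
    """True when running on CPU/MPS with less RAM than the threshold."""
    total_memory_gb = total_memory_bytes / 1024 ** 3
    return device in ("cpu", "mps") and total_memory_gb < threshold_gb


# Usage with psutil (installed earlier):
#   import psutil
#   if should_enable_attention_slicing(device, psutil.virtual_memory().total):
#       pipeline.enable_attention_slicing()
```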

Step 3: Creating the API with FastAPI

Next, we'll set up the FastAPI server, which will provide an API to generate images.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field, conint, confloat
from fastapi.middleware.gzip import GZipMiddleware
from io import BytesIO
import base64

app = FastAPI()

# We will be returning the image as a base64 encoded string
# which we will want compressed
app.add_middleware(GZipMiddleware, minimum_size=1000, compresslevel=7)

FastAPI is a popular framework for building web APIs with Python. In this case, we're using it to create a server that can accept requests for image generation. We're also using GZip middleware to compress the response, which is particularly useful when sending images back in base64 format.
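
Base64 turns every 3 bytes into 4 ASCII characters, so an encoded PNG is about a third larger than the file itself. Gzip can win back part of that overhead. Below is a standard-library sketch of the encode/decode round trip. The random bytes stand in for real PNG data, and the gzip step only approximates what GZipMiddleware does to the response body.

```python
import base64
import gzip
import math
import os


def encode_image(png_bytes: bytes) -> str:
    """Encode raw image bytes as a base64 string for a JSON response."""
    return base64.b64encode(png_bytes).decode("ascii")


def decode_image(encoded: str) -> bytes:
    """Recover the raw image bytes on the client side."""
    return base64.b64decode(encoded)


# Random bytes stand in for a PNG, which is already compressed data
fake_png = os.urandom(300_000)
encoded = encode_image(fake_png)

# base64 output length is 4 * ceil(n / 3) characters
assert len(encoded) == 4 * math.ceil(len(fake_png) / 3)
assert decode_image(encoded) == fake_png

# Roughly what the GZip middleware does to the body (compresslevel=7)
gzipped = gzip.compress(encoded.encode("ascii"), compresslevel=7)
print(f"raw: {len(fake_png)}, base64: {len(encoded)}, gzipped: {len(gzipped)}")
```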

In a production environment, you might want to store the generated images in an S3 bucket or other cloud storage and return the URLs instead of the base64-encoded strings, to take advantage of a CDN and other optimizations.

Step 4: Defining the Request Model

We now need to define a model for the requests that our API will accept.


This GenerateRequest model defines the parameters required to generate an image. The prompt field is the text description of the image you want to create. Other fields include the image dimensions, the number of inference steps, and the batch size.
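
Here is a minimal sketch of such a model, written with pydantic's Field constraints. The field names follow the description above, but the exact names, defaults, and limits are assumptions rather than values from the original guide.

```python
from pydantic import BaseModel, Field, ValidationError


class GenerateRequest(BaseModel):
    # Hypothetical field names, defaults and limits; adjust to your API
    # Text description of the image to generate
    prompt: str = Field(..., min_length=1)
    # Output size in pixels; the endpoint also checks multiples of 8
    width: int = Field(1024, ge=64, le=2048)
    height: int = Field(1024, ge=64, le=2048)
    # More steps is slower but usually more detailed
    num_inference_steps: int = Field(28, ge=1, le=100)
    # Number of images to generate for the prompt (batch size)
    batch_size: int = Field(1, ge=1, le=4)


request = GenerateRequest(prompt="a lighthouse at dusk, oil painting")
print(request)
```

Out-of-range values such as batch_size=0 raise a pydantic ValidationError. FastAPI turns this into a 422 response before your endpoint code runs.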

Step 5: Creating the Image Generation Endpoint

Now, let's create the endpoint that will handle image generation requests.


This endpoint handles the image generation process. It first validates that the height and width are multiples of 8, as required by FLUX. It then generates images based on the provided prompt and returns them as base64-encoded strings.
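
The core of that endpoint can be sketched without FastAPI: check the dimensions, call the pipeline, and encode each returned image as a base64 PNG. This minimal sketch assumes the pipeline is called with the keyword arguments below and returns an object whose .images are PIL-style images with a .save(buffer, format="PNG") method. FluxPipeline's default output works this way. The function name generate_base64_images is hypothetical.

```python
import base64
from io import BytesIO


def generate_base64_images(pipeline, prompt: str, width: int, height: int,
                           num_inference_steps: int, batch_size: int) -> list[str]:
    """Validate the size, run the pipeline, and return base64-encoded PNGs."""
    # The guide requires both dimensions to be multiples of 8
    if width % 8 != 0 or height % 8 != 0:
        raise ValueError("width and height must be multiples of 8")

    result = pipeline(
        prompt=prompt,
        width=width,
        height=height,
        num_inference_steps=num_inference_steps,
        num_images_per_prompt=batch_size,
    )

    encoded = []
    for image in result.images:
        buffer = BytesIO()
        image.save(buffer, format="PNG")  # write the image into memory
        encoded.append(base64.b64encode(buffer.getvalue()).decode("ascii"))
    return encoded
```

Inside the FastAPI route, you would call this with the request fields and turn a ValueError into HTTPException(status_code=400, detail=str(error)). Inference blocks for a long time, so declare the route with a plain def rather than async def. FastAPI then runs it in a worker thread instead of blocking the event loop.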

Step 6: Starting the Server

Finally, let's add some code to start the server when the script is run.

if __name__ == "__main__":
    import uvicorn

    # Bind to all interfaces (0.0.0.0) on port 8000 so other
    # devices on the network can reach the server too
    uvicorn.run(app, host="0.0.0.0", port=8000)

This code starts the FastAPI server on port 8000, making it accessible not only from http://localhost:8000 but also from other devices on the same network using the host machine’s IP address, thanks to the 0.0.0.0 binding.

Step 7: Testing Your Server Locally

Now that your FLUX server is up and running, it's time to test it. You can use curl, a command-line tool for making HTTP requests, to interact with your server:


This command will only work on UNIX-based systems with the curl, jq and base64 utilities installed. It may also take up to a few minutes to complete depending on the hardware hosting the FLUX server.
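
If curl, jq, or base64 aren't available (on Windows, for example), a small Python client using only the standard library can do the same job. This sketch assumes the server exposes POST /generate at http://localhost:8000 and responds with JSON containing an "images" list of base64 strings. The path, the response shape, and the helper names are all assumptions, so adjust them to match your endpoint.

```python
import base64
import json
import urllib.request

SERVER_URL = "http://localhost:8000/generate"  # assumed endpoint path


def build_request(prompt: str, width: int = 1024, height: int = 1024) -> urllib.request.Request:
    """Create a JSON POST request for the image generation endpoint."""
    body = json.dumps({"prompt": prompt, "width": width, "height": height}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_body: bytes) -> list[bytes]:
    """Turn the JSON response into raw PNG bytes (assumes an "images" list)."""
    payload = json.loads(response_body)
    return [base64.b64decode(item) for item in payload["images"]]


def main() -> None:
    request = build_request("a lighthouse at dusk, oil painting")
    # Generation can take minutes on slower hardware, so use a long timeout
    with urllib.request.urlopen(request, timeout=600) as response:
        images = decode_images(response.read())
    for index, png in enumerate(images):
        with open(f"image_{index}.png", "wb") as file:
            file.write(png)
    print(f"Saved {len(images)} image(s)")
```

Call main() once the server is running. It writes image_0.png, image_1.png, and so on to the current directory.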

Conclusion

Congratulations! You've successfully created your own FLUX server using Python. This setup allows you to generate images based on text prompts via a simple API. If you're not satisfied with the results of the base FLUX model, you might consider fine-tuning the model for even better performance on specific use cases.


Source: dev.to