With just 3 samples and a sentence, AI can customize photo-realistic images. Google is playing with a very new diffusion model.


WBOY

Apr 12, 2023 pm 03:46 PM

ai Model

Recently, text-to-image models have become a popular research direction. Whether the target is a sweeping natural landscape or a novel scene, a simple text description can be enough to generate the image automatically.

Among these tasks, rendering wildly imagined scenes is especially challenging: instances of a specific subject (an object, an animal, etc.) must be composited into new scenes so that they blend in naturally and seamlessly.

Some large-scale text-to-image models achieve high-quality and diverse image synthesis based on text prompts written in natural language. The main advantage of these models is the strong semantic priors learned from a large number of image-text description pairs, such as associating the word "dog" with various instances of dogs that can appear in different poses in the image.

While the synthesis capabilities of these models are unprecedented, they cannot imitate a given reference subject and synthesize new images of that same subject in different scenes. In other words, the expressiveness of their output domain is limited.


To solve this problem, researchers from Google and Boston University proposed DreamBooth, a method for "personalizing" a text-to-image diffusion model so that it adapts to a user's specific image generation needs.

Paper address: https://arxiv.org/pdf/2208.12242.pdf

Project Address: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion

The goal of this research is to expand the model's language-vision dictionary so that it binds new words to the specific subject a user wants to generate. Once the new dictionary is embedded in the model, it can use these words to synthesize novel, photorealistic images of the subject, placing it in different scenes while preserving its key identifying features, as shown in Figure 1 below.

[Figure 1]

Specifically, the study implants images of a given subject into the model's output domain so that they can be synthesized with a unique identifier. To this end, it represents the subject with a rare token identifier and fine-tunes a pre-trained, diffusion-based text-to-image framework that operates in two steps: generating a low-resolution image from text, then applying a super-resolution (SR) diffusion model.

The study first fine-tunes the low-resolution text-to-image model on the input images, paired with text prompts that contain the unique identifier followed by the subject's class name (for example, "A [V] dog"). To keep the model from overfitting the class name to the specific instance, and to prevent semantic drift, the study proposes a self-generated, class-specific prior preservation loss. This loss exploits the semantic prior on the class that is already embedded in the model, encouraging it to keep generating diverse instances of the subject's class.

In the second step, the study fine-tunes the super-resolution component using pairs of low-resolution and high-resolution versions of the input images. This lets the model maintain high fidelity to small but important details of the subject.
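As a rough illustration of the second step, the sketch below builds (low-resolution, high-resolution) training pairs from a set of input images. The article does not say how the low-resolution versions are produced, so the box-filter downsampling here is an assumption, and `box_downsample` and `make_sr_pairs` are hypothetical helper names. Images are plain grayscale grids to keep the example dependency-free.

```python
from statistics import fmean

# A grayscale image as rows of pixel values (an illustrative simplification).
Image = list[list[float]]


def box_downsample(image: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks of a grayscale image."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by factor")
    return [
        [
            fmean(
                image[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            for c in range(0, width, factor)
        ]
        for r in range(0, height, factor)
    ]


def make_sr_pairs(images: list[Image], factor: int) -> list[tuple[Image, Image]]:
    """Pair each image with its downsampled copy: (low_res, high_res)."""
    return [(box_downsample(img, factor), img) for img in images]


# A tiny 4 x 4 stand-in for one subject photo, with pixel values 0..15.
photo = [[float(4 * r + c) for c in range(4)] for r in range(4)]
low_res, high_res = make_sr_pairs([photo], factor=2)[0]
print(low_res)  # [[2.5, 4.5], [10.5, 12.5]]
```

A super-resolution model would then be fine-tuned to map each `low_res` input back to its `high_res` target, which is what allows fine detail of the subject to survive upsampling.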

Let’s take a look at the specific methods proposed in this study.

Method Introduction

Given 3-5 casually captured images of a subject, without any text descriptions, the paper aims to generate new images of that subject with high detail fidelity and with variations guided by text prompts. The study imposes no restrictions on the input images, and the subject can appear in different contexts. The method is shown in Figure 3. The output images can modify the original in several ways: changing the subject's position, changing properties such as color and shape, and changing its pose, expression, material, and other semantic attributes.

More specifically, the method takes as input a few images (usually 3-5) of a subject (for example, a specific dog) together with the corresponding class name (for example, "dog"), and returns a fine-tuned, personalized text-to-image model that encodes a unique identifier referring to the subject. Then, at inference time, the unique identifier can be embedded in different sentences to synthesize the subject in different contexts.

[Figure 3]

The first task is to implant the subject instance into the model's output domain and bind the subject to a unique identifier. The study proposes a way of designing the identifier, as well as a new way of supervising the fine-tuning process.

To address overfitting to the input images and language drift, the study also proposes a prior-preservation loss, which encourages the diffusion model to keep generating diverse instances of the same class as the subject.

To preserve image detail, the study found that the super-resolution (SR) component of the model should also be fine-tuned. The work builds on the pre-trained Imagen model. The specific process is shown in Figure 4: given 3-5 images of the same subject, the text-to-image diffusion model is fine-tuned in two steps, as described above.
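The idea behind the prior-preservation loss can be sketched as a sum of two terms: a reconstruction error on the subject images, plus a weighted error on class images that the model generated itself before fine-tuning. The version below is a simplified sketch, not the paper's exact formulation: it uses a plain mean squared error and omits the per-timestep weighting of a diffusion objective. The names `dreambooth_loss` and `prior_weight` are hypothetical.

```python
def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def dreambooth_loss(
    subject_pred: list[float],
    subject_target: list[float],
    prior_pred: list[float],
    prior_target: list[float],
    prior_weight: float = 1.0,
) -> float:
    """Subject reconstruction term plus a weighted prior-preservation term.

    subject_*: model output and ground truth for a subject image,
               conditioned on the prompt with the unique identifier.
    prior_*:   model output and target for a self-generated class image,
               conditioned on the plain class prompt.
    """
    reconstruction = mse(subject_pred, subject_target)
    prior = mse(prior_pred, prior_target)
    return reconstruction + prior_weight * prior


# Toy vectors standing in for flattened images.
loss = dreambooth_loss(
    subject_pred=[0.5, 1.0],
    subject_target=[0.0, 1.0],
    prior_pred=[1.0, 1.0],
    prior_target=[0.0, 0.0],
    prior_weight=0.5,
)
print(loss)  # 0.625
```

With `prior_weight` set to zero this reduces to ordinary fine-tuning on the subject images alone. The second term is what penalizes the model for forgetting how to produce other members of the class.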

[Figure 4]

Representing the subject with a rare token identifier

The study labels all input images of the subject "a [identifier] [class noun]", where [identifier] is a unique identifier linked to the subject and [class noun] is a coarse class descriptor of the subject (e.g. cat, dog, watch). The class descriptor is included in the sentence deliberately, in order to tie the class prior to the subject.
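The labeling scheme is simple enough to write down directly. The sketch below builds the subject prompt "a [identifier] [class noun]" and, as an assumption not spelled out in this article, a plain class prompt "a [class noun]" of the kind typically used to generate class images for prior preservation. The helper names and the `{subject}` template placeholder are hypothetical.

```python
def build_prompts(identifier: str, class_noun: str) -> tuple[str, str]:
    """Return (subject_prompt, class_prompt) for fine-tuning."""
    subject_prompt = f"a {identifier} {class_noun}"
    class_prompt = f"a {class_noun}"  # assumed form of the class-only prompt
    return subject_prompt, class_prompt


def in_context(identifier: str, class_noun: str, template: str) -> str:
    """Embed the identified subject into a sentence at inference time."""
    return template.format(subject=f"{identifier} {class_noun}")


subject_prompt, class_prompt = build_prompts("sks", "container")
print(subject_prompt)  # a sks container
print(class_prompt)    # a container
print(in_context("sks", "container", "photo of a {subject} on the beach"))
# photo of a sks container on the beach
```

Keeping the class noun next to the identifier in every prompt is what lets the fine-tuned model reuse what it already knows about the class when it renders the specific subject.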

Results

The following results come from a Stable Diffusion implementation of DreamBooth (see the project link above). The training images come from the "Textual Inversion" repository:

[Image: training images]

After training, given the prompt "photo of a sks container", the model generates the following container photos:

[Image: generated photos of the sks container]

Adding a location to the prompt, as in "photo of a sks container on the beach", places the container on the beach:

[Image: the sks container on a beach]

The green container's color is rather plain. To add some red, use the prompt "photo of a red sks container":

[Image: a red sks container]

The prompt "a dog on top of sks container" puts a puppy on top of the container:

[Image: a dog on top of the sks container]

The following are some results presented in the paper, starting with artistic renderings of a dog in the styles of different artists:

[Image: a dog rendered in the styles of different artists]

The method can also synthesize expressions that do not appear in the input images, demonstrating the model's ability to extrapolate:

[Image: the subject with expressions not seen in the input images]

For more details, please refer to the original paper.
