Midjourney's rival is here! Google's StyleDrop, the ace "Customization Master", sets the AI art world alight


WBOY

Jun 05, 2023 pm 01:33 PM

Google Model

Google's StyleDrop took the internet by storm as soon as it was released.

Given Van Gogh's The Starry Night as a reference, the AI turns into a Van Gogh-like master: after grasping this abstract style, it produces countless paintings in the same manner.

Given a cartoon style instead, the objects it draws come out much cuter.

It can even control details precisely enough to design a logo in an original style.

The appeal of StyleDrop is that it needs only a single reference image to deconstruct and recreate an artistic style, however complex.

Some netizens called it yet another AI tool that will put designers out of work.

StyleDrop is the latest work from a Google research team.

Paper address: https://arxiv.org/pdf/2306.00983.pdf

With tools like StyleDrop, you can not only draw with more control but also complete fine-grained work that was previously hard to imagine, such as drawing a logo.

Even NVIDIA scientists called the result "phenomenal".

"Customization" Master

According to the paper's authors, StyleDrop was inspired by the eyedropper (color picker) tool.

In the same way, StyleDrop aims to let anyone quickly and effortlessly "pick" a style from a single or a few reference images and generate images in that style.

A sloth rendered in 18 styles:

A panda rendered in 24 styles:

Given a watercolor painting by a child, StyleDrop reproduces the style faithfully, down to the creases in the paper.

It has to be said: this is impressive.

StyleDrop can also design English letters in different styles, based on a reference:

The same works for letters in the Van Gogh style.

It handles line drawings too. A line drawing is a highly abstract image and places very high demands on the composition of the picture, so earlier methods struggled with it.

The brushstrokes of the cheese's shadow in the original image are carried over to the objects in each generated image.

A creation that takes the Android logo as its reference.

In addition, the researchers extended StyleDrop's capabilities: combined with DreamBooth, it can customize not only the style but also the content.

For example, still in the Van Gogh style, it generates a painting of a little corgi in a similar style:

Here is another one. The corgi below looks like the Sphinx beside the Egyptian pyramids.

How does it work?

StyleDrop is built on Muse and consists of two key parts:

One is parameter-efficient fine-tuning of a generative vision transformer, and the other is iterative training with feedback.

The researchers then synthesize images from the two fine-tuned models.

Muse is a recent text-to-image synthesis model based on a masked generative image transformer. It contains two synthesis modules, one for base image generation (256 × 256) and one for super-resolution (512 × 512 or 1024 × 1024).

Each module consists of a text encoder T, a transformer G, a sampler S, an image encoder E, and a decoder D.

T maps a text prompt t ∈ 𝒯 to a continuous embedding space ℰ. G processes the text embedding e ∈ ℰ to generate logits l ∈ ℒ for a sequence of visual tokens. S draws a sequence of visual tokens v ∈ 𝒱 from the logits through iterative decoding, which runs several steps of transformer inference conditioned on the text embedding e and on the visual tokens decoded in the previous steps.

Finally, D maps the discrete token sequence to the pixel space ℐ. In summary, given a text prompt t, the image I is synthesized as follows:

$$I = D\big(S(G, T(t))\big) \tag{1}$$
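The chain of components described above can be sketched as a composition of function calls. The toy below is only an illustration under stated assumptions: `text_encoder`, `transformer`, `sampler` and `decoder` are hypothetical stand-ins that return dummy values, not Muse's actual components, and the sampler fills in one token per step, which is a simplification of iterative decoding. Only the order of composition (text encoder, transformer, sampler, decoder) is taken from the text.

```python
import random

VOCAB_SIZE = 8   # hypothetical size of the visual-token codebook
SEQ_LEN = 4      # hypothetical number of visual tokens per image
MASK = -1        # marker for a position that has not been decoded yet


def text_encoder(prompt):
    """T: map a text prompt to an embedding (here: a toy list of floats)."""
    return [float(ord(ch) % 7) for ch in prompt]


def transformer(tokens, embedding):
    """G: return one row of logits per token position (toy values)."""
    rng = random.Random(int(sum(embedding)) + sum(tokens))
    return [[rng.random() for _ in range(VOCAB_SIZE)] for _ in tokens]


def sampler(transformer_fn, embedding, steps=SEQ_LEN):
    """S: iterative decoding; each step conditions on tokens decoded so far."""
    tokens = [MASK] * SEQ_LEN
    for step in range(steps):
        logits = transformer_fn(tokens, embedding)
        # Greedy choice for the single position decoded at this step.
        tokens[step] = max(range(VOCAB_SIZE), key=lambda i: logits[step][i])
    return tokens


def decoder(tokens):
    """D: map discrete tokens to 'pixels' (here: one grey value per token)."""
    return [t * 255 // (VOCAB_SIZE - 1) for t in tokens]


def generate(prompt):
    """Compose the parts: I = D(S(G, T(t)))."""
    return decoder(sampler(transformer, text_encoder(prompt)))


image = generate("a cat in watercolor painting style")
```

The point of the sketch is the data flow: the prompt is embedded once, the transformer is called repeatedly inside the sampler, and the decoder runs only at the end on the finished token sequence.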

Figure 2 shows a simplified architecture of a Muse transformer layer, partially modified to support parameter-efficient fine-tuning (PEFT) with adapters.

A transformer with L layers processes the sequence of visual tokens (shown in green) conditioned on the text embedding e. The learned parameters θ are used to construct the weights for adapter tuning.

To train θ, in many cases the only thing available is an image serving as the style reference.

The text prompt therefore has to be attached manually. The researchers propose a simple, templated approach: the text prompt consists of a description of the content followed by a phrase describing the style.

For example, in Table 1 the researchers use "cat" to describe the object and append "watercolor painting" as the style description.
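The templated prompt construction can be sketched in a few lines. The helper name `build_prompt` and the exact wording of the template ("A ... in ... style") are assumptions made for illustration; the text only states that a content description is followed by a style phrase.

```python
def build_prompt(content, style):
    """Join a content description with a style descriptor.

    The "A {content} in {style} style" wording is a hypothetical template.
    """
    return f"A {content} in {style} style"


style_prompt = build_prompt("cat", "watercolor painting")
print(style_prompt)
```

Keeping content and style in separate arguments makes it easy to reuse one style phrase across many content descriptions.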

Including descriptions of both content and style in the text prompt is critical because it helps separate content from style, which is the researchers' main goal.

Figure 3 shows iterative training with feedback.

When trained on a single style reference image (orange box), some images generated by StyleDrop may exhibit content leaked from the style reference image (red box: the background of the image contains a house similar to the one in the style image).

Other images (blue boxes) separate style from content better. Iteratively training StyleDrop on these good samples (blue boxes) yields a better balance between style and text fidelity (green box).

The researchers use two feedback methods here:

- CLIP score

The CLIP score measures how well an image aligns with a text. The quality of a generated image can therefore be assessed by its CLIP score (i.e., the cosine similarity between the visual and textual CLIP embeddings).

The researchers select the images with the highest CLIP scores. They call this method iterative training with CLIP feedback (CF).

In experiments, the researchers found that using CLIP scores to assess the quality of synthetic images is an effective way to improve recall (i.e., text fidelity) without losing too much style fidelity.

On the other hand, CLIP scores may not fully align with human intent, nor do they capture subtle style attributes.

- HF

Human feedback (HF) injects user intent into the quality assessment of synthetic images more directly.

HF has already proven powerful and effective in fine-tuning LLMs with reinforcement learning.

HF can compensate for the inability of CLIP scores to capture subtle style attributes.
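The CLIP-feedback selection step can be sketched as follows. This is a minimal illustration, assuming the image and text embeddings have already been computed by a CLIP model; the two-dimensional vectors and the helper names `cosine_similarity` and `select_top_k` are hypothetical and chosen only to keep the example self-contained.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_top_k(image_embeddings, text_embedding, k):
    """Return the indices of the k best-aligned images and all scores."""
    scores = [cosine_similarity(e, text_embedding) for e in image_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


# Toy embeddings standing in for CLIP outputs (assumed values).
text_embedding = [1.0, 0.0]
image_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

selected, scores = select_top_k(image_embeddings, text_embedding, k=2)
```

With human feedback, the same loop applies, except that the selection is made by a person instead of by the ranking in `select_top_k`.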

A large body of research currently focuses on personalizing text-to-image diffusion models to synthesize images containing multiple personal styles.

The researchers show how DreamBooth and StyleDrop can be combined in a simple way to personalize both style and content.

This is done by sampling from two modified generative distributions, guided by θs for style and θc for content, which are adapter parameters trained independently on the style and content reference images, respectively.

Unlike existing methods, the team's approach does not require joint training of learnable parameters on multiple concepts. This gives it greater compositional power, because the pre-trained adapters are each trained separately on an individual subject or style.

The overall sampling process follows the iterative decoding of Equation (1), with the logits sampled differently in each decoding step.

Let t be the text prompt and c the text prompt without the style descriptor. With v_k denoting the visual tokens at step k, the logits at step k are computed as follows:

$$l_k = (1 - \gamma)\, G_{\theta_s}(v_k, T(t)) + \gamma\, G_{\theta_c}(v_k, T(c))$$

Here γ balances StyleDrop and DreamBooth: if γ is 0 we get StyleDrop, and if it is 1 we get DreamBooth.

By setting γ appropriately, we can get a suitable image.
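The role of γ can be illustrated with a small sketch, assuming the blend is a linear interpolation between the logits of the style-tuned model and those of the content-tuned model. The function name `blend_logits` and the toy logit values are hypothetical.

```python
def blend_logits(style_logits, content_logits, gamma):
    """Interpolate between style-model and content-model logits.

    gamma = 0 returns the style (StyleDrop) logits unchanged,
    gamma = 1 returns the content (DreamBooth) logits unchanged.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [
        (1.0 - gamma) * s + gamma * c
        for s, c in zip(style_logits, content_logits)
    ]


style_logits = [2.0, 0.0]     # assumed toy values
content_logits = [0.0, 4.0]   # assumed toy values

halfway = blend_logits(style_logits, content_logits, 0.5)
```

Intermediate values of γ trade style fidelity against content fidelity, which is why it has to be tuned per image.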

Experimental settings

So far, style tuning of text-to-image generative models has not been studied extensively.

The researchers therefore propose a new experimental protocol:

- Data collection

The researchers collected dozens of images in different styles, ranging from watercolor and oil paintings, flat illustrations and 3D renderings to sculptures in different materials.

- Model configuration

The researchers tune the Muse-based StyleDrop with adapters. In all experiments, the adapter weights are updated with the Adam optimizer for 1000 steps at a learning rate of 0.00003. Unless otherwise stated, "StyleDrop" refers to the second-round model, which was trained on more than 10 synthetic images with human feedback.

- Evaluation

The paper reports a quantitative evaluation based on CLIP that measures style consistency and text alignment. In addition, the researchers conducted a user preference study to assess style consistency and text alignment.

The figure shows StyleDrop's results on 18 images in different styles collected by the researchers.

As the results show, StyleDrop captures the nuances of texture, shading and structure across a variety of styles, giving better control over style than before.

For comparison, the researchers also show the results of DreamBooth on Imagen, a LoRA implementation of DreamBooth on Stable Diffusion, and textual inversion.

The detailed results are shown in the table: evaluation metrics for image-text alignment (Text) and visual style alignment (Style), as human scores (top) and CLIP scores (bottom).

Qualitative comparison of (a) DreamBooth, (b) StyleDrop, and (c) DreamBooth + StyleDrop:

Here, the researchers apply two metrics based on the CLIP score mentioned above: the text score and the style score.

For the text score, they measure the cosine similarity between the image and text embeddings. For the style score, they measure the cosine similarity between the embeddings of the style reference and the synthesized image.

The researchers generated a total of 1520 images for 190 text prompts. Although high scores are desirable, these metrics are not perfect.

Iterative training (IT) improved the text score, in line with the researchers' goal.

As a trade-off, however, the style scores are lower than those of the first-round model, because the later models are trained on synthetic images and the style may drift due to selection bias.

DreamBooth on Imagen falls short of StyleDrop in style score (0.644 vs. 0.694 for StyleDrop with HF).

The researchers note that the increase in style score for DreamBooth on Imagen is modest (0.569 → 0.644), whereas the increase for StyleDrop on Muse is more pronounced (0.556 → 0.694).

They conclude that style fine-tuning is more effective on Muse than on Imagen.
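The numbers quoted above can be checked with a little arithmetic. The snippet below is only a worked calculation on the figures given in the text; the per-prompt image count assumes the 1520 images are spread evenly over the 190 prompts, which the text does not state explicitly.

```python
# Style scores (before, after) as quoted in the text.
style_scores = {
    "DreamBooth on Imagen": (0.569, 0.644),
    "StyleDrop on Muse": (0.556, 0.694),
}

# Absolute gain in style score for each method, rounded to three decimals.
gains = {
    name: round(after - before, 3)
    for name, (before, after) in style_scores.items()
}

# Images per prompt, assuming an even split across prompts.
images_per_prompt = 1520 // 190

for name, gain in gains.items():
    print(f"{name}: +{gain:.3f}")
print(f"images per prompt: {images_per_prompt}")
```

On these figures the gain for StyleDrop on Muse (0.138) is a little under twice the gain for DreamBooth on Imagen (0.075).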

In addition, StyleDrop offers fine-grained control: it captures subtle style differences such as color offset, gradation, or sharp-angle control.

Netizens’ hot comments

With StyleDrop, designers would work 10 times faster. This has already taken off.

One day in AI is like 10 years in the human world. AIGC is developing at the speed of light, the kind of light that blinds your eyes!

Tools simply follow the trend, and whatever was bound to be eliminated was eliminated long ago.

For making logos, this tool is much easier to use than Midjourney.
