Google's new AI is a hit! It can even draw the longest word in the world
Friend, do you know what this English word is?
Pneumonoultramicroscopicsilicovolcanoconiosis.
It is widely recognized as the longest word in the world: 45 letters, meaning "a disease caused by the deposition of volcanic silica particles in the lungs" (commonly known as volcanic silicosis).
But what if, instead of asking you to spell this word, you... draw it?
(You can't even pronounce it, and now you have to draw it???)
Parti, the latest AI from Google, handles this problem with ease.
Feed the word to Parti and it generates several plausible pictures of diseased lungs:
But this is just a small test of Parti’s capabilities. According to Google, it is currently the most advanced “text-to-image” AI.
For example, if you tell it: "Combine the Sydney Opera House with the Eiffel Tower", the output will be like this:
(If nobody had told me, I really would have thought it was a magazine photo.)
Moreover, it uses a different algorithm from Google's own Imagen, and Parti can be said to have taken "AI painting" to a new level.
Even Jeff Dean, the head of Google AI, tweeted about it several times and was clearly having a great time:
Scalable to 20 billion parameters: more realistic, "smarter"
In fact, Parti's capabilities don't stop there.
Because the model scales to 20 billion parameters, the images it generates are, for one thing, more detailed and realistic.
Whether the prompt is just a few words or a short paragraph of more than fifty, Parti renders it clearly.
For example: "The back of a violin".
Or a night scene described after Van Gogh's "Starry Night" (P.S. that prompt is 67 words long).
Parti handles it without trouble and serves up a whole set of pictures in different styles.
This is Parti's second major strength: not only are the details in place, but the style can vary too.
Even odd descriptions such as "a raccoon wearing a formal suit and a top hat, carrying a cane and a garbage bag" come out in inventive variations without losing detail.
The styles include Van Gogh, Egyptian pharaoh, pixel art, traditional Chinese painting, abstract...
Sometimes it even makes puns.
(Toad'ay, toad)
In the test results, Parti achieves state-of-the-art FID scores on both MS-COCO and Localized Narratives (LN, whose descriptions are four times longer).
In particular, its zero-shot FID score on MS-COCO is only 7.23 and its fine-tuned FID score is 3.22 (lower is better), beating the earlier Imagen and DALL-E 2.
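To give a feel for what an FID score measures, here is a minimal sketch of the Fréchet distance between two Gaussians. It is only an illustration: real FID fits multivariate Gaussians to Inception-network features of real and generated images, whereas this toy assumes one-dimensional features, where the distance reduces to (mean difference)^2 + (standard deviation difference)^2. The function name and the sample numbers are made up for the example.

```python
import statistics


def frechet_distance_1d(real, generated):
    """Fréchet distance between 1-D Gaussians fitted to two samples.

    Simplifying assumption: features are scalars. Real FID uses
    high-dimensional features with full covariance matrices.
    """
    mu_r, mu_g = statistics.fmean(real), statistics.fmean(generated)
    sd_r, sd_g = statistics.pstdev(real), statistics.pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2


# Hypothetical scalar "features" of real and generated images.
real_features = [0.0, 1.0, 2.0, 3.0]
close_features = [0.1, 1.1, 2.1, 3.1]  # distribution close to the real one
far_features = [5.0, 5.5, 6.0, 6.5]    # distribution far from the real one

print(frechet_distance_1d(real_features, close_features))
print(frechet_distance_1d(real_features, far_features))
```

The closer the generated distribution is to the real one, the smaller the distance, which is why a lower FID is better.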
All components are Transformers
Just a month later, Google has taken AI painting to a new level, yet the authors say the secret is very simple.
Parti treats text-to-image generation as a sequence-to-sequence modeling problem. This is similar to machine translation: text tokens are given as input to the encoder, but the target output is an image rather than text.
Structurally, it has only three components: an encoder, a decoder and an image tokenizer, all based on the standard Transformer.
First, the Transformer-based image tokenizer ViT-VQGAN encodes an image into a sequence of discrete tokens.
Then the Transformer encoder-decoder is scaled up to 20 billion parameters.
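The sequence-to-sequence idea can be sketched as a decoding loop: the text tokens are fixed, and image tokens are produced one at a time, each conditioned on the text and on the image tokens generated so far. The sketch below is not Parti's code. `toy_decoder_scores` is a hypothetical stand-in for the Transformer encoder-decoder that returns arbitrary but deterministic scores, and greedy selection is an assumption made to keep the example short.

```python
def toy_decoder_scores(text_tokens, image_tokens_so_far, vocab_size):
    """Hypothetical stand-in for a Transformer decoder.

    Returns one score per possible image token. The arithmetic is
    arbitrary; a real model would compute these scores with attention
    over the text tokens and the image tokens generated so far.
    """
    seed = sum(text_tokens) + 7 * len(image_tokens_so_far)
    seed += sum(image_tokens_so_far)
    return [((seed + 3 * k) * 31) % 17 for k in range(vocab_size)]


def generate_image_tokens(text_tokens, length, vocab_size=8):
    """Autoregressively generate `length` image tokens (greedy choice)."""
    image_tokens = []
    for _ in range(length):
        scores = toy_decoder_scores(text_tokens, image_tokens, vocab_size)
        # Pick the highest-scoring token and feed it back in next step.
        next_token = max(range(vocab_size), key=scores.__getitem__)
        image_tokens.append(next_token)
    return image_tokens


# Hypothetical token ids for a short text prompt.
prompt_tokens = [3, 1, 4]
print(generate_image_tokens(prompt_tokens, length=6))
```

In a full system, the resulting token sequence would then be handed to the image tokenizer's decoder to turn it back into pixels.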
Apart from the earliest GAN-based work, previous research on text-to-image generation falls roughly into two approaches.
One is based on autoregressive models. Text features are first mapped to image features, and then a sequence architecture such as the Transformer is used to learn the relationship between the language input and the image output.
A key component of this approach is the image tokenizer, which converts each image into a sequence of discrete units. DALL-E and CogView, for example, take this approach.
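What "converting an image into a sequence of discrete units" means can be shown with a toy vector-quantization lookup: each patch feature is replaced by the index of its nearest entry in a codebook. This is a sketch under simplifying assumptions, not ViT-VQGAN itself. The four-entry codebook and the two-dimensional patch features are invented for the example, whereas a real tokenizer learns a much larger codebook and computes patch features with a neural encoder.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
        for v in vectors
    ]


def detokenize(token_ids, codebook):
    """Look the token ids back up to get approximate patch features."""
    return [codebook[i] for i in token_ids]


# Hypothetical codebook and patch features (2-D for readability).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
patch_features = [(0.1, 0.2), (0.9, 0.1), (0.8, 0.95), (0.2, 0.7)]

image_tokens = quantize(patch_features, codebook)
print(image_tokens)                        # [0, 1, 3, 2]
print(detokenize(image_tokens, codebook))  # nearest codebook vectors
```

Once images are sequences of integer ids like this, a sequence model can predict them in the same way it predicts words.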
The other, which has been making rapid progress recently, is diffusion-based text-to-image models such as DALL-E 2 and Imagen.
They drop the image tokenizer and instead use a diffusion model to generate images directly. These models produce higher-quality images and score better on MS-COCO zero-shot FID.
The success of Parti shows that autoregressive models can also be used to improve text-to-image generation.
At the same time, the Parti team introduced and released a new benchmark, PartiPrompts, which measures a model's ability across 12 categories and 11 challenges.
Parti still has limitations, however, and the researchers also showed some failure cases:
For example, it does not handle negation:
A plate without bananas, and a glass without orange juice next to it.
It also makes common-sense mistakes, such as getting relative scale wrong. In this picture, for example, the robot is several times taller than the racing car.
A shiny robot wearing a racing suit and black visor stands proudly in front of an F1 car. The sun sets over the cityscape. Comic book illustration.
Google competes with itself
This study comes from Google Research, and most of the team members are Chinese.
The core researchers include Yuanzhong Xu and Thang Luong, among others, who currently do AI-related research at Google.
(Thang Luong has some 20,000 citations on Google Scholar.)
△Left: Yuanzhong Xu; Right: Thang Luong
Interestingly, Imagen, which is likewise a "say a sentence and let the AI draw it" system from Google, turns out to be closely tied to Parti.
It is mentioned in Parti’s GitHub project documentation:
Thanks to the Imagen team for sharing their recent and complete results with us before Imagen's release.
Their important findings on CF-guidance (classifier-free guidance) were particularly helpful for the final Parti model.
One of Imagen's authors, Burcu Karagol Ayan, also took part in the Parti project.
(Call it Google competing with itself.)
Not only that: Aditya Ramesh, an author of the "next-door" DALL-E 2, took part in discussions about Parti's MS-COCO evaluation.
The authors of DALL-Eval also helped with Parti's data work.
One More Thing
It has to be said that text-to-image generation is not just a darling of researchers.
Netizens never tire of "playing" with it either (and their imaginations run wild).
A while ago, a request for Imagen to draw a Song Dynasty-style "tiger wearing VR" quickly turned into an AI painting battle.
△Picture: Art by Imagen
DALL·E, MidJourney and others "heard the news" and joined in.
△ Drawing by DALL·E
There are even people who brought Wordle and DALL-E 2 together:
......
Back to Parti: fun as it is, some netizens still raised a question that cuts straight to the soul:
When will it be commercialized? Just "playing behind closed doors" by yourselves is pointless.
Parti paper address:
https://parti.research.google/
GitHub project address:
https://github.com/google-research/parti
Reference links:
[1] https://twitter.com/lmthang/status/1539664610596225024
[2] https://gizmodo.com/new-browser-game-combines-dall-e-mini-and-wordle-1849105289
[3] https://imagen.research.google/