Generate a dataset with GPT-3.5! A new SOTA for image editing from Peking University, Tiangong and other teams accurately simulates physical-world scenes


WBOY

Jun 02, 2024 pm 05:18 PM


There are many methods for high-quality image editing, but they struggle to accurately reflect the real physical world.

So, why not try editing the world itself?


Teams from Peking University, Tiamat AI, Tiangong AI, and Mila Labs proposed EditWorld, which introduces a new editing task: world-instructed image editing. The task defines and categorizes instructions according to various world scenarios.


With the support of a set of pre-trained models, such as GPT-3.5, Video-LLaVA, and SDXL, the team built a multimodal dataset of world instructions.

A diffusion-based image editing model, EditWorld, was trained on this dataset. On the new task it significantly outperforms existing editing methods, achieving state-of-the-art (SOTA) results.

A New SOTA for Image Editing

Existing methods achieve high-quality image editing in a variety of ways, including but not limited to text control, drag-based operations, and inpainting. Among them, instruction-based editing has received widespread attention because it is easy to use.

Although existing image editing methods can produce high-quality results, they still struggle with world dynamics, that is, edits that convey realistic visual changes in the physical world.

As shown in Figure 1, neither InstructPix2Pix nor MagicBrush generates reasonable editing results.


To solve this problem, the team introduced a new task called world-instructed image editing, which enables image edits to reflect "world dynamics" in both the real physical world and virtual media.

Specifically, they defined and classified various world-dynamics instructions and, based on these instructions, created a new multimodal training dataset containing a large number of input-instruction-output triplets.
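To make the shape of a training sample concrete, here is a minimal sketch of one input-instruction-output triplet as a Python dataclass. The class name, the field names, and the use of file paths for images are assumptions for illustration, not EditWorld's actual data format; the example instruction is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditTriplet:
    """One training sample: input image, world instruction, output image.

    Hypothetical schema for illustration; not EditWorld's real format.
    """
    input_image: str   # hypothetical: path to the input image
    instruction: str   # world instruction describing the expected change
    output_image: str  # hypothetical: path to the edited (output) image

    def __post_init__(self) -> None:
        # Reject incomplete samples: every field must be non-empty.
        if not (self.input_image and self.instruction and self.output_image):
            raise ValueError("all three fields of a triplet must be non-empty")


# Hypothetical sample showing a physical-world change.
sample = EditTriplet(
    "ice_cube.png",
    "What happens after an hour in the sun?",
    "melted_ice.png",
)
```

Making the record frozen keeps samples immutable once they have passed the emptiness check.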

Finally, the team trained a text-guided diffusion model on this carefully curated dataset and proposed a zero-shot image manipulation strategy to achieve world-instructed image editing.

Based on task scenarios in the real world and in virtual media, world-instructed image editing is divided into 7 categories. Each category is defined and described, and a data sample is provided for it.


To build the dataset, the team then designed two branches: text-to-image generation and video storyboard extraction.

The text-to-image branch is meant to enrich the diversity of data scenarios. In this branch, the team first uses GPT to generate text quadruples (input image description, instruction, output image description, and keywords). They then use the input and output descriptions to generate the corresponding images, and use the attention map associated with the keyword to locate the editing region and obtain an editing mask. To keep the key features of the two images consistent, the team also introduced the image prompt adapter method IP-Adapter. Finally, the team combined IP-Adapter and ControlNet, conditioning on the Canny edge map of the output image and the image prompt features of the input image, and applied image inpainting to refine the output image, yielding more effective editing data.
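The mask-extraction step can be sketched as follows: take the attention map associated with the keyword, normalise it, and threshold it into a binary mask of the region to edit. This is a minimal illustration under stated assumptions. The toy 3x3 grid stands in for a real attention map (which would come from the diffusion model and typically be resized to image resolution), and both the min-max normalisation and the 0.5 threshold are hypothetical choices, since the article does not say how the map is converted into a mask.

```python
from typing import List

Grid = List[List[float]]


def attention_to_mask(attn: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Turn a keyword's attention map into a binary editing mask.

    Min-max normalises the map to [0, 1] and marks cells at or above
    `threshold` as editable (1). The threshold is a hypothetical default.
    """
    flat = [v for row in attn for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A flat map carries no localisation signal: mask nothing.
        return [[0 for _ in row] for row in attn]
    return [[1 if (v - lo) / span >= threshold else 0 for v in row] for row in attn]


# Toy attention map peaking around the centre and right of the image.
attn = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.8],
    [0.1, 0.7, 0.2],
]
mask = attention_to_mask(attn)
```

In this sketch, cells marked 1 form the editing region; lowering the threshold yields a larger, more permissive mask.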


After obtaining scenario-rich data from the text-to-image branch, the team added real data to the dataset by extracting high-quality video keyframes as editing data. Specifically, from a video storyboard they select two frames that are strongly correlated yet differ substantially in structure as the start and end frames, cut out a new storyboard between them, and use a large multimodal model to describe the change across that storyboard. Finally, the start and end frames serve as the input and output images, and the resulting description serves as the instruction, yielding the required editing data.
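The frame-selection criterion (strongly correlated yet structurally different) can be read as a constrained search over frame pairs. The sketch below assumes each frame comes with a hypothetical semantic embedding, used to measure correlation via cosine similarity, and a flattened grayscale thumbnail, used to measure structural difference via mean absolute pixel difference. It keeps only pairs whose correlation clears a hypothetical `min_corr` threshold and returns the pair with the largest structural difference. EditWorld's real selection criteria may differ.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def structural_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two flattened grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pick_start_end(
    embeddings: List[List[float]],
    pixels: List[List[float]],
    min_corr: float = 0.8,  # hypothetical correlation threshold
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, or None if no pair is correlated enough."""
    best: Optional[Tuple[int, int]] = None
    best_diff = -1.0
    for i, j in combinations(range(len(embeddings)), 2):
        if cosine(embeddings[i], embeddings[j]) < min_corr:
            continue  # too weakly correlated: likely a different scene
        diff = structural_diff(pixels[i], pixels[j])
        if diff > best_diff:
            best, best_diff = (i, j), diff
    return best


# Toy storyboard of three frames (hypothetical features).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
pixels = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
pair = pick_start_end(embeddings, pixels)
```

The returned indices `(i, j)` satisfy `i < j`, so in this sketch frame `i` would serve as the input image and frame `j` as the output image, with the frames between them forming the clip passed to the multimodal model for description.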

In addition, the team manually rechecks the generated data to further improve its quality.

The team used the dataset to fine-tune the InstructPix2Pix model. To preserve non-edited regions and achieve more precise editing, the team also proposed a post-edit strategy.
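The article does not detail the post-edit strategy. One common way to protect non-edited regions, shown here purely as an assumed stand-in, is mask compositing: keep the model's output inside the editing mask and copy the original pixels back everywhere else. Images and masks are toy nested lists of floats; a soft mask with values between 0 and 1 blends the two linearly.

```python
from typing import List

Grid = List[List[float]]  # toy grayscale image or mask, values in [0, 1]


def post_edit_blend(original: Grid, edited: Grid, mask: Grid) -> Grid:
    """Composite edited pixels inside the mask over the original image.

    Per pixel: out = mask * edited + (1 - mask) * original.
    An assumed stand-in for EditWorld's post-edit strategy, not its actual method.
    """
    return [
        [m * e + (1 - m) * o for o, e, m in zip(orow, erow, mrow)]
        for orow, erow, mrow in zip(original, edited, mask)
    ]


original = [[0.0, 0.0], [0.0, 0.0]]
edited = [[1.0, 1.0], [1.0, 1.0]]
mask = [[0.0, 1.0], [1.0, 0.0]]
result = post_edit_blend(original, edited, mask)
```

Pixels where the mask is 0 come back unchanged from the original, which is the property a post-edit step aims for: edits stay confined to the intended region.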


Finally, the results show that the team's approach performs world-instructed image editing well.

Paper link:
https://www.php.cn/link/154d7da9e669c75ee317d46614381dd8
Code link:
https://www.php.cn/link/e6da32eef072f987685b6eddca072d4f
