Rare! Apple's open-source image editing tool MGIE: is it coming to the iPhone?

Feb 05, 2024 pm 03:33 PM

Take a photo, type a text instruction, and the phone automatically retouches the picture?

This magical feature comes from Apple’s newly open-sourced image editing tool “MGIE”.

Remove people in the background

Add pizza to the table

Recently, AI has made significant progress in image editing. On the one hand, multimodal large language models (MLLMs) can take images as input and produce visually aware responses, which enables more natural image editing. On the other hand, instruction-based editing no longer relies on detailed descriptions or region masks; instead, users directly issue an instruction that states how and what to edit. This approach is very practical because it matches how people intuitively express themselves. With these techniques, AI is gradually becoming a capable assistant for image editing.

Inspired by these techniques, Apple proposed MGIE (MLLM-Guided Image Editing), which uses an MLLM to address the problem of insufficient instruction guidance.

  • Paper title: Guiding Instruction-based Image Editing via Multimodal Large Language Models
  • Paper link: https://openreview.net/pdf?id=S1RKWSyZ2Y
  • Project homepage: https://mllm-ie.github.io/

MGIE (MLLM-Guided Image Editing) consists of an MLLM (multimodal large language model) and a diffusion model, as shown in Figure 2. The MLLM learns to derive concise expressive instructions and provides explicit, visually relevant guidance. The diffusion model performs image editing using the latent imagination of the intended goal and is updated jointly through end-to-end training. In this way, MGIE benefits from the MLLM's inherent visual derivation and can resolve ambiguous human commands to achieve reasonable edits.

Guided by human commands, MGIE can perform Photoshop-style modifications, global photo optimization, and local object modifications. For example, it is difficult to capture the meaning of "healthy" without additional context, but MGIE can accurately associate "vegetable toppings" with pizza and edit the image in line with human expectations.

This reminds us of the "ambition" Cook expressed on an earnings call not long ago: "I think there is a huge opportunity for Apple in generative AI, but I don't want to go into more details." He also revealed that Apple is actively developing generative AI software features and that these features will be made available to customers later in 2024.

Combined with the series of generative AI research results Apple has released recently, this gives us reason to look forward to the new AI features Apple will release next.

Paper details

The MGIE method proposed in this study edits an input image V into a target image according to a given instruction X. For imprecise instructions, the MLLM in MGIE learns to derive a concise expressive instruction ε. To bridge the language and visual modalities, the researchers also append the special token [IMG] after ε and use an edit head to transform it. The transformed information serves as the latent visual imagination from the MLLM and guides the diffusion model toward the desired editing goal. MGIE can thus understand ambiguous commands with visual awareness and perform reasonable image editing (the architecture is shown in Figure 2).

Concise expressive instructions

Through feature alignment and instruction tuning, an MLLM can provide visually relevant responses with cross-modal perception. For image editing, the study uses the prompt "what will this image be like if [instruction]" as the language input accompanying the image and derives a detailed explanation of the editing command. However, these explanations are often too lengthy and can even distort the user's intent. To obtain more concise descriptions, the study applies a pretrained summarizer so that the MLLM learns to generate the summarized output.
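
As a minimal sketch of the prompting step described above, the template can be filled in with a user instruction before it is passed to the MLLM together with the image. Only the prompt wording comes from the text; the helper name `build_edit_prompt` and the whitespace normalization are hypothetical choices made for illustration and are not taken from the MGIE code.

```python
# Prompt wording quoted in the article; everything else here is illustrative.
PROMPT_TEMPLATE = "what will this image be like if {instruction}"


def build_edit_prompt(instruction: str) -> str:
    """Fill the prompt template with a whitespace-normalized user instruction.

    Hypothetical helper: it only builds the text side of the MLLM input.
    """
    cleaned = " ".join(instruction.split())
    if not cleaned:
        raise ValueError("instruction must not be empty")
    return PROMPT_TEMPLATE.format(instruction=cleaned)


if __name__ == "__main__":
    print(build_edit_prompt("turn day into night"))
```

The MLLM's long answer to this prompt is what the pretrained summarizer would then condense into the concise expressive instruction.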

Image editing via latent imagination

The study uses an edit head to transform [IMG] into actual visual guidance. The edit head is a sequence-to-sequence model that maps the continuous visual tokens from the MLLM to a semantically meaningful latent U = {u_1, u_2, ..., u_L}, which serves as the editing guidance.

To guide image editing through this visual imagination, the study adopts a diffusion model that includes a variational autoencoder (VAE) and solves the denoising diffusion problem in the latent space.

Algorithm 1 shows the MGIE learning process. The MLLM derives the concise instruction ε via the instruction loss L_ins. Using the latent imagination from [IMG], the edit head transforms its modality and guides the diffusion model to synthesize the resulting image. The edit loss L_edit is used for diffusion training. Because most weights can be frozen (the self-attention blocks within the MLLM), training is parameter-efficient and end-to-end.

Experimental evaluation

The qualitative comparison shows how different methods edit the same input image under the same instruction; in the first row, for example, the instruction is "turn day into night".

Table 1 shows the zero-shot editing results of models trained only on the IPr2Pr dataset. For EVR and GIER, which involve Photoshop-style modifications, the editing results are closer to the intent of the guidance (e.g., LGIE achieves a higher CVS of 82.0 on EVR). For global image optimization on MA5k, InsPix2Pix struggles because relevant training triples are scarce. LGIE and MGIE can provide detailed explanations thanks to what the LLM has learned, but LGIE is still limited to a single modality. With access to the image, MGIE can derive explicit instructions, such as which regions should be brightened or which objects should be made clearer, resulting in significant performance improvements (e.g., a higher SSIM of 66.3 and a lower photo distance of 0.3). Similar results are found on MagicBrush, where MGIE also achieves the best performance through precise visual imagination and by modifying the designated targets as intended (e.g., a higher DINO visual similarity of 82.2 and a higher CTS global caption alignment of 30.4).

To study instruction-based image editing for specific purposes, Table 2 reports models fine-tuned on each dataset. For EVR and GIER, all models improve once adapted to Photoshop-style editing tasks. MGIE consistently outperforms LGIE in every aspect of editing. This also shows that learning with expressive instructions effectively enhances image editing, and that visual perception plays a crucial role in obtaining the explicit guidance needed for the largest gains.

Trade-off between α_X and α_V. Image editing has two goals: manipulating the target as instructed and preserving the rest of the input image. Figure 3 shows the trade-off curve between instruction consistency (α_X) and input consistency (α_V). The study fixes α_X at 7.5 and varies α_V in the range [1.0, 2.2]. The larger α_V is, the more similar the editing result is to the input, but the less consistent it is with the instruction. The X-axis shows the CLIP directional similarity, that is, how consistent the editing result is with the instruction; the Y-axis shows the CLIP visual-encoder feature similarity to the input image. With expressive instructions, the approach outperforms InsPix2Pix in all settings. In addition, MGIE learns from explicit visual guidance, which yields an overall improvement. The gains are therefore robust whether the priority is input consistency or edit relevance.
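
The role of the two scales can be illustrated with the two-condition classifier-free guidance rule popularized by InstructPix2Pix, which the α_X / α_V setup here resembles. This is a sketch under that assumption rather than the paper's code: the function name, the use of plain Python lists in place of tensors, and the toy noise predictions are all illustrative.

```python
from typing import List, Sequence


def guided_noise(
    eps_uncond: Sequence[float],  # prediction with no image and no instruction
    eps_image: Sequence[float],   # prediction conditioned on the input image only
    eps_full: Sequence[float],    # prediction conditioned on image and instruction
    alpha_v: float,               # input-consistency scale (alpha_V)
    alpha_x: float,               # instruction-consistency scale (alpha_X)
) -> List[float]:
    """Combine three noise predictions using two guidance scales.

    Assumed rule (InstructPix2Pix-style):
        eps = eps_uncond
              + alpha_v * (eps_image - eps_uncond)
              + alpha_x * (eps_full - eps_image)
    """
    if not (len(eps_uncond) == len(eps_image) == len(eps_full)):
        raise ValueError("all predictions must have the same length")
    return [
        u + alpha_v * (i - u) + alpha_x * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


if __name__ == "__main__":
    # Toy values: alpha_X fixed at 7.5 as in the text, alpha_V swept over its range.
    for alpha_v in (1.0, 1.6, 2.2):
        print(alpha_v, guided_noise([0.0, 0.0], [1.0, 0.5], [2.0, 1.0], alpha_v, 7.5))
```

In this formulation, raising α_V pushes the prediction further toward the image-conditioned direction, which matches the behavior described above: results stay closer to the input while the instruction term carries relatively less weight.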

Ablation study

In addition, the researchers conducted ablation experiments that consider the performance of the different architectures FZ, FT, and E2E when using expressive instructions. The results show that MGIE consistently outperforms LGIE across FZ, FT, and E2E. This suggests that expressive instructions with crucial visual perception hold a consistent advantage in all ablation settings.

Why is MLLM guidance helpful? Figure 5 shows the CLIP-Score values between the input or ground-truth target images and the expressive instructions. A higher CLIP-S with the input image indicates that the instructions are relevant to the editing source, while better alignment with the target image indicates clear, relevant editing guidance. As shown, MGIE is more consistent with both the input and the target, which explains why its expressive instructions are helpful. With a clear narration of the expected result, MGIE achieves the greatest improvement in image editing.
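
For reference, CLIP-Score is commonly defined (Hessel et al., 2021) as a rescaled, clipped cosine similarity between a CLIP image embedding and a CLIP text embedding. The sketch below assumes that standard definition with w = 2.5 and uses made-up low-dimensional vectors in place of real CLIP embeddings; whether the paper applies exactly this scaling is an assumption.

```python
import math
from typing import Sequence


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float], w: float = 2.5) -> float:
    """Return w * max(cos(image_emb, text_emb), 0).

    The embeddings would normally come from a CLIP image encoder and a CLIP
    text encoder; here they are plain sequences of floats.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in text_emb))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return w * max(dot / norm, 0.0)


if __name__ == "__main__":
    # Toy vectors standing in for an image embedding and two instruction embeddings.
    image = [1.0, 0.0, 0.0]
    print(clip_score(image, [1.0, 0.0, 0.0]))  # perfectly aligned instruction
    print(clip_score(image, [0.0, 1.0, 0.0]))  # unrelated instruction
```

Under this definition, an instruction whose text embedding points in the same direction as the image embedding scores highest, which is the sense in which a higher CLIP-S indicates a more relevant instruction.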

Human evaluation. In addition to automatic metrics, the researchers also conducted a human evaluation. Figure 6 shows the quality of the generated expressive instructions, and Figure 7 compares the image editing results of InsPix2Pix, LGIE, and MGIE in terms of instruction following, ground-truth relevance, and overall quality.

Inference efficiency. Although MGIE relies on an MLLM to drive image editing, it only introduces concise expressive instructions (fewer than 32 tokens), so its efficiency is comparable to that of InsPix2Pix. Table 4 lists the inference time on an NVIDIA A100 GPU. For a single input, MGIE can complete the editing task in 10 seconds. With more data parallelism, the time cost remains similar (37 seconds with a batch size of 8). The entire process runs on a single GPU (40GB).
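
A quick calculation makes the effect of batching in these figures explicit. The timings are the ones quoted in the text; the variable names are illustrative, and the single-input time is treated as exactly 10 seconds for the sake of the arithmetic.

```python
# Timings quoted in the article (NVIDIA A100, single 40GB GPU).
single_input_seconds = 10.0  # one image, batch size 1 (taken as exactly 10 s)
batch_seconds = 37.0         # one batch of 8 images
batch_size = 8

# Amortized cost per image when editing in a batch of 8.
per_image_seconds = batch_seconds / batch_size

# How much faster each image is processed compared with the single-input case.
throughput_gain = single_input_seconds / per_image_seconds

print(f"{per_image_seconds:.3f} s per image in a batch of {batch_size}")
print(f"about {throughput_gain:.2f}x the single-input throughput")
```

Batching eight inputs therefore costs roughly 4.6 seconds per image, a little more than twice the throughput of processing one image at a time.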

Qualitative comparison. Figure 8 shows a visual comparison across all the datasets used, and Figure 9 further compares the expressive instructions produced by LGIE and MGIE.

The researchers also provide more demos on the project homepage (https://mllm-ie.github.io/). For more research details, please refer to the original paper.
