
Scene Control Portal: Four-in-One Object Teleportation, from Shanghai Jiao Tong University & Ant Group


In common image editing operations, image composition refers to the process of combining the foreground object of one image with another background image to generate a composite image. The visual effect is similar to teleporting a foreground object from one image into another background image, as shown in the figure below.

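For reference, a naive cut-and-paste composite can be produced in a few lines with Pillow; the file names and target box below are arbitrary placeholders.

    # Minimal cut-and-paste composition sketch using Pillow.
    # File names and the target box are arbitrary placeholders.
    from PIL import Image

    background = Image.open("background.jpg").convert("RGB")
    foreground = Image.open("foreground.png").convert("RGBA")  # alpha marks the object

    # Target box (x, y, width, height) on the background, chosen by the user.
    x, y, w, h = 120, 80, 200, 150
    foreground = foreground.resize((w, h))

    # Paste with the alpha channel as mask; no blending, harmonization, or pose
    # adjustment happens here, which is exactly what the subtasks below address.
    background.paste(foreground, (x, y), mask=foreground)
    background.save("composite.jpg")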

Image composition is widely used in artistic creation, poster design, e-commerce, virtual reality, data augmentation, and other fields.

A composite image obtained by simple cut-and-paste may exhibit many problems. Previous research therefore decomposed image composition into different subtasks, each addressing a different subproblem. Image blending aims to resolve unnatural boundaries between foreground and background. Image harmonization aims to adjust the illumination of the foreground so that it is harmonious with the background. Perspective adjustment aims to adjust the pose of the foreground so that it matches the background. Object placement aims to predict an appropriate location, size, and perspective angle for the foreground object. Shadow generation aims to generate plausible shadows for the foreground object on the background.

As shown in the figure below, previous work executed these subtasks in either a serial or a parallel manner to obtain realistic and natural composite images. In the serial framework, some subtasks can be executed selectively according to actual needs.

In the parallel framework, the currently popular approach is a diffusion model that takes a background image with a foreground bounding box and a foreground object image as input and directly generates the final composite image, in which the foreground object is blended seamlessly into the background, lighting and shadow effects are plausible, and the pose is adapted to the background.

This parallel framework is equivalent to executing multiple subtasks simultaneously and cannot execute a subset of them selectively. It is not controllable and may introduce unnecessary or unreasonable changes to the pose or color of the foreground object.


To enhance the controllability of the parallel framework and allow subtasks to be executed selectively, we propose the controllable image composition model ControlCom (Controllable Image Composition). As shown in the figure below, we use an indicator vector as conditioning information for the diffusion model to control the attributes of the foreground object in the composite image. The indicator vector is a two-dimensional binary vector whose dimensions control whether the illumination and pose attributes of the foreground object are adjusted, where 1 means adjust and 0 means preserve. Specifically, (0,0) changes neither the foreground illumination nor the foreground pose and simply blends the object seamlessly into the background image, which is equivalent to image blending. (1,0) changes only the foreground illumination to harmonize with the background while preserving the foreground pose, which is equivalent to image harmonization. (0,1) changes only the foreground pose to match the background while preserving the foreground illumination, which is equivalent to view synthesis. (1,1) changes both the illumination and the pose of the foreground, which is equivalent to existing uncontrollable parallel image composition.
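To make the mapping concrete, here is a minimal Python sketch of the indicator vector semantics described above; the dictionary and task names are ours, purely for illustration.

    # Two-bit indicator vector: (illumination, pose); 1 = adjust, 0 = preserve.
    # The names below are illustrative, not the released API.
    INDICATOR_VECTORS = {
        "image_blending":      (0, 0),  # blend seamlessly; change neither attribute
        "image_harmonization": (1, 0),  # adjust illumination only
        "view_synthesis":      (0, 1),  # adjust pose only
        "full_composition":    (1, 1),  # adjust both, like prior parallel methods
    }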

We incorporate the four tasks into the same framework and, through the indicator vector, implement a four-in-one object teleportation function that can transport objects to specified locations in a scene. This work is a collaboration between Shanghai Jiao Tong University and Ant Group. The code and model will be open-sourced soon.

Paper link: https://arxiv.org/abs/2308.10040

Code and model link: https://github.com/bcmi/ControlCom-Image-Composition

The figure below demonstrates the functionality of controllable image composition.

In the left column, the pose of the foreground object is already adapted to the background image, and the user may want to preserve it. The previous methods PbE [1] and ObjectStitch [2] make unnecessary and uncontrollable changes to the pose of the foreground object. The (1,0) version of our method preserves the pose of the foreground object while blending it seamlessly into the background image with harmonious lighting.

In the right column, the lighting of the foreground object is already consistent with the background lighting. Previous methods may cause unexpected changes to the color of foreground objects, such as vehicles and clothing. The (0,1) version of our method preserves the color of the foreground object while adjusting its pose so that it blends naturally into the background image.


Below we show more results for the four versions of our method: (0,0), (1,0), (0,1), and (1,1). With different indicator vectors, our method selectively adjusts the corresponding attributes of the foreground object, effectively controlling the effect of the composite image to meet different user needs.


So what model structure realizes these four functions? Our method adopts the structure shown below. The input to the model consists of a background image with a foreground bounding box and a foreground object image; the features of the foreground object and the indicator vector are injected into the diffusion model.

We extract both global and local features of the foreground object, fusing the global features first and the local features afterwards. During local fusion, we use aligned foreground feature maps for feature modulation to better preserve details. The indicator vector participates in both global and local fusion to control the attributes of the foreground object more thoroughly.
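As a rough illustration of how such conditioning could be wired up, the PyTorch sketch below fuses a global foreground embedding and an aligned local feature map into diffusion U-Net features, injecting the indicator vector at both stages. Module names, dimensions, and the FiLM-style modulation are our assumptions, not the released implementation.

    import torch
    import torch.nn as nn

    class ForegroundFusion(nn.Module):
        """Hypothetical sketch of the global-then-local foreground fusion.

        Layer choices are illustrative assumptions, not the released code.
        """

        def __init__(self, dim: int = 320, indicator_dim: int = 2):
            super().__init__()
            self.indicator_embed = nn.Linear(indicator_dim, dim)  # embed the 2-bit vector
            self.global_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
            # Local fusion modulates U-Net features with the aligned foreground
            # feature map (a FiLM-style scale/shift is assumed here).
            self.to_scale = nn.Conv2d(dim, dim, kernel_size=1)
            self.to_shift = nn.Conv2d(dim, dim, kernel_size=1)

        def forward(self, unet_feat, fg_global, fg_local, indicator):
            # unet_feat: (B, C, H, W) U-Net features inside the foreground box
            # fg_global: (B, 1, C) global foreground embedding
            # fg_local:  (B, C, H, W) foreground feature map aligned to the box
            # indicator: (B, 2) binary vector, e.g. (1, 0) for harmonization
            cond = self.indicator_embed(indicator.float()).unsqueeze(1)  # (B, 1, C)

            # Global fusion: attend from U-Net tokens to the indicator-conditioned
            # global foreground embedding.
            b, c, h, w = unet_feat.shape
            tokens = unet_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
            ctx = torch.cat([fg_global, cond], dim=1)      # (B, 2, C)
            tokens, _ = self.global_attn(tokens, ctx, ctx)
            unet_feat = tokens.transpose(1, 2).reshape(b, c, h, w)

            # Inject the indicator into local fusion as well, as described above.
            fg_local = fg_local + cond.transpose(1, 2).unsqueeze(-1)  # broadcast (B, C, 1, 1)

            # Local fusion: modulate with the aligned foreground feature map
            # for better detail preservation.
            return unet_feat * (1 + self.to_scale(fg_local)) + self.to_shift(fg_local)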

We train the model on top of pretrained Stable Diffusion using 1.9 million images from OpenImages. To train the four subtasks simultaneously, we designed a set of data processing and augmentation pipelines; see the paper for details on the data and training.
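The exact pipeline is described in the paper; as one plausible illustration (our assumption, not the paper's recipe), training pairs for all four indicator settings can be synthesized self-supervisedly by corrupting the foreground crop according to the sampled vector and training the model to restore the original image.

    import random
    import torchvision.transforms as T

    # Illustrative augmentations only; ControlCom's actual pipeline is in the paper.
    color_jitter = T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)
    geom_jitter = T.RandomPerspective(distortion_scale=0.3, p=1.0)

    def make_training_pair(fg_crop):
        """Sample an indicator vector and corrupt the foreground accordingly.

        The uncorrupted image is the reconstruction target, so undoing the
        color shift trains the illumination bit and undoing the geometric
        shift trains the pose bit.
        """
        adjust_illum, adjust_pose = random.choice([(0, 0), (1, 0), (0, 1), (1, 1)])
        fg_input = fg_crop
        if adjust_illum:
            fg_input = color_jitter(fg_input)
        if adjust_pose:
            fg_input = geom_jitter(fg_input)
        return fg_input, (adjust_illum, adjust_pose)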


We evaluate on the COCOEE dataset and on a dataset we built ourselves. Since previous methods can only perform uncontrollable image composition, we compare them against the (1,1) version of our method. The comparison results are shown in the figure below. PCTNet is an image harmonization method that preserves the details of objects but can neither adjust the pose of the foreground nor complete incomplete foreground objects. Other methods can generate objects of the same category but are less effective at retaining details, such as the style of clothes, the texture of cups, and the color of bird feathers.

In comparison, our method performs better: it preserves the details of foreground objects, completes incomplete foreground objects, and adjusts the illumination and pose of foreground objects to fit the background.


This work is a first attempt at controllable image composition. The task is very difficult and many shortcomings remain; the performance of the model is not yet stable and robust enough. Moreover, beyond illumination and pose, the attributes of foreground objects could be refined further, and achieving finer-grained controllable image composition is an even more challenging task.

References

[1] Yang, B., Gu, S., Zhang, B., Zhang, T., Chen, X., Sun, X., Chen, D., Wen, F. (2023). Paint by Example: Exemplar-based image editing with diffusion models. In CVPR.

[2] Song, Y., Zhang, Z., Lin, Z., Cohen, S., Price, B., Zhang, J., Kim, S. Y., Aliaga, D. (2023). ObjectStitch: Object compositing with diffusion model. In CVPR.
