In early April, Meta released SAM (Segment Anything Model) [1], the first foundation model for image segmentation. SAM is both powerful and easy to use: for example, a user simply clicks on an object, and the object is segmented immediately and accurately. As of April 15, SAM's GitHub repository had 26k stars.
Making good use of such a powerful "segment anything" model and extending it to application scenarios with more practical needs is crucial. For example, what happens when SAM meets the practical task of image inpainting?
A research team from the University of Science and Technology of China and the Eastern Institute of Technology gave a striking answer. Building on SAM, they proposed the "Inpaint Anything" (IA) model. Unlike traditional image inpainting models, IA does not require painstaking manual mask creation; instead, the user marks the selected object with a single click. IA supports Remove Anything, Fill Anything, and Replace Anything, covering typical image inpainting scenarios including object removal, object filling, and background replacement.
The researchers made a first attempt at mask-free image inpainting and built a new "click and fill" paradigm for image inpainting, which they call Inpaint Anything (IA). The core idea behind IA is to combine the strengths of different models into a powerful, user-friendly inpainting system. IA has three main functions: (i) Remove Anything: the user only needs to click on the object to be removed, and IA removes it without leaving a trace; (ii) Fill Anything: the user can additionally tell IA, through a text prompt, what to fill the object with, and IA then drives an embedded AIGC (AI-Generated Content) model such as Stable Diffusion [2] to generate the corresponding content in the object's place; (iii) Replace Anything: the user can instead click to select an object to keep and use a text prompt to describe the new background, and IA replaces the object's background with the specified content. The overall framework of IA is shown below:
Inpaint Anything (IA) diagram. Users can select any object in the image by clicking on it. Leveraging powerful vision models such as SAM [1], LaMa [3], and Stable Diffusion (SD) [2], IA is able to smoothly remove the selected object (i.e., Remove Anything). Further, by giving IA a text prompt, the user can fill the object with any desired content (i.e., Fill Anything) or replace the object's background with arbitrary content (i.e., Replace Anything).
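To make the division of labour in this framework concrete, here is a minimal sketch of how the three functions might route a click and a prompt through interchangeable models. It is not the project's code: the class `InpaintAnythingSketch`, the three callable interfaces, and the `stub_*` functions are all hypothetical, and the stubs operate on a tiny list-of-lists "image" so the routing can be run without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]   # toy grayscale image: rows of pixel values
Mask = List[List[bool]]   # True marks pixels the next model may overwrite
Point = Tuple[int, int]   # (row, col) of the user's click

# Hypothetical interfaces. In IA these roles are played by SAM (segmenter),
# LaMa (inpainter), and Stable Diffusion (text-guided inpainter).
Segmenter = Callable[[Image, Point], Mask]
Inpainter = Callable[[Image, Mask], Image]
TextInpainter = Callable[[Image, Mask, str], Image]


def invert(mask: Mask) -> Mask:
    """Flip a mask so that the background, not the object, is selected."""
    return [[not v for v in row] for row in mask]


@dataclass
class InpaintAnythingSketch:
    segment: Segmenter
    inpaint: Inpainter
    generate: TextInpainter

    def remove_anything(self, image: Image, click: Point) -> Image:
        # Click -> object mask -> fill the hole from the surrounding context.
        return self.inpaint(image, self.segment(image, click))

    def fill_anything(self, image: Image, click: Point, prompt: str) -> Image:
        # Click -> object mask -> generate prompted content inside the mask.
        return self.generate(image, self.segment(image, click), prompt)

    def replace_anything(self, image: Image, click: Point, prompt: str) -> Image:
        # Click -> object mask -> invert it -> generate a prompted background.
        return self.generate(image, invert(self.segment(image, click)), prompt)


# Trivial stand-ins (assumptions for illustration only).
def stub_segment(image: Image, click: Point) -> Mask:
    # Select only the clicked pixel.
    return [[(r, c) == click for c in range(len(row))] for r, row in enumerate(image)]


def stub_inpaint(image: Image, mask: Mask) -> Image:
    # "Remove" by writing 0 into the masked pixels.
    return [[0 if m else p for p, m in zip(prow, mrow)] for prow, mrow in zip(image, mask)]


def stub_generate(image: Image, mask: Mask, prompt: str) -> Image:
    # "Generate" by writing the prompt length into the masked pixels.
    return [[len(prompt) if m else p for p, m in zip(prow, mrow)] for prow, mrow in zip(image, mask)]


ia = InpaintAnythingSketch(stub_segment, stub_inpaint, stub_generate)
image = [[1, 2], [3, 4]]
removed = ia.remove_anything(image, (0, 0))
filled = ia.fill_anything(image, (0, 0), "abc")
replaced = ia.replace_anything(image, (0, 0), "abc")
```

In this sketch the only difference between Fill Anything and Replace Anything is whether the mask is inverted before generation, and any of the three models could be swapped for another with the same interface.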
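Remove Anything
Remove Anything diagram.
The steps of Remove Anything are as follows: the user clicks on an object, SAM [1] segments it, and an inpainting model such as LaMa [3] fills the resulting hole.
Fill Anything
Fill Anything diagram. Text prompt used in the picture: a teddy bear on a bench
The steps of Fill Anything are as follows: the user clicks on an object, SAM segments it, the user enters a text prompt, and an AIGC model such as Stable Diffusion [2] fills the selected region according to the prompt.
Replace Anything
Replace Anything diagram. Text prompt used in the picture: a man in office
The steps of Replace Anything are as follows: the user clicks on the object to keep, SAM segments it, the user enters a text prompt, and an AIGC model such as Stable Diffusion replaces the object's background according to the prompt.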
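The click, segment, and fill flow shared by these functions can be walked through on a toy image. The sketch below is only an illustration under loud assumptions: `click_to_mask` is a flood fill standing in for SAM, and `fill_hole` paints the hole with the mean of the untouched pixels as a stand-in for LaMa. Neither resembles the real models; the names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]   # toy grayscale image
Mask = List[List[bool]]   # True marks the selected object


def click_to_mask(image: Image, click: Tuple[int, int]) -> Mask:
    """Toy stand-in for SAM: flood-fill the 4-connected region of pixels
    that share the clicked pixel's value."""
    rows, cols = len(image), len(image[0])
    r0, c0 = click
    target = image[r0][c0]
    mask = [[False] * cols for _ in range(rows)]
    mask[r0][c0] = True
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr][nc] and image[nr][nc] == target:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


def fill_hole(image: Image, mask: Mask) -> Image:
    """Toy stand-in for LaMa: overwrite masked pixels with the rounded mean
    of the unmasked pixels and leave everything else untouched."""
    known = [p for prow, mrow in zip(image, mask) for p, m in zip(prow, mrow) if not m]
    fill = round(sum(known) / len(known)) if known else 0
    return [[fill if m else p for p, m in zip(prow, mrow)] for prow, mrow in zip(image, mask)]


# A 2x2 "object" of 9s on a background of 1s.
image = [
    [1, 1, 1, 1],
    [1, 9, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]
mask = click_to_mask(image, (1, 1))   # one click selects the whole object
result = fill_hole(image, mask)       # the object is replaced by background
```

A single click on any pixel of the object yields the same mask, which is the point of the click-based interface: the user never draws the mask by hand.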
The researchers' model also supports 2K high-definition images and arbitrary aspect ratios, which allows the IA system to be ported efficiently to a variety of integration environments and existing frameworks.
Remove Anything experimental results
Fill Anything experimental results
Text prompt: a camera lens in the hand
Text prompt: an aircraft carrier on the sea
Text prompt: a sports car on a road
Text prompt: a Picasso painting on the wall
Replace Anything experimental results
Text prompt: sit on the swing
Text prompt: breakfast
Text prompt: a bus, on the center of a country road, summer
Text prompt: crossroad in the city
Summary
The researchers set up this project to demonstrate the powerful capabilities that can be obtained by fully exploiting existing large-scale AI models, and to reveal the potential of "composable AI". Inpaint Anything (IA), proposed in the project, is a multifunctional image inpainting system that integrates object removal, content filling, scene replacement, and other functions (more are on the way).
IA combines vision foundation models such as SAM, image inpainting models such as LaMa, and AIGC models such as Stable Diffusion to achieve user-friendly, mask-free image inpainting, with foolproof operations such as "click to remove" and "prompt to fill". In addition, IA can process images with arbitrary aspect ratios and 2K HD resolution, regardless of the original content of the image.
The project is now fully open source. Finally, everyone is welcome to share and promote Inpaint Anything (IA), and the team looks forward to seeing more new projects built on IA. In future work, the researchers will further explore the potential of Inpaint Anything (IA), adding more practical features such as fine-grained image matting and editing, and applying it to more real-world applications.