'AI Perspective Eye': three-time Marr Prize winner Andrew Zisserman leads a team tackling occlusion completion for arbitrary objects

Mar 08, 2024 pm 03:46 PM

Occlusion is one of the most basic yet still unsolved problems in computer vision. Occlusion means missing visual information, while a machine vision system relies on visual information for perception and understanding, and in the real world objects occlude one another everywhere. The latest work from Andrew Zisserman's team at the Visual Geometry Group (VGG) at the University of Oxford systematically addresses occlusion completion for arbitrary objects and proposes a new, more accurate evaluation dataset for the task. On the X platform, the work was praised by MPI director Michael Black, the official CVPR account, and the official account of the Department of Computer Science at the University of Southern California, among others. The main content of the paper "Amodal Ground Truth and Completion in the Wild" follows.


  • Paper link: https://arxiv.org/pdf/2312.17247.pdf
  • Project homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
  • Code address: https://github.com/Championchess/Amodal-Completion-in-the-Wild

Amodal segmentation aims to complete the occluded parts of an object, that is, to predict a shape mask covering both the visible and the invisible parts of the object. It can benefit many downstream tasks, including object recognition, object detection, instance segmentation, image editing, 3D reconstruction, video object segmentation, reasoning about support relationships between objects, and robot manipulation and navigation, because knowing the complete shape of an occluded object helps in all of them.

However, evaluating amodal segmentation models in the real world is difficult: real images contain a large number of occluded objects, but how can one obtain the ground truth, that is, the amodal mask giving the complete shape of these objects? Previous work relied on manual annotation of amodal masks, but such ground truth inevitably contains human error. Other work built synthetic datasets, for example by pasting another object directly on top of a complete object so that the complete shape of the occluded object is known, but the resulting images are not real scenes. This work therefore proposes a 3D model projection method to construct MP3D-Amodal, a large-scale real-image dataset that covers multiple object categories and provides amodal masks, so that amodal segmentation performance can be evaluated accurately. The comparison of different datasets is as follows:

[Image: comparison of amodal segmentation datasets]

Specifically, taking the Matterport3D dataset as an example: for any dataset that provides both real photographs and the 3D structure of the scene, we can project the 3D shapes of all objects in the scene onto the camera simultaneously to obtain the modal mask of each object (its visible shape, since objects occlude one another), and then project the 3D shape of each object onto the camera separately to obtain its amodal mask, that is, its complete shape. Comparing the modal mask with the amodal mask identifies the occluded objects.
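The comparison of modal and amodal masks can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: it assumes each object has already been rendered alone into a depth map (with `np.inf` where the object does not project), and the names `masks_from_depth` and `occlusion_rate` are hypothetical.

```python
import numpy as np

def masks_from_depth(per_object_depth):
    """Derive modal and amodal masks from per-object depth maps.

    per_object_depth: (N, H, W) array; entry [i] is the depth of object i
    rendered alone, with np.inf where the object does not project.
    (Assumed input format, chosen for this illustration.)
    """
    depth = np.asarray(per_object_depth, dtype=float)
    amodal = np.isfinite(depth)          # complete shape of each object
    nearest = depth.min(axis=0)          # depth seen when all objects are projected together
    modal = amodal & (depth <= nearest)  # visible where the object is frontmost (ties count as visible)
    return modal, amodal

def occlusion_rate(modal, amodal):
    """Fraction of each object's complete shape that is hidden."""
    full = amodal.sum(axis=(1, 2))
    visible = modal.sum(axis=(1, 2))
    return np.where(full > 0, 1.0 - visible / np.maximum(full, 1), 0.0)

# Toy scene: two overlapping 3x3 squares in a 4x4 image.
depth = np.full((2, 4, 4), np.inf)
depth[0, 0:3, 0:3] = 2.0  # object 0: farther square
depth[1, 1:4, 1:4] = 1.0  # object 1: nearer square, overlaps object 0

modal, amodal = masks_from_depth(depth)
rates = occlusion_rate(modal, amodal)
occluded = rates > 0  # objects whose modal mask is smaller than their amodal mask
```

In this toy scene, object 0 loses the four pixels it shares with the nearer object 1, so only object 0 is flagged as occluded.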

The statistics of the data set are as follows:

[Images: statistics of the MP3D-Amodal dataset]

A sample of the data set is as follows:

[Image: sample from the MP3D-Amodal dataset]

In addition, to reconstruct the complete shape of arbitrary objects, the authors extract prior knowledge about complete object shapes from the features of the Stable Diffusion model and use it to perform amodal segmentation of any occluded object. The architecture, called SDAmodal, is as follows:

[Image: SDAmodal architecture]

The motivation for using Stable Diffusion features is that Stable Diffusion can complete (inpaint) images, so its features may to some extent contain information about the whole object; and because Stable Diffusion has been trained on a very large number of images, its features can be expected to handle any object in any environment. Unlike previous two-stage frameworks, SDAmodal does not require annotated occluder masks as input. SDAmodal has a simple structure but shows strong zero-shot generalization: comparing Settings F and H in the following table, training only on COCOA improves results on another dataset from a different domain with different categories. Even without annotations of the occluding objects, SDAmodal achieves state-of-the-art performance (Setting H) on both COCOA, an existing dataset covering many types of occluded objects, and the newly proposed MP3D-Amodal dataset.

[Table: quantitative results for the compared settings, including Settings F and H]

Beyond the quantitative experiments, qualitative comparisons also show the advantages of SDAmodal. As the figure below shows (all models are trained only on COCOA), for different types of occluded objects, whether from COCOA or from MP3D-Amodal, SDAmodal greatly improves amodal segmentation, and its predicted amodal masks are closer to the ground truth.

[Image: qualitative comparison on COCOA and MP3D-Amodal]
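How close a predicted amodal mask is to the ground truth can be quantified with intersection over union (IoU). The article does not name the metric behind the reported numbers, so treating IoU as the measure is an assumption here, and the helper names `mask_iou` and `mean_iou` are hypothetical. A minimal sketch:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)

def mean_iou(preds, gts):
    """Average IoU over paired predicted and ground-truth masks."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))

# Toy example: ground truth covers the top two rows of a 4x4 image,
# the prediction covers the left two columns of the top three rows.
gt = np.zeros((4, 4), dtype=bool)
gt[0:2, :] = True
pred = np.zeros((4, 4), dtype=bool)
pred[0:3, 0:2] = True

iou = mask_iou(pred, gt)                  # 4 shared pixels out of 10 in the union
miou = mean_iou([pred, gt], [gt, gt])     # averages the pair above with a perfect match
```

A higher value means the predicted complete shape overlaps the ground-truth shape more closely.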

For more details, please read the original paper.
