Cut out pictures with one click from a prompt! Meta releases the first foundation model for image segmentation, creating a new paradigm for CV

王林

Apr 07, 2023 pm 03:00 PM

Just now, Meta AI released the Segment Anything Model (SAM), the first foundation model for image segmentation.

SAM can segment any object in a photo or video with a single click, and it can transfer to other tasks zero-shot.

Overall, SAM follows the foundation model approach:

1. A very simple yet scalable architecture that can handle multi-modal prompts: text, keypoints, bounding boxes.

2. An intuitive annotation process that is closely connected with the model design.

3. A data flywheel that allows the model to be bootstrapped to a large number of unlabeled images.

It is no exaggeration to say that SAM has learned a general concept of what an "object" is, and this holds even for unknown objects, unfamiliar scenes (such as underwater or under a microscope), and blurry cases.

In addition, SAM can also be generalized to new tasks and new fields, and practitioners no longer need to fine-tune the model themselves.

Paper address: https://ai.facebook.com/research/publications/segment-anything/

The most powerful part is that Meta has implemented a completely different CV paradigm: within one unified framework, a prompt encoder lets you specify a point, a bounding box, or a sentence and segment the object directly with one click.

In this regard, Tencent AI algorithm expert Jin Tian said, "The prompt paradigm in the NLP field has begun to extend to the CV field. This time, it may completely change the traditional prediction thinking of CV. Now you can really use a model to segment any object, and it is dynamic!"

NVIDIA AI scientist Jim Fan praised it as well: we have already reached the "GPT-3 moment" in the field of computer vision!

So, CV really doesn’t exist anymore?

SAM: "Cut out" all objects in any image with one click

Segment Anything is the first foundation model dedicated to image segmentation.

Segmentation refers to identifying which image pixels belong to an object and has always been the core task of computer vision.

However, creating an accurate segmentation model for a specific task usually requires highly specialized work by experts. The process needs infrastructure for training AI and a large amount of carefully annotated in-domain data, so the barrier to entry is extremely high.

To solve this problem, Meta proposed SAM, a foundation model for image segmentation. This promptable model, trained on diverse data, not only adapts to a variety of tasks but also operates much like prompting in NLP models.

The SAM model grasps the concept of "what is an object" and can generate a mask for any object in any image or video, even objects it has not seen during training.

SAM is versatile enough to cover a wide variety of use cases and can be used out of the box in new image domains without additional training, whether that is underwater photos or cell microscopy. In other words, SAM is already capable of zero-shot transfer.

Meta said excitedly in the blog: It can be expected that in the future, SAM will be used in any application that needs to find and segment objects in images.

SAM can become part of a larger AI system to develop a more general multi-modal understanding of the world, for example, understanding the visual and textual content of web pages.

In the field of AR/VR, SAM can select objects based on the user’s line of sight and then “upgrade” the objects to 3D.

For content creators, SAM can extract image areas for collage, or video editing.

SAM can also locate and track animals or objects in videos, which is helpful for natural science and astronomy research.

General segmentation method

In the past, there were two methods to solve the segmentation problem.

One is interactive segmentation, which can segment objects of any category but requires a person to refine the mask iteratively.

The second is automatic segmentation, which can segment specific object categories defined in advance, but training requires a large number of manually labeled objects (for example, thousands of examples to segment cats).

In short, neither of these two methods can provide a universal, fully automatic segmentation method.

And SAM can be seen as a generalization of these two methods, and it can easily perform interactive segmentation and automatic segmentation.

On the model's promptable interface, a wide range of segmentation tasks can be completed by simply designing the correct prompts (clicks, boxes, text, etc.) for the model.

Additionally, SAM is trained on a diverse, high-quality dataset containing over 1 billion masks, allowing the model to generalize to new objects and images beyond what it observed during training. As a result, practitioners no longer need to collect their own segmentation data to fine-tune models for their use cases.

This flexibility to generalize to new tasks and new domains is a first in the field of image segmentation.

(1) SAM allows users to segment an object with a single click or by interactively clicking multiple points, and it also accepts bounding boxes as prompts.

(2) When it is ambiguous which object should be segmented, SAM can output multiple valid masks, which is an essential capability for solving segmentation problems in the real world.

(3) SAM can automatically find and mask all objects in an image.

(4) After precomputing the image embedding, SAM can generate a segmentation mask for any prompt in real time, allowing users to interact with the model in real time.

How it works

The researchers trained SAM to return a valid segmentation mask for any prompt. A prompt can be foreground/background points, a rough box or mask, free-form text, or in general any information indicating what to segment in the image.

The requirement of a valid mask simply means that even when the prompt is ambiguous and could refer to multiple objects (e.g., a point on a shirt may indicate either the shirt or the person wearing it), the output should be a reasonable mask for one of those objects.
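The shirt-versus-person ambiguity can be illustrated with a small toy sketch. It does not use SAM: the candidate masks, their scores, and the helper names `rectangle` and `masks_for_click` are all hypothetical, chosen only to show how a single click can match several nested masks that are then ranked.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def rectangle(top: int, left: int, bottom: int, right: int) -> Set[Pixel]:
    """All pixels in an inclusive rectangle, used here as a toy mask."""
    return {(r, c) for r in range(top, bottom + 1) for c in range(left, right + 1)}


def masks_for_click(click: Pixel,
                    candidates: Dict[str, Tuple[Set[Pixel], float]]) -> List[str]:
    """Names of all candidate masks containing the click, highest score first."""
    hits = [(score, name)
            for name, (pixels, score) in candidates.items()
            if click in pixels]
    return [name for _, name in sorted(hits, reverse=True)]


# Hypothetical candidates: the shirt mask is nested inside the person mask.
# The scores are made-up confidence values, not output from any model.
candidates = {
    "shirt": (rectangle(2, 1, 3, 2), 0.9),
    "person": (rectangle(0, 1, 5, 2), 0.8),
}

on_shirt = masks_for_click((2, 1), candidates)    # click lands on the shirt
on_head = masks_for_click((0, 1), candidates)     # click lands above the shirt
off_person = masks_for_click((0, 0), candidates)  # click misses both masks

print(on_shirt, on_head, off_person)
```

A click on the shirt matches both masks, so returning either one (or both, ranked) would satisfy the "valid mask" requirement described above.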

The researchers observed that the pre-training task and interactive data collection impose specific constraints on model design.

In particular, the model needs to run in real time on a CPU in a web browser so that annotators can interact with SAM efficiently in real time while annotating.

While runtime constraints mean there is a trade-off between quality and runtime, the researchers found that in practice, simple designs can achieve good results.

SAM's image encoder produces a one-time embedding for the image, while a lightweight prompt encoder converts any prompt into an embedding vector on the fly. These two sources of information are then combined in a lightweight decoder that predicts segmentation masks.

Once the image embedding has been computed, SAM can produce a segmentation in just 50 milliseconds for any prompt in a web browser.
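The "encode the image once, then answer many prompts cheaply" design can be sketched with a toy stand-in. This is not SAM's implementation: the class `ToySegmenter` and its methods `set_image` and `predict` are hypothetical names, the "embedding" is just a copy of the pixel grid, and a simple flood fill over similar intensities takes the place of the neural mask decoder.

```python
from collections import deque
from typing import List, Optional, Tuple

Grid = List[List[int]]  # toy grayscale image: rows of pixel intensities


class ToySegmenter:
    """Toy illustration of encode-once, prompt-many (not SAM itself)."""

    def __init__(self) -> None:
        self.encoder_calls = 0
        self._embedding: Optional[Grid] = None

    def set_image(self, image: Grid) -> None:
        # "Heavy" step, run once per image. Here it is only a copy.
        self.encoder_calls += 1
        self._embedding = [row[:] for row in image]

    def predict(self, point: Tuple[int, int], tolerance: int = 10) -> Grid:
        # "Light" step, run once per prompt: grow a region from the
        # clicked pixel over neighbours with similar intensity.
        if self._embedding is None:
            raise RuntimeError("call set_image() before predict()")
        emb = self._embedding
        height, width = len(emb), len(emb[0])
        row, col = point
        seed = emb[row][col]
        mask = [[0] * width for _ in range(height)]
        mask[row][col] = 1
        queue = deque([(row, col)])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < height and 0 <= nc < width and not mask[nr][nc]:
                    if abs(emb[nr][nc] - seed) <= tolerance:
                        mask[nr][nc] = 1
                        queue.append((nr, nc))
        return mask


image = [
    [0, 0, 200, 200],
    [0, 0, 200, 200],
    [0, 0, 0, 0],
]

segmenter = ToySegmenter()
segmenter.set_image(image)          # encoder runs once
bright = segmenter.predict((0, 2))  # first prompt: click on the bright square
dark = segmenter.predict((2, 0))    # second prompt reuses the same embedding
```

Both prompts are answered from the cached embedding, which is the property that makes real-time interaction possible once the expensive encoding step is done.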

The latest SAM model was trained on 256 A100 GPUs for 68 hours (nearly 3 days).
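As a quick sanity check on the training budget, the sketch below converts the quoted 68 hours on 256 A100s into GPU-hours and wall-clock days. It is plain arithmetic on the article's figures, not a cost estimate.

```python
NUM_GPUS = 256        # A100 count quoted in the article
TRAINING_HOURS = 68   # wall-clock training time quoted in the article

gpu_hours = NUM_GPUS * TRAINING_HOURS
training_days = TRAINING_HOURS / 24

print(f"{gpu_hours} GPU-hours over {training_days:.2f} days")
```

Sixty-eight hours works out to a little under three days of wall-clock time.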

Project demonstration

Multiple input prompts

Prompts specify what to segment in the image, so a variety of segmentation tasks can be carried out without additional training.

Use interactive points and boxes as prompts

Automatically segment all elements in the image

Generate multiple valid masks for ambiguous prompts

Promptable design

SAM can accept input prompts from other systems.

For example, it can select an object based on gaze information from the user's AR/VR headset. Meta's development of AI that can understand the real world will pave the way for its future metaverse journey.

Alternatively, bounding-box prompts from an object detector can be used to implement text-to-object segmentation.

Scalable output

The output mask can be used as input to other AI systems.

For example, an object's mask can be tracked through a video, turned into 3D by image editing applications, or used for creative tasks such as collage.

Zero-shot generalization

SAM has learned a general idea of what an object is, and this understanding enables zero-shot generalization to unfamiliar objects and images without additional training.

Various reviews

Select Hover & Click: clicking Add Mask places a green dot and clicking Remove Area places a red dot, and Huahua, busy eating an apple, is outlined immediately.

In the Box function, simply select the box and the recognition will be completed immediately.

After clicking Everything, all objects recognized by the system are extracted immediately.

After choosing Cut-Outs, you will get a triangular dumpling in seconds.

SA-1B dataset: 11 million images, 1.1 billion masks

In addition to the new model, Meta also released SA-1B, the largest segmentation dataset to date.

This dataset consists of 11 million diverse, high-resolution, privacy-preserving images, and 1.1 billion high-quality segmentation masks.

The overall characteristics of the data set are as follows:

· Total number of images: 11 million

· Total number of masks: 1.1 billion

· Average masks per image: 100

· Average image resolution: 1500 × 2250 pixels

Note: Image or mask annotations do not have class tags
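The headline figures are easy to cross-check. The short sketch below recomputes the average number of masks per image and the average pixel count per image from the totals listed above; it uses only numbers quoted in the article.

```python
TOTAL_IMAGES = 11_000_000
TOTAL_MASKS = 1_100_000_000
AVG_RESOLUTION = (1500, 2250)  # average image resolution in pixels, as quoted

masks_per_image = TOTAL_MASKS / TOTAL_IMAGES
megapixels_per_image = AVG_RESOLUTION[0] * AVG_RESOLUTION[1] / 1_000_000

print(f"average masks per image: {masks_per_image:.0f}")
print(f"average image size: {megapixels_per_image:.3f} megapixels")
```

The totals are consistent with the stated average of 100 masks per image.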

Meta specifically emphasizes that this data was collected through its data engine and that all masks were generated fully automatically by SAM.

With the SAM model, collecting new segmentation masks is faster than ever, and interactively annotating a mask only takes about 14 seconds.

The per-mask annotation process is only 2 times slower than annotating bounding boxes. Using the fastest annotation interface, annotating bounding boxes takes about 7 seconds.

Compared with previous large-scale segmentation data collection efforts, annotation with SAM is 6.5 times faster than COCO's fully manual polygon-based mask annotation and 2 times faster than the previous largest data annotation effort, which was also model-assisted.

However, interactive mask annotation alone is not enough to create a dataset of more than 1 billion masks. Therefore, Meta built a data engine for creating the SA-1B dataset.
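To see why interactive annotation alone would not scale, here is a back-of-the-envelope estimate using the figures quoted above (about 14 seconds per mask, about 7 seconds per bounding box, 1.1 billion masks). The single-annotator framing and the 365-day year are simplifying assumptions of this sketch, not numbers from Meta.

```python
SECONDS_PER_MASK = 14          # interactive mask annotation time quoted in the article
SECONDS_PER_BOX = 7            # fastest bounding-box annotation time quoted in the article
TOTAL_MASKS = 1_100_000_000    # number of masks in SA-1B


def annotation_years(masks: int, seconds_each: float) -> float:
    """Years of nonstop work for one annotator (assumes a 365-day year)."""
    return masks * seconds_each / (3600 * 24 * 365)


slowdown_vs_boxes = SECONDS_PER_MASK / SECONDS_PER_BOX
years_needed = annotation_years(TOTAL_MASKS, SECONDS_PER_MASK)

print(f"mask vs. box annotation: {slowdown_vs_boxes:.1f}x slower")
print(f"one annotator, nonstop: about {years_needed:.0f} years")
```

Even at 14 seconds per mask, the total comes to several centuries of continuous annotation for a single person, which is why fully automatic mask generation matters for a dataset of this size.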

This data engine has three "gears":

1. Model-assisted annotation

2. A mix of fully automatic annotation and assisted annotation, which helps increase the diversity of the collected masks

3. Fully automatic mask creation, which allows the dataset to scale

The final dataset includes over 1.1 billion segmentation masks collected on approximately 11 million licensed and privacy-preserving images.

SA-1B has 400 times more masks than any existing segmentation dataset. Human evaluation studies confirm that the masks are of high quality and diversity, and in some cases are even qualitatively comparable to masks from previous smaller, fully manually annotated datasets.

The images in SA-1B were sourced through photo providers in multiple countries spanning different geographic regions and income levels.

While some geographic areas are still underrepresented, SA-1B has more images and better overall representation across all regions than previous segmentation datasets.

Finally, Meta says it hopes this data can form the basis of new datasets that include additional annotations, such as textual descriptions associated with each mask.

RBG master leads the team

Ross Girshick

Ross Girshick (often called the RBG guru) is a research scientist at Facebook AI Research (FAIR), where he works on computer vision and machine learning.

In 2012, Ross Girshick received his PhD in Computer Science from the University of Chicago under the supervision of Pedro Felzenszwalb.

Before joining FAIR, Ross was a researcher at Microsoft Research and a postdoc at the University of California, Berkeley, where his mentors were Jitendra Malik and Trevor Darrell.

He received the 2017 PAMI Young Researcher Award and the 2017 and 2021 PAMI Mark Everingham Awards in recognition of his contributions to open source software.

As is well known, Ross and He Kaiming jointly developed object detection algorithms in the R-CNN family. Their Mask R-CNN paper won the best paper award at ICCV 2017.

Netizen: CV really doesn’t exist anymore

Meta's foundation model for segmentation in the CV field has led many netizens to exclaim, "Now CV really doesn't exist anymore."

Meta scientist Justin Johnson said: "To me, Segment Anything's data engine and ChatGPT's RLHF represent a new era of large-scale artificial intelligence. Instead of learning everything from noisy web data, it is better to cleverly apply human annotation combined with big data to unlock new capabilities. Supervised learning is back!"

The only regret is that the SAM release was led mainly by Ross Girshick, while He Kaiming was absent.

Zhihu user "matrix Mingzi" said that this work further proves that multimodality is the future of CV and that there is no tomorrow for pure CV.
