


Video segmentation finale! Zhejiang University recently released SAM-Track: universal intelligent video segmentation with one click
Recently, the ReLER Laboratory of Zhejiang University deeply combined SAM with video segmentation and released Segment-and-Track Anything (SAM-Track).
SAM-Track gives SAM the ability to track video targets and supports multiple ways of interaction (points, brushes, text).
On this basis, SAM-Track unifies multiple traditional video segmentation tasks, achieves one-click segmentation and tracking of any target in any video, and extends traditional video segmentation to universal video segmentation.
SAM-Track performs well: on a single GPU it can stably track hundreds of targets with high quality in complex scenes.
Project address: https://github.com/z-x-yang/Segment-and-Track-Anything
Paper address: https://arxiv.org/abs/2305.06558
Effect display
SAM-Track supports language input as a prompt. For example, given the category text "Panda", it segments and tracks every instance of the category "Panda" with one click.
You can also give a more detailed description: given the text "The leftmost panda", SAM-Track locates that specific target and then segments and tracks it.
Compared with traditional video tracking algorithms, another strength of SAM-Track is that it can segment and track a large number of targets simultaneously and automatically detect newly appearing objects.
SAM-Track also supports combining multiple interaction methods, and users can mix them as needed. For example, use a brush to frame a skateboard that is closely connected to the human body, which prevents redundant objects from being segmented, and then use clicks to select the human body.
Fully automatic video object segmentation and tracking is naturally no problem either. Application scenarios such as street views, aerial photography, AR, animation, and medical images can all be segmented and tracked automatically with one click, with newly appearing objects detected along the way.
If the automatic segmentation result is unsatisfactory, the user can edit and correct it, for example by clicking to fix an over-segmented tram.
The latest version of SAM-Track also supports browsing tracking results online: you can pick the segmentation result of any intermediate frame, modify it or add targets, and then track again.
To make it easy to try online, the project provides a WebUI that can be deployed with one click through Colab.
Model composition
The SAM-Track model is built on DeAOT, the solution that won four tracks at the ECCV'22 VOT Workshop.
DeAOT is an efficient multi-object video object segmentation (VOS) model. Given the object annotation of the first frame, it can track and segment those objects in the remaining frames of the video.
DeAOT uses an identification mechanism to embed multiple targets in a video into the same high-dimensional space, which lets it track multiple objects simultaneously.
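The idea of placing several objects in one shared embedding space can be illustrated with a minimal sketch. It is not DeAOT's implementation: the names `make_id_bank` and `embed_label_map` are hypothetical, and the ID vectors are random here, whereas a trained model would presumably learn them.

```python
import random


def make_id_bank(max_objects, dim, seed=0):
    """Build one vector per object ID (index 0 stands for background).

    Assumption: random vectors are used purely for illustration.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(max_objects + 1)]


def embed_label_map(label_map, id_bank):
    """Replace every pixel's object ID with that object's ID vector."""
    return [[id_bank[label] for label in row] for row in label_map]


# A tiny 2x3 "frame" annotated with two objects (1 and 2) and background (0).
label_map = [
    [0, 1, 1],
    [2, 2, 0],
]
id_bank = make_id_bank(max_objects=10, dim=4)
id_map = embed_label_map(label_map, id_bank)
```

Because all objects end up in a single map, one pass over `id_map` carries information about every target at once, instead of one pass per object.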
DeAOT's speed in multi-object tracking is comparable to that of other VOS methods tracking a single object.
In addition, through its hierarchical Transformer-based propagation mechanism, DeAOT aggregates long-term and short-term information more effectively and shows excellent tracking performance.
Since DeAOT requires a reference-frame annotation for initialization, SAM-Track improves convenience by obtaining that annotation from the Segment Anything Model (SAM), which has recently made a splash in the field of image segmentation.
Using SAM's strong zero-shot transfer capability and its multiple interaction methods, SAM-Track can efficiently obtain high-quality reference-frame annotations for DeAOT.
Although the SAM model performs well in image segmentation, it cannot output semantic labels, and its text prompts do not adequately support Referring Object Segmentation and other tasks that rely on deep semantic understanding.
Therefore, the SAM-Track model further integrates Grounding-DINO to achieve high-precision language-guided video segmentation. Grounding-DINO is an open-set object detection model with good language understanding capabilities.
Based on the input category or a detailed description of the target object, Grounding-DINO can detect the target and return its bounding box.
SAM-Track model architecture
As shown in the figure below, the SAM-Track model supports three object tracking modes: interactive tracking mode, automatic tracking mode, and fusion mode.
In interactive tracking mode, the SAM-Track model first applies SAM: the user selects targets in the reference frame by clicking or drawing boxes until the interactive segmentation result is satisfactory.
For language-guided video object segmentation, SAM-Track calls Grounding-DINO on the input text to first obtain the bounding box of the target object, and then uses SAM with that box to obtain the segmentation result for the object of interest.
Finally, DeAOT uses the interactive segmentation result as the reference frame to track the selected targets. During tracking, DeAOT propagates the visual embeddings and high-dimensional ID embeddings of past frames to the current frame layer by layer, achieving frame-by-frame tracking and segmentation of multiple target objects. In this way, SAM-Track can segment and track objects of interest in videos through multi-modal interaction.
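The control flow of the interactive mode can be sketched as follows. This is a toy outline under stated assumptions, not the project's code: `track_interactively`, `toy_segmenter`, and `toy_propagator` are hypothetical names, the segmenter stands in for SAM (optionally guided by Grounding-DINO boxes), and the propagator stands in for DeAOT, which in reality also uses longer-term memory rather than only the previous mask.

```python
from typing import Callable, List, Sequence

Frame = List[List[int]]  # toy grayscale frame
Mask = List[List[int]]   # label map: 0 = background, k > 0 = object ID


def track_interactively(
    frames: Sequence[Frame],
    segment_reference: Callable[[Frame], Mask],
    propagate: Callable[[Frame, Mask], Mask],
) -> List[Mask]:
    """Segment the reference frame once, then propagate frame by frame."""
    if not frames:
        return []
    masks = [segment_reference(frames[0])]
    for frame in frames[1:]:
        masks.append(propagate(frame, masks[-1]))
    return masks


def toy_segmenter(frame: Frame) -> Mask:
    """Placeholder for the interactive step: label bright pixels as object 1."""
    return [[1 if px > 0 else 0 for px in row] for row in frame]


def toy_propagator(frame: Frame, prev_mask: Mask) -> Mask:
    """Placeholder for the tracker: assume a static scene and copy the mask."""
    return [row[:] for row in prev_mask]


frames = [[[0, 9], [9, 0]] for _ in range(3)]
masks = track_interactively(frames, toy_segmenter, toy_propagator)
```

The point of the structure is that annotation happens only on the reference frame; every later frame is handled by propagation alone.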
However, the interactive tracking mode cannot handle objects that newly appear in the video, which limits the application of SAM-Track in fields such as autonomous driving and smart cities.
To further expand its application scope and performance, SAM-Track implements an automatic tracking mode that tracks new objects appearing in the video.
Every n frames, the automatic tracking mode uses segment-everything and object-of-interest segmentation to obtain annotations for newly appearing objects. To assign IDs to these new objects, SAM-Track uses a comparing mask results (CMR) step.
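The article does not spell out how masks are compared, so the following is only one plausible sketch of the ID assignment step. It assumes that a freshly detected mask counts as a new object when its IoU with every tracked mask stays below a threshold; `is_detection_frame`, `assign_new_ids`, and the 0.5 threshold are hypothetical choices for illustration.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def is_detection_frame(frame_index: int, n: int) -> bool:
    """New-object detection runs only on every n-th frame."""
    return frame_index % n == 0


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two masks given as pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def assign_new_ids(
    tracked: Dict[int, Set[Pixel]],
    detected: List[Set[Pixel]],
    iou_threshold: float = 0.5,  # assumed value, not from the article
) -> Dict[int, Set[Pixel]]:
    """Keep tracked objects and give fresh IDs to unmatched detections."""
    updated = dict(tracked)
    next_id = max(tracked, default=0) + 1
    for mask in detected:
        already_tracked = any(
            iou(mask, known) >= iou_threshold for known in tracked.values()
        )
        if not already_tracked:
            updated[next_id] = mask
            next_id += 1
    return updated


tracked = {1: {(0, 0), (0, 1)}}
detected = [{(0, 0), (0, 1)}, {(5, 5), (5, 6)}]
result = assign_new_ids(tracked, detected)
```

Here the first detection overlaps the object already tracked as ID 1 and is ignored, while the second has no overlap and receives the next free ID.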
The fusion mode combines the interactive tracking mode and the automatic tracking mode. Interactive tracking lets users easily obtain annotations for the first frame of a video, while automatic tracking handles new, unselected objects that appear in subsequent frames. Combining the two expands the application scope of SAM-Track and makes it more practical.