
Efficient single-stage short-term RGB-T single target tracking method based on Transformer


Introduction

As shown in Figure 1, existing three-stage RGB-T single-object tracking networks usually employ two independent feature extraction branches, each responsible for extracting the features of one modality. However, mutually independent branches lack effective cross-modal information interaction during feature extraction. As a result, once offline training is complete, the network can only extract fixed features from each modal image and cannot adjust dynamically to the actual state of the modalities to extract more targeted features. This limitation restricts the network's ability to adapt to the diverse bimodal appearances of targets and to the dynamic correspondence between the two modal appearances. As shown in Figure 2, such a feature extraction scheme is ill-suited to practical RGB-T single-object tracking scenarios, especially in complex environments: because the tracked target is arbitrary, its bimodal appearance is highly diverse, and the dynamic relationship between the two modalities also changes as the tracking environment changes. Three-stage fusion tracking cannot adapt well to this situation and also suffers from an obvious speed bottleneck.

In addition, existing Transformer-based RGB-T single-object tracking networks combine the features of the two modal search regions by direct element-wise addition or concatenation before feeding them into the prediction head that outputs the final result. However, the video frames provided by current RGB-T tracking datasets are not perfectly aligned, and not every modal search region provides effective information. For example, the RGB search region at night, or the thermal infrared search region under thermal crossover, cannot provide effective target appearance information and instead contains a large amount of background noise. Merging features by direct element-wise addition or concatenation therefore ignores the question of how the features of the two search regions should be combined. To address this, the paper proposes a Fusion Feature Selection Module (FFSM). The FFSM selects the search-region features that carry effective target appearance information: it first learns a weight for each search-region feature through an attention mechanism, and then computes a weighted sum of the search-region features according to these weights to obtain the final fused feature. This mechanism effectively filters out invalid background noise and extracts the more important target appearance information, thereby improving RGB-T single-object tracking performance. To verify the effectiveness of the FFSM, experiments were conducted in scenes with heavy background noise. The results show that the RGB-T tracking network equipped with the FFSM achieves better tracking performance than direct element-wise addition or concatenation. In dark-night and thermal-crossover scenarios, the FFSM accurately selects the effective target appearance information, improving the accuracy and robustness of tracking. In short, the FFSM effectively solves the problem of direct feature fusion and improves the performance of the RGB-T single-object tracking network, and the approach is broadly applicable to scenes with heavy background noise.
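The following is a minimal PyTorch sketch of the weighting idea described above. The module name FusionFeatureSelection, the use of a small MLP scorer with a softmax to produce per-modality weights, and the tensor shapes are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class FusionFeatureSelection(nn.Module):
    """Illustrative sketch: score each modality's search-region feature and fuse
    them by a weighted sum (assumed design, not the paper's released code)."""
    def __init__(self, dim):
        super().__init__()
        # Small MLP that scores one modality's pooled search-region feature.
        self.score = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))

    def forward(self, feat_rgb, feat_tir):
        # feat_*: (B, N, C) token features of the RGB / thermal search regions.
        w_rgb = self.score(feat_rgb.mean(dim=1))   # (B, 1)
        w_tir = self.score(feat_tir.mean(dim=1))   # (B, 1)
        weights = torch.softmax(torch.cat([w_rgb, w_tir], dim=1), dim=1)  # (B, 2)
        # Weighted sum: a noisy (low-weight) modality contributes less to the fused feature.
        fused = weights[:, 0, None, None] * feat_rgb + weights[:, 1, None, None] * feat_tir
        return fused, weights
```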

Figure 1

Figure 2

This article introduces USTrack, an efficient single-stage Transformer-based RGB-T single-object tracking network. Its core idea is to unify the three functional stages of the three-stage fusion tracking pipeline into a single ViT backbone through joint feature extraction, fusion and relation modeling, so that the fused features of the target template and the search region are extracted directly under modal interaction, while relation modeling between the two fused features is performed at the same time. This greatly improves both tracking speed and accuracy. In addition, USTrack designs a feature selection mechanism based on modal reliability, which reduces the interference of an invalid modality by directly discarding the fused features it generates, thereby reducing the impact of noise information on the final tracking results. Ultimately, USTrack reaches the fastest speed among current RGB-T single-object trackers at 84.2 FPS, while effectively mitigating the influence of invalid modal information on the final prediction results.

The contributions of this article are as follows:

(1) To address the lack of modal interaction in the feature extraction stage of current three-stage fusion tracking networks, this paper proposes a joint feature extraction, fusion and relation modeling method. The method directly extracts the fused features of the target template and the search region under modal interaction and simultaneously performs relation modeling between the two fused features, providing, for the first time, an efficient and concise single-stage fusion tracking paradigm for the design of short-term RGB-T single-object tracking networks.

(2) For the first time, a feature selection mechanism based on modal reliability is proposed. The mechanism evaluates the reliability of the two modal images according to the actual tracking environment and, based on these reliability scores, discards the fused features generated by the invalid modality, reducing the impact of noise information on the final prediction results and further improving tracking performance.

(3) This paper reports results on three mainstream RGB-T single-object tracking benchmarks. Extensive experiments show that the proposed method not only achieves new SoTA performance but also reaches the fastest tracking speed of 84.2 FPS. In particular, on the VTUAV short-term and long-term tracking subsets, USTrack outperforms the best existing methods by 11.1%/11.7% and 11.3%/9.7% on the MPR/MSR metrics, respectively.

Method

As shown in Figure 3, the overall architecture of USTrack consists of three parts: dual embedding layers, a ViT backbone, and a feature selection mechanism based on modal reliability. The dual embedding layers are two independent embedding layers. The motivation is that the attention mechanism gathers global information based on similarity, while the intrinsic properties of the two modalities may lead to different feature representations of the same pattern; if the raw inputs were mapped directly by attention, this heterogeneity could limit the network's ability to model the information shared by the two modalities and hurt subsequent feature fusion. USTrack therefore uses two learnable embedding layers to map the inputs of the two modalities into a space that is conducive to fusion, aligning the two modalities to a certain extent and reducing the impact of modal heterogeneity on feature fusion. All outputs of the dual embedding layers are then jointly fed into the ViT backbone and processed directly by its attention layers, which perform feature extraction, modal fusion, and relation modeling between the target template and the search region at the same time. This unifies the three functional stages of RGB-T tracking and provides an efficient single-stage tracking paradigm.
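A minimal sketch of the joint processing described above, using generic PyTorch components in place of a real ViT. The class and variable names (DualEmbedJointViT, embed_rgb, etc.), the token layout, and the omission of positional embeddings are simplifying assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class DualEmbedJointViT(nn.Module):
    """Sketch: two modality-specific patch embeddings feed one shared Transformer
    encoder, so feature extraction, modal fusion and template-search relation
    modeling all happen inside the same attention layers."""
    def __init__(self, patch=16, dim=768, depth=12, heads=12):
        super().__init__()
        # Independent (dual) embedding layers for RGB and thermal-infrared inputs.
        self.embed_rgb = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.embed_tir = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)  # stands in for the ViT backbone

    def _tokens(self, embed, img):
        # (B, 3, H, W) -> (B, N, C) patch tokens (positional embeddings omitted for brevity).
        return embed(img).flatten(2).transpose(1, 2)

    def forward(self, z_rgb, z_tir, x_rgb, x_tir):
        # z_*: template crops, x_*: search-region crops of the two modalities.
        parts = [self._tokens(self.embed_rgb, z_rgb), self._tokens(self.embed_tir, z_tir),
                 self._tokens(self.embed_rgb, x_rgb), self._tokens(self.embed_tir, x_tir)]
        lengths = [p.shape[1] for p in parts]
        out = self.backbone(torch.cat(parts, dim=1))  # joint attention over all tokens
        # Split back into per-modality fused search-region features for the prediction heads.
        _, _, fx_rgb, fx_tir = torch.split(out, lengths, dim=1)
        return fx_rgb, fx_tir
```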

The feature selection mechanism based on modal reliability consists of two prediction heads and two reliability evaluation modules. The two prediction heads output separate results, and the modal reliability scores help the network select, for the final prediction, the result corresponding to the search region of the modality better suited to the current tracking scene. In this way, the feature selection mechanism reduces the impact of noise information generated by the invalid modality on the final prediction result.
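A hedged sketch of this selection step, assuming each prediction head returns a bounding box and each reliability module returns a scalar score per sample; the function and argument names here (select_prediction, rel_rgb, etc.) are illustrative assumptions rather than the paper's interface.

```python
import torch

def select_prediction(fx_rgb, fx_tir, head_rgb, head_tir, rel_rgb, rel_tir):
    """Pick the prediction from the modality judged more reliable for the current frame.
    fx_*: fused search-region features; head_*: prediction heads returning (B, 4) boxes;
    rel_*: reliability modules returning a (B,) score (all assumed interfaces)."""
    box_rgb, box_tir = head_rgb(fx_rgb), head_tir(fx_tir)      # (B, 4) each
    score_rgb, score_tir = rel_rgb(fx_rgb), rel_tir(fx_tir)    # (B,) each
    use_rgb = (score_rgb >= score_tir).unsqueeze(-1)           # (B, 1) boolean mask
    # Keep the box produced from the more reliable modality; discard the other.
    return torch.where(use_rgb, box_rgb, box_tir)
```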

Figure 3

Experimental results

USTrack uses the GTOT, RGBT234 and VTUAV datasets as test benchmarks; the results are shown in Figure 4. We also use VTUAV as a benchmark to analyze the performance of USTrack under different challenge scenarios. As shown in Figure 5, this article selects the six challenge attributes with the most obvious performance improvements: deformation (DEF), scale variation (SV), full occlusion (FO), partial occlusion (PO), thermal crossover (TC) and extreme illumination (EI). Specifically, deformation (DEF) and scale variation (SV) reflect the diversity of the target's appearance during tracking, while full occlusion (FO), partial occlusion (PO), thermal crossover (TC) and extreme illumination (EI) cause the appearance in the corresponding modality to change or disappear, reflecting the dynamic relationship between the two modalities in different challenge scenarios. USTrack achieves its most significant performance gains in tracking scenes with these challenge attributes, which indicates that the joint feature extraction, fusion and relation modeling method can effectively alleviate the insufficient modal interaction in the feature extraction stage of the three-stage fusion tracking paradigm and can better adapt to the target's diverse appearances and the dynamic relationship between the modalities during tracking.

Figure 4

Figure 5

As shown in Figures 6 and 7, in order to verify the effectiveness of the feature selection mechanism based on modal reliability, we conducted comparative experiments on the RGBT234 benchmark between the dual-prediction-head structure with the feature selection mechanism and several common prediction head structures, and visualized the modal reliability scores, which show a good correspondence with the actual tracking scenes.

Figure 6

Figure 7

Summary

This paper proposes USTrack, an efficient single-stage Transformer-based short-term RGB-T single-object tracking network. The core of USTrack is a joint feature extraction, fusion and relation modeling method that solves the lack of modal interaction in the feature extraction stage of the traditional three-stage fusion tracking network, enhancing the tracker's adaptability to diverse bimodal target appearances and to the dynamic correspondence between the modal appearances. On this basis, a feature selection mechanism based on modal reliability is further proposed; it reduces the impact of noise information on the final prediction by directly discarding the fused features generated by the invalid modality, achieving better tracking performance. USTrack achieves SoTA performance on three mainstream datasets and sets a new record for RGB-T tracking inference speed at 84.2 FPS. Notably, on VTUAV, currently the largest RGB-T single-object tracking benchmark, the method improves the MPR/MSR metrics by 11.1%/11.7% and 11.3%/9.7% over the existing SoTA methods, a major performance breakthrough that adds a strong new baseline to this benchmark.

Author information

1. Xia Jianqiang

Master's student at the Institute of National Defense Science and Technology Innovation, Academy of Military Sciences. His research interests include visual image processing, object detection, and single-object tracking. He has published a first-author paper at a CCF Class A conference and won a Huawei first prize in the 2022 "Huawei Cup" Fourth China Graduate Artificial Intelligence Innovation Competition.

2. Zhao Jian

Zhao Jian is head of the Multimedia Cognitive Learning Laboratory (EVOL Lab) at the China Telecom Artificial Intelligence Research Institute, a young scientist there, and a researcher at the Institute of Optoelectronics and Intelligence of Northwestern Polytechnical University. He received his Ph.D. from the National University of Singapore. His research interests include multimedia analysis, public safety, and embodied intelligence.

He has published 32 CCF-A papers on unconstrained visual perception and understanding, including 31 as first/corresponding author in leading international journals and conferences such as T-PAMI (×2, IF: 24.314), IJCV (×3, IF: 13.369) and CVPR, and holds 5 authorized national invention patents as first inventor. The related technical achievements have been applied by six leading technology companies, including Baidu, Ant Financial and Qihoo 360, and have produced significant benefits. He was selected for the "Young Talent Promotion Project" of the China Association for Science and Technology and the Beijing Association for Science and Technology, and has led 6 projects including a National Natural Science Foundation Youth Program grant. He won the Wu Wenjun Artificial Intelligence Outstanding Youth Award (2023), the first prize of the Wu Wenjun Artificial Intelligence Natural Science Award (2/5, 2022), the Lee Hwee Kuan Award of the Singapore Pattern Recognition and Machine Intelligence Association (PREMIA), and the only Best Student Paper Award at ACM Multimedia (first author, 1/208, CCF-A conference, 2018), and has won championships in major international science and technology competitions 7 times.

He serves as a director of the Beijing Image and Graphics Society, an editorial board member of the international journals "Artificial Intelligence Advances" and "IET Computer Vision", a guest editor of special issues of "Pattern Recognition Letters" and "Electronics", a Senior Area Chair of VALSE, a session chair of ACM Multimedia 2021, an Area Chair of CICAI 2022/2023, a forum chair of CCBR 2024, a senior member of the Chinese Association for Artificial Intelligence and the China Society of Image and Graphics, a judge of the "Challenge Cup" college student science and technology works competition, and a member of the expert committee of the China Artificial Intelligence Competition, among others.

Homepage: https://zhaoj9014.github.io

Screenshot of paper


Paper link

https://arxiv.org/abs/2308.13764

Code link

https://github.com/xiajianqiang

