
Facial Expression Analysis: Integrating Multimodal Information with Transformer

WBOY
Release: 2024-01-23 10:24:05

Transformer-based Multimodal Information Fusion for Facial Expression Analysis

Paper Introduction

Human emotional behavior analysis has attracted considerable attention in human-computer interaction (HCI). This article introduces the paper we submitted to the CVPR 2022 Affective Behavior Analysis in-the-wild (ABAW) competition. To make full use of emotional cues, we extract multimodal features from the video clips of the Aff-Wild2 dataset: spoken language, speech prosody, and facial expressions. Building on these features, we propose a transformer-based multimodal framework for action unit (AU) detection and expression (EXPR) recognition. By combining what is said, how it is said, and how the face moves, the framework supports a more comprehensive understanding of human emotional behavior and points to new research directions in HCI.
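The two tasks have different output structures: AU detection is multi-label (several action units can be active in the same frame), while expression recognition assigns exactly one category per frame. The sketch below shows, under stated assumptions, how logits from a fused feature might be turned into predictions for each task. The function names, the 0.5 threshold, and the label list are hypothetical illustrations, not taken from the paper.

```python
import math

# Hypothetical expression label set, for illustration only.
EXPR_LABELS = ["neutral", "anger", "disgust", "fear",
               "happiness", "sadness", "surprise", "other"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_aus(au_logits: list[float], threshold: float = 0.5) -> list[int]:
    """Multi-label head: each AU is independently active (1) or not (0).
    The threshold is an assumed hyperparameter."""
    return [1 if sigmoid(z) >= threshold else 0 for z in au_logits]


def predict_expression(expr_logits: list[float]) -> str:
    """Multi-class head: pick the single most probable expression."""
    probs = softmax(expr_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EXPR_LABELS[best]


if __name__ == "__main__":
    print(predict_aus([2.0, -1.0, 0.0]))
    print(predict_expression([0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0]))
```

Keeping the two heads separate reflects the task definitions: a per-AU sigmoid lets any combination of units fire, whereas a softmax forces the expression classes to compete.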

For the current frame, we first encode the image to extract static visual features. In parallel, we apply a sliding window to clip the sequence of neighboring frames and extract three kinds of dynamic multimodal features from the corresponding image, audio, and text sequences. A transformer-based fusion module then combines the static visual features with the dynamic multimodal features; its cross-attention layer lets the fused representation focus on the parts of the multimodal context that are most useful for the downstream detection tasks. To further improve performance, we apply data balancing, data augmentation, and post-processing. In the official tests of the ABAW3 competition, our model ranked first on both the EXPR (expression recognition) and AU (action unit detection) tracks. Extensive quantitative evaluation and ablation studies on the Aff-Wild2 dataset demonstrate the effectiveness of the proposed method.
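The fusion step can be illustrated with a minimal, parameter-free sketch: a sliding window gathers the neighboring frames, and the static visual feature of the current frame acts as the query in scaled dot-product cross-attention over the window's image, audio, and text tokens. This is an assumption-laden simplification, not the paper's module: it uses a single head with no learned query/key/value projections, assumes every feature has already been projected to a common dimension, clamps the window at sequence edges, and merges the attended context back with a residual sum. All function names are hypothetical.

```python
import math

Vector = list[float]
# One frame = (image_feature, audio_feature, text_feature), all the same dimension (assumption).
Frame = tuple[Vector, Vector, Vector]


def sliding_window(frames: list, center: int, half_width: int) -> list:
    """Frames in [center - half_width, center + half_width]; indices outside
    the (non-empty) sequence are clamped to the nearest edge frame."""
    n = len(frames)
    return [frames[min(max(i, 0), n - 1)]
            for i in range(center - half_width, center + half_width + 1)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, keys: list[Vector], values: list[Vector]) -> Vector:
    """Single-head scaled dot-product attention of one query over a token sequence."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


def fuse(static_visual: Vector, window: list[Frame]) -> Vector:
    """Static feature queries all multimodal tokens in the window; the
    attended context is added back to the static feature (residual sum)."""
    tokens = [feat for frame in window for feat in frame]  # flatten modalities
    context = cross_attention(static_visual, tokens, tokens)
    return [s + c for s, c in zip(static_visual, context)]


if __name__ == "__main__":
    # Toy 2-D features for 10 frames; real features would be much larger.
    frames = [([0.1, 0.2], [0.0, 0.3], [0.5, 0.1]) for _ in range(10)]
    window = sliding_window(frames, center=3, half_width=2)
    print(fuse([0.4, 0.6], window))
```

Using the static frame feature as the query means the attention weights are driven by the current face, so context tokens that resemble it (in this projection-free toy) contribute most; a trained model would learn projections that decide what "relevant" means.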

Paper link

https://arxiv.org/abs/2203.12367


Source: 163.com