Gaussian-Based Instance-Adaptive Intensity Modeling for Point-Supervised Facial Expression Spotting

概要

Point-supervised facial expression spotting (P-FES) aims to localize facial expression instances in untrimmed videos, requiring only a single timestamp label for each instance during training. To address label sparsity, hard pseudo-labeling is often employed to propagate point labels to unlabeled frames; however, this approach can lead to confusion when distinguishing between neutral and expression frames with various intensities, which can negatively impact model performance. In this paper, we propose a two-branch framework for P-FES that incorporates a Gaussian-based instance-adaptive Intensity Modeling (GIM) module for soft pseudo-labeling. GIM models the expression intensity distribution for each instance. Specifically, we detect the pseudo-apex frame around each point label, estimate the duration, and construct a Gaussian distribution for each expression instance. We then assign soft pseudo-labels to pseudo-expression frames as intensity values based on the Gaussian distribution. Additionally, we introduce an Intensity-Aware Contrastive (IAC) loss to enhance discriminative feature learning and suppress neutral noise by contrasting neutral frames with expression frames of various intensities. Extensive experiments on the SAMM-LV and CAS(ME) datasets demonstrate the effectiveness of our proposed framework. Code is available at https://github.com/KinopioIsAllIn/GIM.

論文種別
発表文献
Proceedings of the 13th International Conference on Learning Representations (ICLR 2025)
Yicheng Deng
Yicheng Deng
博士後期課程学生
早志英朗
早志英朗
准教授

深層学習やベイズ推定を基盤とした機械学習アルゴリズムの開発を中心に、生体信号解析、医用画像処理などの応用研究に従事。

長原一
長原一
教授

コンピューテーショナルフォトグラフィ、コンピュータビジョンを専門とし実世界センシングや情報処理技術、画像認識技術の研究を行う。さらに、画像センシングにとどまらず様々なセンサに拡張したコンピュテーショナルセンシング手法の開発や高次元で冗長な実世界ビッグデータから意味のある情報を計測するスパースセンシングへの転換を目指す。