Intensity-Aware Loss for Dynamic Facial Expression Recognition in the Wild

Authors: Hanting Li, Hongjing Niu, Zhaoqing Zhu, Feng Zhao

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two in-the-wild dynamic facial expression datasets (i.e., DFEW and FERV39k) indicate that our method outperforms the state-of-the-art DFER approaches.
Researcher Affiliation | Academia | Hanting Li, Hongjing Niu, Zhaoqing Zhu, Feng Zhao*, University of Science and Technology of China, Hefei, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code will be available at https://github.com/muse1998/IAL-for-Facial-Expression-Recognition.
Open Datasets | Yes | DFEW (Jiang et al. 2020) consists of 16,372 video clips... All the samples have been split into five same-size parts (fd1–fd5) without overlap. The division into 5 folds is provided with the DFEW dataset. We choose 5-fold cross-validation as the evaluation protocol. FERV39k (Wang et al. 2022) is currently the largest in-the-wild DFER dataset and contains 38,935 video clips... randomly shuffled and split into training (80%/31,088 clips) and testing (20%/7,847 clips) sets without overlapping. Therefore, for a fair comparison, we directly use the training and testing sets as divided by FERV39k.
Dataset Splits | Yes | All the samples have been split into five same-size parts (fd1–fd5) without overlap. The division into 5 folds is provided with the DFEW dataset. We choose 5-fold cross-validation as the evaluation protocol. FERV39k... randomly shuffled and split into training (80%/31,088 clips) and testing (20%/7,847 clips) sets without overlapping.
Hardware Specification | Yes | All the experiments are conducted on a single NVIDIA RTX 3090 card with the PyTorch toolbox.
Software Dependencies | No | All the experiments are conducted on a single NVIDIA RTX 3090 card with the PyTorch toolbox. The specific version number for PyTorch is not provided.
Experiment Setup | Yes | In our experiments, all facial images are resized to 112×112. Random cropping, horizontal flipping, rotation, and color jittering are employed to avoid over-fitting. We use SGD (Robbins and Monro 1951) to optimize our model with a batch size of 40. For the DFEW dataset, the learning rate is initialized to 0.001 and decreased at an exponential rate over 80 epochs for the intensity-aware and cross-entropy loss functions. For the FERV39k dataset, the learning rate is also initialized to 0.001 and decreased at an exponential rate over 100 epochs using the same loss functions. For both datasets, models are trained from scratch. As for sampling, the length of the dynamically sampled sequence is 16 (U = 8, V = 2 for FERV39k and U = 16, V = 1 for DFEW). The numbers of self-attention heads and temporal transformer encoders are set at 4 and 2, respectively. By default, the dimensionality-reduction ratio r is set at 16, and the loss coefficient λ is set at 0.1.
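The DFEW evaluation protocol quoted above (five non-overlapping, equal-size folds with 5-fold cross-validation) can be sketched in plain Python. Note that DFEW ships its official fold assignment (fd1–fd5) with the dataset; the round-robin partition below is only an illustrative stand-in, not the official split.

```python
def five_fold_cross_validation(sample_ids, k=5):
    """Partition samples into k non-overlapping, near-equal folds and
    yield (train, test) id lists, holding out one fold per rotation."""
    folds = [sample_ids[i::k] for i in range(k)]  # round-robin partition (illustrative)
    for held_out in range(k):
        test = folds[held_out]
        train = [s for j, fold in enumerate(folds) if j != held_out for s in fold]
        yield train, test

# Example with 10 toy clip ids: each rotation tests on 2 and trains on 8.
splits = list(five_fold_cross_validation(list(range(10))))
```

Reported DFEW results are then the average over the five rotations, so every clip is tested exactly once.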
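The learning-rate schedule in the setup ("initialized to 0.001 and decreased at an exponential rate over 80 epochs") follows lr_t = lr_0 · γ^t. The paper does not report the decay factor γ, so the value 0.95 below is a placeholder assumption for illustration only.

```python
def exponential_lr_schedule(base_lr=1e-3, gamma=0.95, epochs=80):
    """Per-epoch learning rates under exponential decay: lr_t = base_lr * gamma**t.
    gamma=0.95 is an assumed placeholder; the paper only states the decay is exponential."""
    return [base_lr * gamma ** t for t in range(epochs)]

schedule = exponential_lr_schedule()  # DFEW: 80 epochs; FERV39k would use epochs=100
```

In PyTorch this corresponds to wrapping the SGD optimizer with `torch.optim.lr_scheduler.ExponentialLR`.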
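The dynamic sampling settings above (a 16-frame sequence with U segments and V frames per segment: U = 8, V = 2 for FERV39k; U = 16, V = 1 for DFEW) can be sketched as follows. The paper only fixes (U, V), so the even-placement rule inside each segment is an assumption for illustration.

```python
def dynamic_sample(num_frames, U, V):
    """Split a clip of num_frames into U equal temporal segments and pick
    V evenly spaced frames from each, giving a U*V-frame index sequence."""
    seg_len = num_frames / U
    indices = []
    for u in range(U):
        for v in range(V):
            # centre of the v-th sub-interval of segment u (illustrative choice)
            idx = int(u * seg_len + (v + 0.5) * seg_len / V)
            indices.append(min(idx, num_frames - 1))
    return indices

# Both configurations yield 16-frame inputs from a 120-frame clip:
ferv39k_idx = dynamic_sample(120, U=8, V=2)   # 8 segments x 2 frames
dfew_idx = dynamic_sample(120, U=16, V=1)     # 16 segments x 1 frame
```

Segment-wise sampling keeps the 16 selected frames spread across the whole clip regardless of its raw length.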