Optimal Transport-based Identity Matching for Identity-invariant Facial Expression Recognition

Authors: Daeha Kim, Byung Cheol Song

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive simulations prove that the proposed FER method improves the PCC/CCC performance by up to 10% or more compared to the runner-up on wild datasets. The source code and software demo are available at https://github.com/kdhht2334/ELIM_FER." "Contributions of this paper are summarized as follows: (II) The state-of-the-art (SOTA) performance of ELIM is verified through extensive experiments on various real-world datasets. Especially, ELIM works more accurately compared to prior arts even for samples in which inconsistency between facial expressions and the emotion label predictions exists (see Fig. 3)." Section headings: 5 Experiments; 5.1 Datasets and Implementation Details; 5.2 Idea Verification of ELIM; 5.3 Comparison with State-of-the-art Methods; 5.4 Ablation Studies and Discussions. (The PCC/CCC metrics behind the headline claim are sketched after the table.)
Researcher Affiliation | Academia | Daeha Kim, Inha University (kdhht5022@gmail.com); Byung Cheol Song, Inha University (bcsong@inha.ac.kr)
Pseudocode | Yes | "Algorithm 1: Training Procedure of ELIM" (a generic optimal-transport matching sketch follows the table)
Open Source Code | Yes | "The source code and software demo are available at https://github.com/kdhht2334/ELIM_FER." "A link to the full source code is added to the camera-ready version."
Open Datasets | Yes | "We adopted public databases only for research purposes, and informed consent was obtained if necessary." AFEW-VA [28], derived from the AFEW dataset [10], consists of about 600 short video clips annotated with VA labels frame by frame. Aff-wild [50] is the first large-scale in-the-wild VA dataset, collecting the reactions of people watching movies or TV shows. Aff-wild2, which added videos of spontaneous facial behaviors to Aff-wild, was also released [25].
Dataset Splits | Yes | "Evaluation was performed through cross validation at a ratio of 5:1. Since the test data of Aff-wild was not disclosed, this paper adopted the sampled train set for evaluation purposes in the same way as previous works [17, 22]." "Table 3: Results on the validation set of Aff-wild2." (a sketch of the 5:1 split follows the table)
Hardware Specification | Yes | "All models were implemented in PyTorch, and the following experiments were performed on an Intel Xeon CPU and an NVIDIA RTX A6000 GPU."
Software Dependencies | No | The paper states "All models were implemented in PyTorch" but does not specify the PyTorch version or the versions of any other software dependencies.
Experiment Setup | Yes | "The mini-batch size for those models was set to 512, 256, and 128, respectively (cf. Appendix B for model details). Learnable parameters (ϕ, θ) were optimized with the Adam optimizer [23]. The learning rate (LR) of ϕ and θ was set to 5e-5 and 1e-5, respectively. The LR decreases 0.8-fold at the initial 5K iterations and 0.8-fold every 20K iterations. ϵ in Eq. 4 is 1e-6. At each iteration, the number of IDs N was set to 10 by default. The size d of z is 64, and the size k of the Gumbel-based sampling in Sec. 4.3 is 10." (this setup is wired into a PyTorch sketch below)
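
Since the headline result is reported in PCC/CCC, the two metrics are worth pinning down. Below is a minimal NumPy sketch of the Pearson correlation coefficient (PCC) and the concordance correlation coefficient (CCC) as conventionally defined for valence/arousal regression; the paper's own evaluation code is in the linked repository, so this is only a reference rendering of the standard formulas.

```python
import numpy as np

def pcc(pred, target):
    """Pearson correlation coefficient between predictions and labels."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    dp, dt = pred - pred.mean(), target - target.mean()
    return (dp * dt).sum() / np.sqrt((dp ** 2).sum() * (dt ** 2).sum())

def ccc(pred, target):
    """Concordance correlation coefficient: like PCC, but it also
    penalizes location and scale differences between the two series."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    cov = ((pred - pred.mean()) * (target - target.mean())).mean()
    return 2 * cov / (pred.var() + target.var() + (pred.mean() - target.mean()) ** 2)
```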
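Algorithm 1 itself is given in the paper and the repository. As a rough illustration of the entropy-regularized optimal transport matching the title refers to, here is a generic log-domain Sinkhorn sketch in PyTorch; the cost construction, regularization strength `eps`, and iteration count are illustrative assumptions, not the authors' exact training procedure.

```python
import math
import torch

def sinkhorn_plan(cost: torch.Tensor, eps: float = 0.05, n_iters: int = 100) -> torch.Tensor:
    """Log-domain Sinkhorn for entropic-regularized optimal transport.
    `cost` is an (n, m) pairwise cost matrix; returns the (n, m) transport
    plan (a soft matching) between two uniformly weighted point sets."""
    n, m = cost.shape
    log_mu = torch.full((n,), -math.log(n))  # uniform source marginal (log)
    log_nu = torch.full((m,), -math.log(m))  # uniform target marginal (log)
    M = -cost / eps                          # Gibbs kernel in log space
    u, v = torch.zeros(n), torch.zeros(m)
    for _ in range(n_iters):                 # alternating dual updates
        u = log_mu - torch.logsumexp(M + v[None, :], dim=1)
        v = log_nu - torch.logsumexp(M + u[:, None], dim=0)
    return torch.exp(M + u[:, None] + v[None, :])
```

A cost matrix between two groups of identity features could be formed with, e.g., `cost = torch.cdist(feats_a, feats_b) ** 2` before calling `sinkhorn_plan`.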
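The 5:1 cross-validation ratio admits a straightforward reading: hold out one sixth of the clips per fold. The helper below is a hypothetical sketch under that assumption; the exact protocol follows prior works [17, 22].

```python
import random

def split_5_to_1(clip_ids, seed=0):
    """Hypothetical 5:1 train/validation split: shuffle clip IDs and
    hold out one sixth of them for validation."""
    ids = list(clip_ids)
    random.Random(seed).shuffle(ids)
    n_val = len(ids) // 6
    return ids[n_val:], ids[:n_val]  # (train: 5 parts, val: 1 part)
```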
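The quoted experiment setup translates almost directly into PyTorch. The sketch below wires up the two learning rates and the 0.8-fold decay; the module shapes standing in for (ϕ, θ) are placeholders, and the decay points (one decay at 5K iterations, then one every further 20K) are our reading of the quoted schedule.

```python
import torch

# Placeholder modules standing in for the paper's feature extractor (phi)
# and head (theta); the real architectures are in Appendix B and the repo.
phi = torch.nn.Linear(512, 64)
theta = torch.nn.Linear(64, 2)

opt = torch.optim.Adam([
    {"params": phi.parameters(), "lr": 5e-5},    # LR of phi
    {"params": theta.parameters(), "lr": 1e-5},  # LR of theta
])

def n_decays(it: int) -> int:
    """0.8-fold decays applied after `it` iterations: one at 5K,
    then one every further 20K (our reading of the schedule)."""
    return 0 if it < 5_000 else 1 + (it - 5_000) // 20_000

sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda it: 0.8 ** n_decays(it))
# In the training loop, call opt.step() and then sched.step() once per iteration.
```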