Efficient Facial Feature Learning with Wide Ensemble-Based Convolutional Neural Networks

Authors: Henrique Siqueira, Sven Magg, Stefan Wermter (pp. 5800-5809)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we present experiments on Ensembles with Shared Representations (ESRs) based on convolutional networks to demonstrate, quantitatively and qualitatively, their data processing efficiency and scalability to large-scale datasets of facial expressions. We show that redundancy and computational load can be dramatically reduced by varying the branching level of the ESR without loss of diversity and generalization power, which are both important for ensemble performance. Experiments on large-scale datasets suggest that ESRs reduce the remaining residual generalization error on the AffectNet and FER+ datasets, reach human-level performance, and outperform state-of-the-art methods on facial expression recognition in the wild using emotion and affect concepts.
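As an illustration of the shared-representation and branching idea described in the abstract, the following is a minimal PyTorch sketch of an ESR-style network. The class name, layer sizes, and branch design are our assumptions for illustration, not the authors' exact architecture.

    import torch
    import torch.nn as nn

    class ESR(nn.Module):
        # Ensemble with Shared Representations: one shared trunk, branches added over time.
        def __init__(self, num_classes=8):
            super().__init__()
            # Shared low-level layers (theta_shared); channel sizes are illustrative.
            self.shared = nn.Sequential(
                nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
            self.branches = nn.ModuleList()
            self.num_classes = num_classes

        def add_branch(self):
            # Initialize a new convolutional branch (theta_b) with its own classifier head.
            self.branches.append(nn.Sequential(
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(128, self.num_classes)))

        def forward(self, x):
            h = self.shared(x)  # shared representation is computed once per image
            return [branch(h) for branch in self.branches]  # one prediction per branch

Because the shared trunk runs only once regardless of the ensemble size, adding branches grows the compute cost far more slowly than training independent networks, which is the efficiency argument the abstract makes.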
Researcher Affiliation | Academia | Henrique Siqueira, Sven Magg, Stefan Wermter; Knowledge Technology, Department of Informatics, University of Hamburg; Vogt-Koelln-Str. 30, 22527 Hamburg, Germany; {siqueira, magg, wermter}@informatik.uni-hamburg.de
Pseudocode | Yes | Algorithm 1: Training ESRs.

    initialize the shared layers with θ_shared
    for b ← 1 to maximum ensemble size do
        initialize the convolutional branch B_b with θ_b
        add the branch B_b to the network ESR
        sample a subset D′ from a training set D
        foreach mini-batch (x_i, y_i) ∈ D′ do
            perform the forward phase
            initialize the combined loss function L_esr to 0.0
            foreach existing branch B_b in ESR do
                compute the loss L_b with respect to B_b
                add L_b to L_esr
            end
            perform the backward phase
            optimize ESR
        end
    end
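A hedged PyTorch rendering of Algorithm 1, reusing the illustrative ESR module sketched above. The subset-sampling strategy (a random half of the data), the learning rate, and the batch size are placeholders, not the authors' settings.

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader, Subset

    def train_esr(esr, dataset, max_ensemble_size=9):
        for b in range(max_ensemble_size):
            esr.add_branch()  # initialize branch B_b and add it to the ESR
            # Recreate the optimizer so the new branch's parameters are included.
            optimizer = torch.optim.SGD(esr.parameters(), lr=0.1, momentum=0.9)
            # Sample a subset D' from the training set D (a random half, as a placeholder).
            idx = torch.randperm(len(dataset))[: len(dataset) // 2]
            loader = DataLoader(Subset(dataset, idx.tolist()), batch_size=32, shuffle=True)
            for x, y in loader:
                outputs = esr(x)  # forward phase through the shared layers and all branches
                loss_esr = sum(F.cross_entropy(out, y) for out in outputs)  # L_esr = sum of L_b
                optimizer.zero_grad()
                loss_esr.backward()  # backward phase
                optimizer.step()     # optimize the ESR

Note that, as in the pseudocode, every existing branch contributes a loss term on each new subset, so earlier branches keep training while later ones are added.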
Open Source Code | Yes | For reproducibility purposes, the source code of our experiments, the ESR implementation in PyTorch, trained networks, and supplementary material are available in our GitHub repository: https://github.com/knowledgetechnologyuhh/Efficient-Facial-Feature-Learning-with-Wide-Ensemble-based-Convolutional-Neural-Networks
Open Datasets | Yes | We trained and tested the ensemble with shared representations on in-the-lab and in-the-wild datasets for a couple of reasons. ... The Extended Cohn-Kanade (CK+) dataset (Lucey et al. 2010) has been vastly used to develop action unit detection and facial expression recognition systems. ... AffectNet (Mollahosseini, Hasani, and Mahoor 2019) is the largest publicly available dataset of facial expressions in the wild. ... FER+ (Barsoum et al. 2016) derives from the reannotation of the Facial Expression Recognition 2013 (FER2013) dataset (Goodfellow et al. 2015).
Dataset Splits | Yes | We followed subject-independent 10-fold cross-validation for comparison purposes based on our previous work (Siqueira et al. 2018). ... In each trial t, we selected fold-(t) for testing, fold-(t + 1) for validation, and only the first four of the remaining eight folds for training, i.e., 523.2 images on average in the training set. ... AffectNet and FER+ are divided into training, validation, and test sets that were published for the scientific community, except for the test set of the former. Meanwhile, researchers have used the validation set for evaluation and comparisons, as suggested by the AffectNet authors.
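For concreteness, a small Python sketch of the fold selection quoted above. The modulo wrap-around for the last trials is our assumption, since the quoted text does not spell it out.

    def ck_plus_splits(trial, num_folds=10):
        # fold-(t) is used for testing, fold-(t + 1) for validation.
        test_fold = trial % num_folds
        val_fold = (trial + 1) % num_folds
        # Only the first four of the remaining eight folds are used for training.
        remaining = [f for f in range(num_folds) if f not in (test_fold, val_fold)]
        train_folds = remaining[:4]
        return train_folds, val_fold, test_fold

    # Example: trial 0 -> train on folds [2, 3, 4, 5], validate on 1, test on 0.
    print(ck_plus_splits(0))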
Hardware Specification | Yes | Subsequently, experiments using a single GeForce GTX 1080 on large-scale benchmarks for facial expression recognition in the wild demonstrate the affordability and scalability of ESRs, followed by conclusions and future research.
Software Dependencies | No | The paper mentions PyTorch as the framework used for the ESR implementation, but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | The single network was trained on four folds using stochastic gradient descent (SGD) to minimize the cross-entropy loss... We adopted a momentum factor of 0.9 on SGD and a learning rate decay with a multiplicative factor of 0.5 applied after every 250 epochs. ... Data augmentation was randomly applied in all of the cases, including brightness and contrast changes, horizontal flips, rotations up to 30 degrees, translations, and rescaling.
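The quoted setup maps naturally onto standard PyTorch components. In this sketch only the momentum factor, the decay factor, and the decay interval come from the quote; the augmentation magnitudes and the initial learning rate are our assumptions.

    import torch
    from torchvision import transforms

    # Augmentations named in the paper; the exact magnitudes below are assumptions.
    augment = transforms.Compose([
        transforms.ColorJitter(brightness=0.5, contrast=0.5),  # brightness/contrast changes
        transforms.RandomHorizontalFlip(),                     # horizontal flips
        transforms.RandomAffine(degrees=30,                    # rotations up to 30 degrees
                                translate=(0.1, 0.1),          # translations
                                scale=(0.9, 1.1)),             # rescaling
        transforms.ToTensor(),
    ])

    model = torch.nn.Linear(1, 1)  # stand-in for the ESR; used only to build the optimizer
    # SGD with momentum 0.9 and a 0.5 multiplicative decay every 250 epochs, as quoted;
    # the initial learning rate (0.1) is a placeholder the quote does not state.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=250, gamma=0.5)
    # Call scheduler.step() once per epoch during training.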