Learning Unseen Emotions from Gestures via Semantically-Conditioned Zero-Shot Perception with Adversarial Autoencoders

Authors: Abhishek Banerjee, Uttaran Bhattacharya, Aniket Bera

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on the MPI Emotional Body Expressions Database (EBEDB) and obtain an accuracy of 58.43%. We see an improvement in performance compared to current state-of-the-art algorithms for generalized zero-shot learning by an absolute 25%-27%.
Researcher Affiliation | Academia | Abhishek Banerjee, Uttaran Bhattacharya, Aniket Bera; Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA; {abanerj8, uttaranb, bera}@umd.edu
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a URL (https://gamma.umd.edu/unseen_gesture_emotions) in a figure caption, but it points to a project page, and the paper does not explicitly state that the source code for the methodology is released or available there.
Open Datasets | Yes | We train and evaluate our network on the MPI Emotional Body Expressions Database (EBEDB) (Volkova et al. 2014).
Dataset Splits | No | The paper mentions "validation accuracy" but does not explicitly provide the size or percentage of the validation split, only a train-test split of 80%-20%.
Hardware Specification | Yes | Our network takes around 6 minutes to train on an Nvidia RTX 2080 GPU.
Software Dependencies | No | The paper mentions general software components such as the Adam optimizer but does not specify version numbers for any key software libraries or dependencies (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | We train the model for 200 epochs by stochastic gradient descent using the Adam optimizer (Kingma and Ba 2014) and a batch size of 6 for features. ... we found δ = 1.5 to give us the highest harmonic mean of accuracies... Hence, we set γ = 1 for our experiments. ... We obtained the best results for d = 16 and used this in our final network.
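The Experiment Setup row quotes the reported hyperparameters (200 epochs, Adam optimizer, batch size 6, δ = 1.5, γ = 1, d = 16) but no code. The sketch below is a minimal, hypothetical PyTorch training loop that wires those reported values together; the model architecture, feature dimensionality, data pipeline, learning rate, and loss terms are placeholders, since none of them are specified in the excerpt above, and this is not the authors' released implementation.

```python
# Hypothetical training-configuration sketch (assumed PyTorch); only the numeric
# hyperparameters below are taken from the paper's reported setup.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

EPOCHS = 200        # reported: trained for 200 epochs
BATCH_SIZE = 6      # reported: "a batch size of 6 for features"
LATENT_DIM = 16     # reported: best results for d = 16

# Placeholder encoder-decoder standing in for the paper's adversarial autoencoder;
# the true architecture and input dimensionality are not given in the excerpt.
FEATURE_DIM = 128   # assumed feature size, purely illustrative
model = nn.Sequential(
    nn.Linear(FEATURE_DIM, LATENT_DIM),
    nn.ReLU(),
    nn.Linear(LATENT_DIM, FEATURE_DIM),
)

# Dummy gesture features in place of the EBEDB data pipeline.
features = torch.randn(120, FEATURE_DIM)
loader = DataLoader(TensorDataset(features), batch_size=BATCH_SIZE, shuffle=True)

# Learning rate is not reported; the PyTorch Adam default is used here.
optimizer = optim.Adam(model.parameters())
criterion = nn.MSELoss()

for epoch in range(EPOCHS):
    for (batch,) in loader:
        optimizer.zero_grad()
        reconstruction = model(batch)
        # Stand-in reconstruction loss only; the paper's δ- and γ-weighted
        # objective terms are not reproduced here.
        loss = criterion(reconstruction, batch)
        loss.backward()
        optimizer.step()
```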