Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multimodal Gesture Recognition via Multiple Hypotheses Rescoring

Authors: Vassilis Pitsikalis, Athanasios Katsamanis, Stavros Theodorakis, Petros Maragos

JMLR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The overall approach achieves 93.3% gesture recognition accuracy in the ChaLearn Kinect-based multimodal data set, significantly outperforming all recently published approaches on the same challenging multimodal gesture recognition task, providing a relative error rate reduction of at least 47.6%.
Researcher Affiliation | Academia | Vassilis Pitsikalis (EMAIL), Athanasios Katsamanis (EMAIL), Stavros Theodorakis (EMAIL), Petros Maragos (EMAIL); National Technical University of Athens, School of Electrical and Computer Engineering, Zografou Campus, Athens 15773, Greece.
Pseudocode | Yes | Algorithm 1: Multimodal Scoring and Resorting of Hypotheses; Algorithm 2: Segmental Parallel Fusion.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | For our experiments we employ the ChaLearn multimodal gesture challenge data set, introduced by Escalera et al. (2013b).
Dataset Splits | Yes | The data set contains three separate sets, namely for development, validation and final evaluation, including 39 users and 13858 gesture-word instances in total.
Hardware Specification | Yes | For the measurements we employed an AMD Opteron(tm) Processor 6386 at 2.80GHz with 32GB RAM.
Software Dependencies | No | The paper mentions algorithms and methods such as Hidden Markov Models (HMMs), the Baum-Welch algorithm, the Viterbi algorithm, and Gaussian mixture models (GMMs), and cites 'The HTK Book', but it does not specify any software packages or libraries with version numbers used for implementation.
Experiment Setup | Yes | For skeleton, we train left-right HMMs with 12 states and 2 Gaussians per state. For handshape, the models correspondingly have 8 states and 3 Gaussians per state, while speech gesture models have 22 states and 10 Gaussians per state. ... N is chosen to be equal to 200. ... The best weight combination for the multimodal hypothesis rescoring component is found to be w_{SK,HS,AU} = [63.6, 9.1, 27.3] ... the best combination of weights for the segmental fusion component is [0.6, 0.6, 98.8].
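The paper describes its rescoring step (Algorithm 1) only in pseudocode. As a rough illustration of the idea, the sketch below re-ranks an N-best list by a weighted sum of per-modality scores, using the reported best weights w_{SK,HS,AU} = [63.6, 9.1, 27.3]; the function and field names are hypothetical and this is not the authors' implementation.

```python
# Hedged sketch of multimodal hypothesis rescoring (not the authors' code).
# Each N-best hypothesis carries one log-score per modality:
# skeleton (sk), handshape (hs), and audio/speech (au).

def rescore_hypotheses(hypotheses, weights=(63.6, 9.1, 27.3)):
    """Re-rank N-best hypotheses by a weighted sum of modality log-scores.

    hypotheses: list of dicts with keys 'text', 'sk', 'hs', 'au';
    weights: stream weights, here the paper's reported best combination.
    """
    w_sk, w_hs, w_au = weights

    def fused(h):
        # Linear fusion of the per-modality scores for one hypothesis.
        return w_sk * h["sk"] + w_hs * h["hs"] + w_au * h["au"]

    # Sort best-first by the fused score.
    return sorted(hypotheses, key=fused, reverse=True)

# Toy two-hypothesis N-best list with made-up log-scores.
nbest = [
    {"text": "gesture A", "sk": -10.0, "hs": -12.0, "au": -9.0},
    {"text": "gesture B", "sk": -9.5, "hs": -13.0, "au": -9.5},
]
best = rescore_hypotheses(nbest)[0]
```

In the paper the N-best list (N = 200) comes from a first single-pass decoding, and the rescored list is then passed to the segmental fusion stage (Algorithm 2).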
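The per-modality model sizes quoted in the setup row can be summarized as a small configuration; a hedged sketch in plain Python (the names are hypothetical, not from the paper):

```python
# Hypothetical configuration mirroring the reported per-modality HMM setups.
# The paper states left-right topology for the skeleton models; it is
# assumed here for the other streams as well.
HMM_CONFIG = {
    "skeleton":  {"topology": "left-right", "states": 12, "gaussians_per_state": 2},
    "handshape": {"topology": "left-right", "states": 8,  "gaussians_per_state": 3},
    "speech":    {"topology": "left-right", "states": 22, "gaussians_per_state": 10},
}
N_BEST = 200  # size of the hypothesis list passed to rescoring
```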