Driver Frustration Detection from Audio and Video in the Wild

Authors: Irman Abdić, Lex Fridman, Daniel McDuff, Erik Marchi, Bryan Reimer, Björn Schuller

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyze a dataset of 20 drivers that contains 596 audio epochs (audio clips, with duration from 1 sec to 15 sec) and 615 video epochs (video clips, with duration from 1 sec to 45 sec). The model was subject-independently trained and tested using 4-fold cross-validation. We achieve an accuracy of 77.4% for detecting frustration from a single audio epoch and 81.2% for detecting frustration from a single video epoch.
Researcher Affiliation | Academia | (1) Massachusetts Institute of Technology (MIT), USA; (2) Technische Universität München (TUM), Germany; (3) Imperial College London (ICL), UK
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper mentions using the open-source openSMILE feature extractor, but does not provide access to the authors' own implementation code for their methodology.
Open Datasets | No | The paper states: 'The dataset used for frustration detection was collected as part of a study for multi-modal assessment of on-road demand of voice and manual phone calling and voice navigation entry across two embedded vehicle systems [Mehler et al., 2015].' This cites a paper describing the collection, but does not provide direct access to the dataset itself or state that it is publicly available.
Dataset Splits | Yes | The model was subject-independently trained and tested using 4-fold cross-validation (see the cross-validation sketch after the table).
Hardware Specification | No | The paper does not mention any specific hardware (e.g., GPU or CPU models) used to run the experiments.
Software Dependencies | Yes | Acoustic low-level descriptors (LLD) were automatically extracted from the speech waveform on a per-chunk level using the open-source openSMILE feature extractor in its 2.1 release (see the extraction sketch after the table). We used a Weka 3 implementation of Support Vector Machines (SVMs).
Experiment Setup | Yes | We used a Weka 3 implementation of Support Vector Machines (SVMs) [Hall et al., 2009] with Sequential Minimal Optimization (SMO), and the audio and video features described in Section 4. We describe the set of SMO complexity parameters as: C ∈ {10^-4, 5·10^-4, 10^-3, 5·10^-3, ..., 1}.
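As a rough sketch of the extraction step, per-epoch acoustic features could be produced with the openSMILE command-line tool as below. The paper names the extractor and its 2.1 release but not the exact invocation, so the configuration file, paths, and epoch naming here are assumptions.

```python
import subprocess

# A minimal sketch of per-epoch feature extraction with the openSMILE
# command-line tool. The paper does not name the exact configuration, so
# IS13_ComParE.conf (shipped with openSMILE) is an assumption.
subprocess.run(
    [
        "SMILExtract",
        "-C", "config/IS13_ComParE.conf",  # feature configuration (assumed)
        "-I", "epochs/epoch_001.wav",      # one audio epoch (a 1-15 sec clip)
        "-O", "features/epoch_001.arff",   # ARFF output, readable by Weka 3
    ],
    check=True,
)
```

Writing ARFF output is convenient here because the downstream classifier in the paper is a Weka 3 SVM.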
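The subject-independent 4-fold cross-validation and the complexity-parameter sweep can be sketched as follows. This uses scikit-learn's GroupKFold and SVC as stand-ins for the authors' Weka 3 SMO setup, and the feature matrix, labels, and driver IDs are synthetic placeholders, not the paper's data.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.svm import SVC

# Synthetic placeholders for the feature matrix, binary frustration labels,
# and the driver ID of each epoch; shapes mirror the 596 audio epochs from
# 20 drivers described in the paper, but the values are random.
rng = np.random.default_rng(0)
X = rng.normal(size=(596, 130))           # one feature row per audio epoch
y = rng.integers(0, 2, size=596)          # frustrated vs. not (assumed labels)
drivers = rng.integers(0, 20, size=596)   # driver ID per epoch

# Complexity-parameter sweep from the paper: C in {1e-4, 5e-4, 1e-3, ..., 1}.
C_grid = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1.0]

for C in C_grid:
    fold_acc = []
    # Grouping folds by driver ID makes the evaluation subject-independent:
    # no driver contributes epochs to both the training and test partitions.
    for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, drivers):
        clf = SVC(kernel="linear", C=C).fit(X[train_idx], y[train_idx])
        fold_acc.append(clf.score(X[test_idx], y[test_idx]))
    print(f"C={C:g}: mean accuracy {np.mean(fold_acc):.3f}")
```

Grouping by driver is the key detail: with 20 drivers and 4 folds, each fold holds out roughly 5 drivers, so a reported accuracy reflects generalization to unseen subjects rather than to unseen clips of known subjects.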