Placing Objects in Gesture Space: Toward Incremental Interpretation of Multimodal Spatial Descriptions

Authors: Ting Han, Casey Kennington, David Schlangen

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we model the hearer's task, using a multimodal spatial description corpus we collected. To reduce the variability of verbal descriptions, we simplified the setup to use simple objects as landmarks. We describe a real-time system to evaluate the separate and joint contributions of the modalities. We show that gestures not only help to improve the overall system performance, even if to a large extent they encode redundant information, but also result in earlier final correct interpretations.
Researcher Affiliation | Academia | Ting Han (1), Casey Kennington (2), David Schlangen (1); (1) Dialogue Systems Group // CITEC, Bielefeld University; (2) Boise State University; {ting.han, david.schlangen}@uni-bielefeld.de, caseykennington@boisestate.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The corpus is publicly available (footnote 1: https://pub.uni-bielefeld.de/data/2913177). This refers to the corpus, not the source code for the methodology or system described in the paper.
Open Datasets | Yes | We collected a multimodal spatial description corpus which was elicited with a simplified scene description task (see details in Data collection). The corpus is publicly available (footnote 1: https://pub.uni-bielefeld.de/data/2913177).
Dataset Splits | No | The paper states 'The training was stopped when validation loss stopped decreasing.', which implies a validation set was used, but it does not provide specific details on how this validation split was created (e.g., percentages, counts, or a splitting methodology beyond the hold-one-out procedure used for train/test).
Hardware Specification | No | The paper mentions that 'hand motion was tracked by a Leap sensor' and that 'The classification for each stroke hold takes around 10 to 20 ms, correlated to the computational ability of the machine.' However, it does not provide specific hardware details (e.g., CPU/GPU models, memory) of the machine used to run the experiments.
Software Dependencies | No | The paper mentions software such as 'Keras (Chollet 2015)', the 'Inpro TK toolkit (Baumann and Schlangen 2012)', and 'ELAN, a software for annotation', but it does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | The LSTM classifier includes two hidden layers and a sigmoid dense layer to give predictions. The first hidden layer has 68 nodes whose outputs are defined by tanh activation functions. The second hidden layer has 38 nodes and outputs via the dense layer. A dropout layer is applied to the second layer to enable more effective learning: 50% of the input units are randomly selected and set to 0 to avoid overfitting. We chose a binary cross-entropy loss function optimised with an rmsprop optimiser. The training was stopped when validation loss stopped decreasing. We fit a Gaussian KDE model (with the bandwidth set to 5). When combining speech with gestures, the average eo is slightly higher.
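
The quoted setup can be approximated in Keras, the library the paper cites. The sketch below is a minimal, hypothetical reconstruction: the layer sizes, dropout rate, loss, optimiser, early-stopping criterion, and KDE bandwidth follow the quoted text, while the input shape, feature dimensionality, variable names, and data loading are assumptions not specified in the paper.

```python
# Minimal sketch of the described classifier; not the authors' code.
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.neighbors import KernelDensity

N_TIMESTEPS = 50   # assumed length of a stroke-hold feature sequence (not given in the paper)
N_FEATURES = 9     # assumed per-frame hand-motion feature dimension (not given in the paper)

model = keras.Sequential([
    keras.Input(shape=(N_TIMESTEPS, N_FEATURES)),
    # First hidden layer: 68 nodes with tanh outputs, feeding the next recurrent layer.
    layers.LSTM(68, activation="tanh", return_sequences=True),
    # Dropout on the second layer's inputs: 50% of units randomly set to 0.
    layers.Dropout(0.5),
    # Second hidden layer: 38 nodes.
    layers.LSTM(38),
    # Sigmoid dense layer giving the prediction.
    layers.Dense(1, activation="sigmoid"),
])

# Binary cross-entropy loss optimised with rmsprop, as quoted.
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

# Training stops when validation loss stops decreasing.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=1,
                                           restore_best_weights=True)
# X_train, y_train, X_val, y_val are placeholders for corpus features and labels.
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])

# Gaussian KDE with an absolute bandwidth of 5, as quoted; the pointing
# coordinates it is fit on are assumed here.
kde = KernelDensity(kernel="gaussian", bandwidth=5.0)
# kde.fit(pointing_positions)            # shape (n_samples, 2), hypothetical
# log_density = kde.score_samples(grid)  # evaluate over candidate object locations
```

Note that scikit-learn's KernelDensity is used here only because it accepts an absolute bandwidth matching the quoted value of 5; the paper's quote does not name the KDE implementation the authors actually used.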