Learning from Unscripted Deictic Gesture and Language for Human-Robot Interactions

Authors: Cynthia Matuszek, Liefeng Bo, Luke Zettlemoyer, Dieter Fox

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We collect a corpus of deictic interactions from users describing objects, which we use to train language and gesture models that allow our robot to determine what objects are being indicated. We introduce a temporal extension to state-of-the-art hierarchical matching pursuit features to support gesture understanding, and demonstrate that combining multiple communication modalities more effectively captures user intent than relying on a single type of input. Finally, we present initial interactions with a robot that uses the learned models to follow commands.
Researcher Affiliation | Academia | Cynthia Matuszek, Liefeng Bo, Luke Zettlemoyer, Dieter Fox; {cynthia, lfb, lsz, fox}@cs.washington.edu; Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195
Pseudocode | No | The paper describes its algorithms (e.g., K-SVD, sparse coding) in text and mathematical equations, but it does not include pseudocode blocks or figures explicitly labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for its described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We collect an RGB-D corpus of interactions from users identifying objects in a space, which was then used to train models of language, deictic gesture, and visual attributes. ... The resulting data set contains examples of language used without gesture, gesture paired with non-descriptive language (e.g., These objects), and gesture and language used together (examples can be seen at: http://tiny.cc/Gambit14).
Dataset Splits | No | The paper states 'Testing was performed on a held-out set of 20% of these pairs,' indicating a test split. However, it does not mention a separate validation set or describe cross-validation or any other partitioning strategy beyond that held-out test set (a minimal held-out-split sketch follows this table).
Hardware Specification | No | The paper mentions using a Microsoft Kinect sensor for data collection and the Gambit manipulator platform for a prototype system. However, it does not specify the computational hardware (e.g., exact CPU or GPU models, memory) used to train the models or run the experiments.
Software Dependencies | No | The paper mentions using the Google Speech API for automatic speech recognition. However, it does not provide specific version numbers for this API or for any other key software libraries or solvers used in the experimental setup.
Experiment Setup | No | The paper reports one specific experimental value, a threshold of 0.2 used in score integration (a minimal thresholding sketch follows this table). However, it does not provide other critical setup details such as learning rates, batch sizes, number of epochs, or optimizer settings used for model training.
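
For the Dataset Splits row, here is a minimal sketch of what the reported held-out evaluation might look like, assuming a simple random 80/20 partition of the interaction examples. The placeholder data, the use of scikit-learn's train_test_split, and the random seed are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical 80/20 held-out split; the paper only states that 20% of the
# (language, gesture) pairs were held out for testing, so the data layout,
# library choice, and random seed below are assumptions.
from sklearn.model_selection import train_test_split

# Placeholder standing in for the corpus of (language, gesture, target-object) examples.
pairs = [{"example_id": i} for i in range(100)]

train_pairs, test_pairs = train_test_split(pairs, test_size=0.2, random_state=0)
print(len(train_pairs), len(test_pairs))  # 80 20
```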
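
For the Experiment Setup row, the following is a minimal sketch of threshold-based score integration across the language and gesture modalities, assuming per-object scores in [0, 1] that are simply averaged. Only the 0.2 threshold comes from the paper; the function names, the averaging rule, and the example scores are hypothetical.

```python
# Hypothetical score integration using the threshold reported in the paper.
# Only the 0.2 value is from the paper; the averaging rule and all names
# here are illustrative assumptions.

THRESHOLD = 0.2  # value the paper reports as working well experimentally


def combine_scores(language_scores, gesture_scores):
    """Average per-object scores from the two modalities.

    Both inputs map object IDs to scores in [0, 1]; an object seen by only
    one modality keeps its single-modality score.
    """
    combined = {}
    for obj in set(language_scores) | set(gesture_scores):
        scores = [m[obj] for m in (language_scores, gesture_scores) if obj in m]
        combined[obj] = sum(scores) / len(scores)
    return combined


def indicated_objects(language_scores, gesture_scores, threshold=THRESHOLD):
    """Return the objects whose combined score clears the threshold."""
    combined = combine_scores(language_scores, gesture_scores)
    return [obj for obj, score in combined.items() if score >= threshold]


# Toy example: gesture alone is ambiguous between two blocks, but the
# language scores push only one of them over the threshold.
language = {"red_block": 0.70, "blue_block": 0.05}
gesture = {"red_block": 0.40, "blue_block": 0.20}
print(indicated_objects(language, gesture))  # ['red_block']
```

In the toy example, only the object supported by both modalities clears the threshold, which illustrates the kind of multimodal disambiguation the paper argues combined language and gesture input provides.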