Learning from Unscripted Deictic Gesture and Language for Human-Robot Interactions
Authors: Cynthia Matuszek, Liefeng Bo, Luke Zettlemoyer, Dieter Fox
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We collect a corpus of deictic interactions from users describing objects, which we use to train language and gesture models that allow our robot to determine what objects are being indicated. We introduce a temporal extension to state-of-the-art hierarchical matching pursuit features to support gesture understanding, and demonstrate that combining multiple communication modalities more effectively captures user intent than relying on a single type of input. Finally, we present initial interactions with a robot that uses the learned models to follow commands. |
| Researcher Affiliation | Academia | Cynthia Matuszek, Liefeng Bo, Luke Zettlemoyer, Dieter Fox; {cynthia \| lfb \| lsz \| fox}@cs.washington.edu; Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195 |
| Pseudocode | No | The paper describes algorithms (e.g., K-SVD, sparse coding) using text and mathematical equations, but it does not include pseudocode blocks or figures explicitly labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for its described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We collect an RGB-D corpus of interactions from users identifying objects in a space, which was then used to train models of language, deictic gesture, and visual attributes. ... The resulting data set contains examples of language used without gesture, gesture paired with non-descriptive language (e.g., These objects), and gesture and language used together (examples can be seen at: http://tiny.cc/Gambit14). |
| Dataset Splits | No | The paper states 'Testing was performed on a held-out set of 20% of these pairs,' indicating a train/test split. However, it does not mention a separate validation set or describe cross-validation or any other partitioning strategy needed to reproduce the evaluation (a minimal hold-out sketch follows this table). |
| Hardware Specification | No | The paper mentions using a 'Microsoft Kinect sensor' for data collection and the 'Gambit manipulator platform' for a prototype system. However, it does not specify the computational hardware (e.g., exact CPU or GPU models, memory) used for training the models or running the experiments. |
| Software Dependencies | No | The paper mentions using 'Google Speech API' for automatic speech recognition. However, it does not provide specific version numbers for this API or any other key software libraries or solvers used in the experimental setup. |
| Experiment Setup | No | The paper reports one specific tuned value for score integration, quoted as 'Experimentally, = 0.2 worked well' for the decision threshold (a plausible reading of this step is sketched after this table). However, it does not provide comprehensive details on other critical settings, such as training hyperparameters or optimizer configurations, needed to reproduce the models. |
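
The Dataset Splits row above quotes an 80/20 held-out test split with no mention of a validation set. Below is a minimal sketch of such a hold-out, assuming the corpus is a list of (language, gesture, target object) examples; the function name, random seed, and data layout are assumptions rather than details from the paper.

```python
import random

def holdout_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle the corpus and hold out a fraction of pairs for testing.

    `pairs` is assumed to be a list of (language, gesture, target_object)
    examples; the paper states only that 20% of pairs were held out.
    """
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```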
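
The Experiment Setup row above quotes a single tuned value, a threshold of 0.2, used when integrating scores from the language and gesture models. The paper's actual combination rule is not reproduced in this report, so the following is only a plausible sketch: per-object scores from the two modalities are fused, and an object is treated as indicated when its fused score clears the threshold. The function names, the weighted-average fusion, and the assumption of normalized scores are all hypothetical.

```python
def fuse_scores(language_score, gesture_score, w_language=0.5):
    """Combine per-object scores from the two modalities.

    A simple weighted average stands in for the paper's integration
    equations, which are not reproduced in this report.
    """
    return w_language * language_score + (1.0 - w_language) * gesture_score

def indicated_objects(scores, threshold=0.2):
    """Return ids of objects whose fused score clears the quoted 0.2 threshold.

    `scores` maps object id -> (language_score, gesture_score); both scores
    are assumed to lie in [0, 1].
    """
    return [obj for obj, (lang, gest) in scores.items()
            if fuse_scores(lang, gest) >= threshold]
```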