Learning Interpretable Spatial Operations in a Rich 3D Blocks World

Authors: Yonatan Bisk, Kevin Shih, Yejin Choi, Daniel Marcu

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we study the problem of mapping natural language instructions to complex spatial actions in a 3D blocks world. We first introduce a new dataset that pairs complex 3D spatial operations to rich natural language descriptions that require complex spatial and pragmatic interpretations such as mirroring, twisting, and balancing. This dataset, built on the simulation environment of Bisk, Yuret, and Marcu (2016), attains language that is significantly richer and more complex, while also doubling the size of the original dataset in the 2D environment with 100 new world configurations and 250,000 tokens. In addition, we propose a new neural architecture that achieves competitive results while automatically discovering an inventory of interpretable spatial operations (Figure 5).
Researcher Affiliation | Collaboration | Yonatan Bisk (1), Kevin J. Shih (2), Yejin Choi (1), Daniel Marcu (3). (1) Paul G. Allen School of Computer Science & Engineering, University of Washington; (2) University of Illinois at Urbana-Champaign; (3) Amazon Inc. {ybisk,yejin}@cs.washington.edu, kjshih2@illinois.edu, marcud@amazon.com
Pseudocode | No | The paper does not contain any explicit pseudocode blocks or algorithms.
Open Source Code | No | The paper provides a link (https://groundedlanguage.github.io/) for the released data, but does not explicitly state that the source code for the methodology is available there. The text only says 'In our released data, we captured block orientations as quaternions.'
Open Datasets | Yes | Our new dataset comprises 100 configurations split 70-20-10 between training, testing, and development. Each configuration has between five and twenty steps (and blocks). We present type and token statistics in Table 1, where we use NLTK's (Bird, Klein, and Loper 2009) treebank tokenizer. In our released data (https://groundedlanguage.github.io/), we captured block orientations as quaternions. This allows for a complete and accurate re-rendering of the exact block orientations produced by our annotators.
Dataset Splits | Yes | Our new dataset comprises 100 configurations split 70-20-10 between training, testing, and development.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions that the model has convolutional layers.
Software Dependencies | No | The paper mentions using the Adam optimizer and NLTK, but does not provide specific version numbers for any software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or other libraries.
Experiment Setup | Yes | Our model is trained end-to-end using Adam (Kingma and Ba 2014) with a batch size of 32. The convolutional aspect of the model has 3 layers and operates on a world representation of dimensions 32 × 4 × 64 × 64 × 32 (batch, depth, height, width, channels). The first convolutional layer uses a filter of size 4 × 5 × 5 and the second of size 4 × 3 × 3, each followed by a tanh nonlinearity for the 3D model. Both layers output a tensor with the same dimensions as the input world. The final prediction layer is a 1 × 1 × 1 filter that projects the 32-dimensional vector at each location down to 8 values as detailed in the previous section. We further include an entropy term to encourage peakier distributions in the argument and operation softmaxes.
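The Experiment Setup row states that both convolutional layers output a tensor with the same dimensions as the input world. A minimal sketch of the shape arithmetic, assuming stride-1 convolutions with TensorFlow-style "SAME" padding (the paper does not state padding or stride explicitly), checks that the quoted 4 × 5 × 5 and 4 × 3 × 3 filters do preserve the (4, 64, 64) spatial dimensions:

```python
import math

def same_out(n, k, stride=1):
    # "SAME" padding: output length depends only on input length and
    # stride; padding is chosen internally to cover the filter of size k.
    return math.ceil(n / stride)

def valid_out(n, k, stride=1):
    # "VALID" (no padding) output length, shown for contrast.
    return (n - k) // stride + 1

# World tensor from the quoted setup: (batch, depth, height, width, channels).
world = (32, 4, 64, 64, 32)

# Both quoted filter sizes preserve the spatial dims under SAME padding.
for filt in [(4, 5, 5), (4, 3, 3)]:
    out = tuple(same_out(n, k) for n, k in zip(world[1:4], filt))
    assert out == (4, 64, 64)

# Without padding, a 4 x 5 x 5 filter would shrink the world instead.
print(valid_out(4, 4), valid_out(64, 5), valid_out(64, 5))  # 1 60 60
```

The final 1 × 1 × 1 prediction layer trivially preserves spatial dimensions as well; it only projects the 32 channels at each location down to 8 values.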
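The Open Datasets row notes that block orientations are stored as quaternions so the annotators' exact block poses can be re-rendered. As a minimal pure-Python sketch of what such re-rendering involves (the helper `quat_rotate` is hypothetical, not from the paper or its data release), here is rotation of a 3D point by a unit quaternion q = (w, x, y, z):

```python
import math

def quat_rotate(q, v):
    """Rotate 3D point v by unit quaternion q = (w, x, y, z), i.e. q * v * q^-1."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2 * (y * vz - z * vy)
    ty = 2 * (z * vx - x * vz)
    tz = 2 * (x * vy - y * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )

# A 90-degree rotation about the z-axis: w = cos(45 deg), z = sin(45 deg).
q = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
rx, ry, rz = quat_rotate(q, (1.0, 0.0, 0.0))
print(round(rx, 6), round(ry, 6), round(rz, 6))  # 0.0 1.0 0.0
```

This is the standard quaternion sandwich product written out without a dependency; in practice a library routine such as SciPy's `Rotation.from_quat` would be used instead.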