Learning Affordance Landscapes for Interaction Exploration in 3D Environments

Authors: Tushar Nagarajan, Kristen Grauman

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate agents' ability to interact with as many objects as possible (Sec. 4.1) and enhance policy learning on downstream tasks (Sec. 4.2). Simulation environment: We experiment with AI2-iTHOR [30] (see Fig. 1), since it supports context-specific interactions that can change object states, vs. simple physics-based interactions in other 3D indoor environments [59, 8]. The results show agents can learn how to use new home environments intelligently and that it prepares them to rapidly address various downstream tasks like "find a knife and put it in the drawer".
Researcher Affiliation | Collaboration | Tushar Nagarajan, UT Austin and Facebook AI Research (tushar@cs.utexas.edu); Kristen Grauman, UT Austin and Facebook AI Research (grauman@fb.com)
Pseudocode | No | The paper describes its methods in text and diagrams (e.g., Figure 2), but does not provide a formal pseudocode or algorithm block.
Open Source Code | Yes | Project page: http://vision.cs.utexas.edu/projects/interaction-exploration/
Open Datasets | Yes | Simulation environment: We experiment with AI2-iTHOR [30] (see Fig. 1), since it supports context-specific interactions that can change object states, vs. simple physics-based interactions in other 3D indoor environments [59, 8].
Dataset Splits | Yes | We split the 30 scenes into training (20), validation (5), and testing (5) sets.
Hardware Specification | No | The paper thanks the 'UT Systems Administration team for their help setting up experiments on the cluster' in the Acknowledgments, indicating experiments were run on a cluster. However, it does not provide specific hardware details such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | The paper mentions using algorithms and architectures such as 'PPO [54]' and the 'U-Net [49] architecture'. However, it does not specify any software dependencies or libraries with their corresponding version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup | Yes | At each time step, we receive the current egocentric frame x and generate its affordance maps ŷ = F_A(x). The visual observations and affordance maps are encoded using a 3-layer convolutional neural network (CNN) each, and then concatenated and merged using a fully connected layer. This is then fed to a gated recurrent unit (GRU) recurrent neural network to aggregate observations over time, and finally to an actor-critic network (fully connected layers) to generate the next action distribution and value. We train this network using PPO [54] for 1M frames, with rollouts of T = 256 time steps.
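
The "Open Datasets" and "Dataset Splits" rows refer to 30 AI2-iTHOR scenes split into 20 training, 5 validation, and 5 testing scenes. The snippet below is a minimal sketch of how such a split could be set up and how the simulator is driven through its Python API. The FloorPlan1-FloorPlan30 scene names, the particular split assignment, and the chosen interaction are illustrative assumptions rather than the authors' exact configuration, and step() arguments vary somewhat across ai2thor versions.

```python
# Illustrative sketch only. Assumptions: the 30 scenes are AI2-iTHOR's kitchen
# FloorPlan1..FloorPlan30 (the paper does not list scene IDs), and the 20/5/5
# partition below is one plausible assignment, not necessarily the authors'.
# Requires the ai2thor package.
from ai2thor.controller import Controller

scenes = [f"FloorPlan{i}" for i in range(1, 31)]           # 30 kitchen scenes
train, val, test = scenes[:20], scenes[20:25], scenes[25:]  # 20 / 5 / 5 split

controller = Controller(scene=train[0])                    # launch the simulator
event = controller.step(action="MoveAhead")                # a navigation action

# Context-specific interactions change object state, e.g. opening a fridge:
fridge = next(o for o in event.metadata["objects"] if o["objectType"] == "Fridge")
event = controller.step(action="OpenObject", objectId=fridge["objectId"])
print(event.metadata["lastActionSuccess"])                 # False if out of reach

controller.stop()
```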
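
The "Experiment Setup" row describes the policy only at a high level. Below is a minimal PyTorch sketch of that description: two 3-layer CNN encoders (one for the egocentric RGB frame, one for the affordance maps), a fully connected fusion layer, a GRU to aggregate observations over time, and actor-critic heads. Channel widths, kernel sizes, the number of affordance channels, and the hidden size are illustrative assumptions not reported in the quoted text, and the affordance predictor F_A (a U-Net in the paper) is treated as an external module that supplies the affordance maps.

```python
# Minimal sketch of the described policy network (hyperparameters assumed).
import torch
import torch.nn as nn


def conv_encoder(in_channels, feat_dim=256):
    # 3-layer CNN encoder followed by flatten + linear projection to feat_dim.
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        nn.Flatten(),
        nn.LazyLinear(feat_dim), nn.ReLU(),
    )


class InteractionExplorationPolicy(nn.Module):
    # Assumed sizes: 8 affordance channels, 256-d features and hidden state.
    def __init__(self, num_actions, affordance_channels=8, hidden_dim=256):
        super().__init__()
        self.rgb_encoder = conv_encoder(3)                    # egocentric frame x
        self.aff_encoder = conv_encoder(affordance_channels)  # affordance maps from F_A
        self.fuse = nn.Sequential(nn.Linear(512, hidden_dim), nn.ReLU())  # concat of two 256-d features
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.actor = nn.Linear(hidden_dim, num_actions)       # next-action logits
        self.critic = nn.Linear(hidden_dim, 1)                # state-value estimate

    def forward(self, frames, affordances, hidden=None):
        # frames:      (B, T, 3, H, W) egocentric RGB observations
        # affordances: (B, T, C, H, W) per-pixel affordance maps from F_A
        B, T = frames.shape[:2]
        f = self.rgb_encoder(frames.flatten(0, 1))
        a = self.aff_encoder(affordances.flatten(0, 1))
        z = self.fuse(torch.cat([f, a], dim=-1)).view(B, T, -1)
        z, hidden = self.gru(z, hidden)                       # aggregate over time
        dist = torch.distributions.Categorical(logits=self.actor(z))
        return dist, self.critic(z), hidden


# Example usage with dummy observations:
policy = InteractionExplorationPolicy(num_actions=10)
frames = torch.zeros(2, 4, 3, 80, 80)        # batch of 2 rollout fragments, 4 steps each
affordances = torch.zeros(2, 4, 8, 80, 80)
dist, value, h = policy(frames, affordances)
action = dist.sample()                        # (2, 4) action indices
```

Per the quoted setup, this network would be optimized with PPO for 1M frames using rollouts of T = 256 steps; any standard PPO implementation can consume the action distribution and value outputs above.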