Learning Affordance Landscapes for Interaction Exploration in 3D Environments

Authors: Tushar Nagarajan, Kristen Grauman

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate agents' ability to interact with as many objects as possible (Sec. 4.1) and enhance policy learning on downstream tasks (Sec. 4.2). Simulation environment: We experiment with AI2-iTHOR [30] (see Fig. 1), since it supports context-specific interactions that can change object states, vs. simple physics-based interactions in other 3D indoor environments [59, 8]. The results show agents can learn how to use new home environments intelligently and that it prepares them to rapidly address various downstream tasks like "find a knife and put it in the drawer".
Researcher Affiliation | Collaboration | Tushar Nagarajan, UT Austin and Facebook AI Research (tushar@cs.utexas.edu); Kristen Grauman, UT Austin and Facebook AI Research (grauman@fb.com)
Pseudocode | No | The paper describes its methods in text and diagrams (e.g., Figure 2), but does not provide a formal pseudocode or algorithm block.
Open Source Code | Yes | Project page: http://vision.cs.utexas.edu/projects/interaction-exploration/
Open Datasets | Yes | Simulation environment: We experiment with AI2-iTHOR [30] (see Fig. 1), since it supports context-specific interactions that can change object states, vs. simple physics-based interactions in other 3D indoor environments [59, 8].
Dataset Splits | Yes | We split the 30 scenes into training (20), validation (5), and testing (5) sets.
Hardware Specification | No | The paper thanks the 'UT Systems Administration team for their help setting up experiments on the cluster' in the Acknowledgments, indicating experiments were run on a cluster. However, it does not provide specific hardware details such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | The paper mentions using algorithms and architectures such as 'PPO [54]' and the 'U-Net [49] architecture'. However, it does not specify any software dependencies or libraries with their corresponding version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup | Yes | At each time step, we receive the current egocentric frame x and generate its affordance maps ŷ = F_A(x). The visual observations and affordance maps are encoded using a 3-layer convolutional neural network (CNN) each, and then concatenated and merged using a fully connected layer. This is then fed to a gated recurrent unit (GRU) recurrent neural network to aggregate observations over time, and finally to an actor-critic network (fully connected layers) to generate the next action distribution and value. We train this network using PPO [54] for 1M frames, with rollouts of T = 256 time steps.
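
The "Open Datasets" and "Dataset Splits" rows refer to 30 AI2-iTHOR scenes split into 20 training, 5 validation, and 5 testing scenes. The snippet below is a minimal sketch of how such a split could be set up and how the simulator is driven through its Python API. The FloorPlan1-FloorPlan30 scene names, the particular split assignment, and the chosen interaction are illustrative assumptions rather than the authors' exact configuration, and step() arguments vary somewhat across ai2thor versions.

```python
# Illustrative sketch only. Assumptions: the 30 scenes are AI2-iTHOR's kitchen
# FloorPlan1..FloorPlan30 (the paper does not list scene IDs), and the 20/5/5
# partition below is one plausible assignment, not necessarily the authors'.
# Requires the ai2thor package.
from ai2thor.controller import Controller

scenes = [f"FloorPlan{i}" for i in range(1, 31)]           # 30 kitchen scenes
train, val, test = scenes[:20], scenes[20:25], scenes[25:]  # 20 / 5 / 5 split

controller = Controller(scene=train[0])                    # launch the simulator
event = controller.step(action="MoveAhead")                # a navigation action

# Context-specific interactions change object state, e.g. opening a fridge:
fridge = next(o for o in event.metadata["objects"] if o["objectType"] == "Fridge")
event = controller.step(action="OpenObject", objectId=fridge["objectId"])
print(event.metadata["lastActionSuccess"])                 # False if out of reach

controller.stop()
```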
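
The "Experiment Setup" row describes the policy only at a high level. Below is a minimal PyTorch sketch of that description: two 3-layer CNN encoders (one for the egocentric RGB frame, one for the affordance maps), a fully connected fusion layer, a GRU to aggregate observations over time, and actor-critic heads. Channel widths, kernel sizes, the number of affordance channels, and the hidden size are illustrative assumptions not reported in the quoted text, and the affordance predictor F_A (a U-Net in the paper) is treated as an external module that supplies the affordance maps.

```python
# Minimal sketch of the described policy network (hyperparameters assumed).
import torch
import torch.nn as nn


def conv_encoder(in_channels, feat_dim=256):
    # 3-layer CNN encoder followed by flatten + linear projection to feat_dim.
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        nn.Flatten(),
        nn.LazyLinear(feat_dim), nn.ReLU(),
    )


class InteractionExplorationPolicy(nn.Module):
    # Assumed sizes: 8 affordance channels, 256-d features and hidden state.
    def __init__(self, num_actions, affordance_channels=8, hidden_dim=256):
        super().__init__()
        self.rgb_encoder = conv_encoder(3)                    # egocentric frame x
        self.aff_encoder = conv_encoder(affordance_channels)  # affordance maps from F_A
        self.fuse = nn.Sequential(nn.Linear(512, hidden_dim), nn.ReLU())  # concat of two 256-d features
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.actor = nn.Linear(hidden_dim, num_actions)       # next-action logits
        self.critic = nn.Linear(hidden_dim, 1)                # state-value estimate

    def forward(self, frames, affordances, hidden=None):
        # frames:      (B, T, 3, H, W) egocentric RGB observations
        # affordances: (B, T, C, H, W) per-pixel affordance maps from F_A
        B, T = frames.shape[:2]
        f = self.rgb_encoder(frames.flatten(0, 1))
        a = self.aff_encoder(affordances.flatten(0, 1))
        z = self.fuse(torch.cat([f, a], dim=-1)).view(B, T, -1)
        z, hidden = self.gru(z, hidden)                       # aggregate over time
        dist = torch.distributions.Categorical(logits=self.actor(z))
        return dist, self.critic(z), hidden


# Example usage with dummy observations:
policy = InteractionExplorationPolicy(num_actions=10)
frames = torch.zeros(2, 4, 3, 80, 80)        # batch of 2 rollout fragments, 4 steps each
affordances = torch.zeros(2, 4, 8, 80, 80)
dist, value, h = policy(frames, affordances)
action = dist.sample()                        # (2, 4) action indices
```

Per the quoted setup, this network would be optimized with PPO for 1M frames using rollouts of T = 256 steps; any standard PPO implementation can consume the action distribution and value outputs above.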