Learning Affordance Landscapes for Interaction Exploration in 3D Environments
Authors: Tushar Nagarajan, Kristen Grauman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate agents' ability to interact with as many objects as possible (Sec. 4.1) and to enhance policy learning on downstream tasks (Sec. 4.2). Simulation environment: We experiment with AI2-iTHOR [30] (see Fig. 1), since it supports context-specific interactions that can change object states, vs. simple physics-based interactions in other 3D indoor environments [59, 8]. The results show agents can learn how to use new home environments intelligently and that this prepares them to rapidly address various downstream tasks, such as "find a knife and put it in the drawer." |
| Researcher Affiliation | Collaboration | Tushar Nagarajan (UT Austin and Facebook AI Research, tushar@cs.utexas.edu); Kristen Grauman (UT Austin and Facebook AI Research, grauman@fb.com) |
| Pseudocode | No | The paper describes its methods in text and diagrams (e.g., Figure 2), but does not provide formal pseudocode or an algorithm block. |
| Open Source Code | Yes | Project page: http://vision.cs.utexas.edu/projects/interaction-exploration/ |
| Open Datasets | Yes | Simulation environment: We experiment with AI2-iTHOR [30] (see Fig. 1), since it supports context-specific interactions that can change object states, vs. simple physics-based interactions in other 3D indoor environments [59, 8]. |
| Dataset Splits | Yes | We split the 30 scenes into training (20), validation (5), and testing (5) sets. |
| Hardware Specification | No | The paper mentions 'UT Systems Administration team for their help setting up experiments on the cluster' in the Acknowledgments, indicating experiments were run on a cluster. However, it does not provide specific hardware details such as GPU models, CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using algorithms and architectures such as 'PPO [54]' and 'U-Net [49] architecture'. However, it does not specify any software dependencies or libraries with their corresponding version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | At each time step, we receive the current egocentric frame x and generate its affordance maps ŷ = F_A(x). The visual observations and affordance maps are encoded using a 3-layer convolutional neural network (CNN) each, and then concatenated and merged using a fully connected layer. This is then fed to a gated recurrent unit (GRU) recurrent neural network to aggregate observations over time, and finally to an actor-critic network (fully connected layers) to generate the next action distribution and value. We train this network using PPO [54] for 1M frames, with rollouts of T = 256 time steps. |
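
The Experiment Setup row above specifies the policy architecture only at a high level (two 3-layer CNN encoders, a fully connected merge, a GRU, and actor-critic heads trained with PPO). The following is a minimal PyTorch sketch of such a network under stated assumptions: the channel count of the affordance maps, the feature and hidden dimensions, the convolution hyperparameters, and the size of the action space are all illustrative choices, not values reported in the paper.

```python
# Hedged sketch of the policy network described in the Experiment Setup row.
# Layer sizes, kernel/stride choices, affordance channel count, and the number
# of actions are assumptions for illustration; the paper only specifies the
# 3-layer CNN encoders, the fully connected merge, the GRU, and the
# actor-critic heads (trained with PPO for 1M frames, rollouts of T = 256).
import torch
import torch.nn as nn


def make_cnn(in_channels: int, out_dim: int) -> nn.Sequential:
    """3-layer convolutional encoder followed by flatten + linear projection."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        nn.Flatten(),
        nn.LazyLinear(out_dim), nn.ReLU(),  # flattened size inferred on first call
    )


class InteractionExplorationPolicy(nn.Module):
    """Encodes the egocentric frame and affordance maps, aggregates over time
    with a GRU, and outputs action logits plus a value estimate (actor-critic)."""

    def __init__(self, num_actions: int = 8, affordance_channels: int = 7,
                 feat_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        self.rgb_encoder = make_cnn(3, feat_dim)                    # egocentric frame x
        self.aff_encoder = make_cnn(affordance_channels, feat_dim)  # affordance maps y_hat = F_A(x)
        self.merge = nn.Sequential(nn.Linear(2 * feat_dim, hidden_dim), nn.ReLU())
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.actor = nn.Linear(hidden_dim, num_actions)             # next-action logits
        self.critic = nn.Linear(hidden_dim, 1)                      # state value

    def forward(self, rgb, affordance, hidden=None):
        # rgb: (B, 3, H, W); affordance: (B, C_aff, H, W); hidden: (1, B, hidden_dim)
        feat = torch.cat([self.rgb_encoder(rgb), self.aff_encoder(affordance)], dim=-1)
        feat = self.merge(feat)
        out, hidden = self.gru(feat.unsqueeze(1), hidden)           # one time step per call
        out = out.squeeze(1)
        return self.actor(out), self.critic(out), hidden
```

In an actual PPO training loop, this module would be called once per environment step, carrying the GRU hidden state across the rollout of T = 256 steps; the affordance maps themselves would come from a separately trained segmentation model (the paper mentions a U-Net architecture for that component).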