Bridging the Imitation Gap by Adaptive Insubordination
Authors: Luca Weihs, Unnat Jain, Iou-Jen Liu, Jordi Salvador, Svetlana Lazebnik, Aniruddha Kembhavi, Alex Schwing
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously compare ADVISOR to IL methods, RL methods, and popularly-adopted (but often ad hoc) IL & RL combinations. In particular, we evaluate 15 learning methods. We do this over thirteen tasks: realizations of Ex. 1 & Ex. 2, eight tasks of varying complexity within the fast, versatile MINIGRID environment [8, 9], Cooperative Navigation (COOPNAV) with reduced visible range in the multi-agent particle environment (MPE) [43, 37], Point Goal navigation (POINTNAV) using the Gibson dataset in AIHABITAT [71, 54], and Object Goal Navigation (OBJECTNAV) in ROBOTHOR [14]. |
| Researcher Affiliation | Collaboration | ¹Allen Institute for AI, ²University of Illinois at Urbana-Champaign; {lucaw, jordis, anik}@allenai.org, {uj2, iliu3, slazebni, aschwing}@illinois.edu |
| Pseudocode | Yes | In practice we train f(·;·) and f^aux(·;·) jointly using stochastic gradient descent, as summarized in Alg. A.1. |
| Open Source Code | Yes | All code to reproduce our experiments will be made public under the Apache 2.0 license.5 See https://unnat.github.io/advisor/ for an up-to-date link to this code. |
| Open Datasets | Yes | We study the benefits of ADVISOR on thirteen tasks, including POISONEDDOORS from Ex. 1, a 2D lighthouse gridworld, a suite of tasks set within the MINIGRID environment [8, 9], Cooperative Navigation with limited range (COOPNAV) in the multi-agent particle environment (MPE) [43, 38], and two navigational tasks set in 3D, high-visual-fidelity simulators of real-world living environments (POINTNAV in AIHABITAT [54] and OBJECTNAV in ROBOTHOR [31, 14]). |
| Dataset Splits | No | The paper consistently refers to 'validation set performance' and 'validation reward', indicating the use of a validation set, but it does not provide specific details on how this validation set was split or generated from the experimental environment for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms like PPO and MADDPG, and cites a PyTorch implementation repository, but it does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For all MINIGRID tasks, we use a 3-layer MLP policy with 64 units per layer and ReLU activations. The agent is trained using PPO [58] with an Adam optimizer [29] with a learning rate of 5e-4 and batch size of 256 for 1M steps. For the COOPNAV tasks, we use an MLP with two 64-unit layers and ReLU activations. The agent is trained using MADDPG [37] with an Adam optimizer [29] with a learning rate of 1e-4 and batch size of 256 for 1.5M steps. For POINTNAV, we use the architecture from [54] with ResNet-50 as visual encoder, two 512-unit LSTM layers, and a 512-unit policy. We use the Adam optimizer with a learning rate of 2.5e-4. For OBJECTNAV, we use a ResNet-18 as visual encoder, two 512-unit LSTM layers, and a 512-unit policy. We use the Adam optimizer with a learning rate of 2.5e-4. For POINTNAV and OBJECTNAV, we train for 50M and 100M steps respectively. For all tasks, the discount factor is 0.99 and GAE [57] is used with λ=0.95. |
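The Pseudocode row above quotes the joint training of the main actor f(·;·) and the auxiliary actor f^aux(·;·) by stochastic gradient descent (Alg. A.1 of the paper). The sketch below is not a transcription of that algorithm; it is a minimal PyTorch illustration of how such a joint update could be assembled, assuming discrete actions, a gating weight of the form exp(-α · KL(expert ‖ auxiliary)), and hypothetical names (`advisor_style_update`, `alpha`). The exact loss terms and weighting should be checked against Alg. A.1 and the released code.

```python
import torch
from torch.distributions import Categorical, kl_divergence

def advisor_style_update(main_logits, aux_logits, expert_logits, rl_loss, alpha=4.0):
    """Hedged sketch of a joint main/auxiliary policy update.

    main_logits, aux_logits, expert_logits: [batch, num_actions] action logits
    from the main policy f, the auxiliary policy f^aux, and the expert.
    rl_loss: per-state RL surrogate loss (e.g. PPO) for the main policy, [batch].
    alpha: temperature of the KL-based gating weight (illustrative default).
    """
    expert_dist = Categorical(logits=expert_logits)
    aux_dist = Categorical(logits=aux_logits)
    main_dist = Categorical(logits=main_logits)

    # The auxiliary actor is trained purely by imitating the expert.
    aux_il = kl_divergence(expert_dist, aux_dist)          # [batch]
    aux_loss = aux_il.mean()

    # Per-state weight: close to 1 where the auxiliary actor can already match
    # the expert (imitation is useful), close to 0 where it cannot (imitation gap).
    with torch.no_grad():
        w = torch.exp(-alpha * aux_il)

    # The main actor mixes an imitation term and an RL term, state by state.
    main_il = kl_divergence(expert_dist, main_dist)        # [batch]
    main_loss = (w * main_il + (1.0 - w) * rl_loss).mean()

    return main_loss + aux_loss
```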
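The Experiment Setup row pins down the MINIGRID configuration fairly precisely. The sketch below renders those quoted numbers (3-layer, 64-unit ReLU MLP policy; Adam at 5e-4; discount 0.99; GAE λ=0.95) as PyTorch code; the class name, observation/action dimensions, and the actor-critic head layout are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MiniGridMLPPolicy(nn.Module):
    """Placeholder actor-critic matching the quoted '3-layer MLP, 64 units, ReLU'."""
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.actor = nn.Linear(hidden, num_actions)  # action logits for PPO
        self.critic = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.actor(h), self.critic(h)

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation with the quoted gamma=0.99 and lambda=0.95."""
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros(())
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else torch.zeros(())
        delta = rewards[t] + gamma * next_value * (1.0 - dones[t]) - values[t]
        gae = delta + gamma * lam * (1.0 - dones[t]) * gae
        advantages[t] = gae
    return advantages

# Quoted optimizer settings: Adam with a learning rate of 5e-4 (batch size 256, 1M steps).
policy = MiniGridMLPPolicy(obs_dim=147, num_actions=7)  # dimensions are illustrative only
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4)
```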