Bridging the Imitation Gap by Adaptive Insubordination

Authors: Luca Weihs, Unnat Jain, Iou-Jen Liu, Jordi Salvador, Svetlana Lazebnik, Aniruddha Kembhavi, Alex Schwing

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously compare ADVISOR to IL methods, RL methods, and popularly-adopted (but often ad hoc) IL & RL combinations. In particular, we evaluate 15 learning methods. We do this over thirteen tasks: realizations of Ex. 1 & Ex. 2, eight tasks of varying complexity within the fast, versatile MINIGRID environment [8, 9], Cooperative Navigation (COOPNAV) with reduced visible range in the multi-agent particle environment (MPE) [43, 37], Point Goal Navigation (POINTNAV) using the Gibson dataset in AIHABITAT [71, 54], and Object Goal Navigation (OBJECTNAV) in ROBOTHOR [14].
Researcher Affiliation | Collaboration | ¹Allen Institute for AI, ²University of Illinois at Urbana-Champaign; {lucaw, jordis, anik}@allenai.org, {uj2, iliu3, slazebni, aschwing}@illinois.edu
Pseudocode | Yes | In practice we train f(·; θ) and f^aux(·; θ) jointly using stochastic gradient descent, as summarized in Alg. A.1. (A hedged sketch of such joint training follows the table.)
Open Source Code | Yes | All code to reproduce our experiments will be made public under the Apache 2.0 license. See https://unnat.github.io/advisor/ for an up-to-date link to this code.
Open Datasets | Yes | We study the benefits of ADVISOR on thirteen tasks, including POISONEDDOORS from Ex. 1, a 2D lighthouse gridworld, a suite of tasks set within the MINIGRID environment [8, 9], Cooperative Navigation with limited range (COOPNAV) in the multi-agent particle environment (MPE) [43, 38], and two navigational tasks set in 3D, high-visual-fidelity simulators of real-world living environments (POINTNAV in AIHABITAT [54] and OBJECTNAV in ROBOTHOR [31, 14]).
Dataset Splits | No | The paper repeatedly refers to 'validation set performance' and 'validation reward', indicating that a validation set was used, but it does not explain how that set was split or generated from the experimental environments, which limits reproducibility.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions algorithms like PPO and MADDPG, and cites a PyTorch implementation repository, but it does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For all MINIGRID tasks, we use a 3-layer MLP policy with 64 units per layer and ReLU activations. The agent is trained using PPO [58] with an Adam optimizer [29], a learning rate of 5e-4, and a batch size of 256 for 1M steps. For the COOPNAV tasks, we use an MLP with two 64-unit layers and ReLU activations. The agent is trained using MADDPG [37] with an Adam optimizer [29], a learning rate of 1e-4, and a batch size of 256 for 1.5M steps. For POINTNAV, we use the architecture from [54] with a ResNet-50 visual encoder, two 512-unit LSTM layers, and a 512-unit policy. We use the Adam optimizer with a learning rate of 2.5e-4. For OBJECTNAV, we use a ResNet-18 visual encoder, two 512-unit LSTM layers, and a 512-unit policy. We use the Adam optimizer with a learning rate of 2.5e-4. For POINTNAV and OBJECTNAV, we train for 50M and 100M steps respectively. For all tasks, the discount factor is 0.99 and GAE [57] is used with λ = 0.95. (A hedged sketch of the reported optimization settings follows the table.)
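
The pseudocode row above states that f(·; θ) and f^aux(·; θ) are trained jointly by stochastic gradient descent (Alg. A.1). Below is a minimal PyTorch-style sketch of what such joint training can look like; the two-headed module, the exp(−α·gap) gating, and all names (TwoHeadPolicy, advisor_style_loss, alpha) are illustrative assumptions based on the paper's high-level description, not the authors' released implementation. For a deterministic expert, the cross-entropy of the auxiliary head against the expert action coincides with the KL-style imitation gap that gates IL vs. RL.

```python
# Hedged sketch: joint SGD on a main policy f(.; theta) and an auxiliary
# imitation policy f^aux(.; theta), loosely following the spirit of Alg. A.1.
# Class and function names here are illustrative, not from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadPolicy(nn.Module):
    """Shared 3-layer MLP trunk (64 units, ReLU) with two action heads."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.main_head = nn.Linear(hidden, n_actions)  # f(.; theta)
        self.aux_head = nn.Linear(hidden, n_actions)   # f^aux(.; theta)

    def forward(self, obs):
        h = self.trunk(obs)
        log_main = F.log_softmax(self.main_head(h), dim=-1)
        log_aux = F.log_softmax(self.aux_head(h), dim=-1)
        return log_main, log_aux

def advisor_style_loss(log_main, log_aux, expert_actions, rl_loss, alpha=4.0):
    """Per-state weighting of IL vs. RL by the auxiliary head's imitation gap.

    rl_loss is assumed to be a per-state policy-gradient loss (e.g. from PPO).
    """
    # Imitation (cross-entropy) losses for both heads against expert actions.
    il_main = F.nll_loss(log_main, expert_actions, reduction="none")
    il_aux = F.nll_loss(log_aux, expert_actions, reduction="none")
    # Gating weight: close to 1 where the aux head imitates the expert well,
    # close to 0 where it cannot (detached so the gate is not optimized).
    w = torch.exp(-alpha * il_aux.detach())
    # Weighted main-policy objective plus the aux head's own imitation loss.
    return (w * il_main + (1.0 - w) * rl_loss).mean() + il_aux.mean()
```

One SGD step then backpropagates this single scalar through both heads and the shared trunk, which is one plausible reading of "trained jointly".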
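
The experiment-setup row reports a discount factor of 0.99 and GAE with λ = 0.95. As a concrete reference, here is the standard GAE(λ) recursion with those values as defaults; this is the textbook computation from Schulman et al. [57], not the authors' code, and the function name gae_advantages is ours.

```python
# Hedged sketch: standard GAE(lambda) advantages with the reported
# gamma = 0.99 and lambda = 0.95 as defaults.
import torch

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a rollout of length T.

    rewards, dones: shape (T,); values: shape (T + 1,), including the
    bootstrap value of the state reached after the rollout.
    """
    T = rewards.shape[0]
    adv = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t].float()
        # One-step TD error, zeroing the bootstrap at episode boundaries.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * nonterminal * gae
        adv[t] = gae
    return adv
```

With the reported MINIGRID settings, these advantages would feed PPO updates on minibatches of 256 using torch.optim.Adam(policy.parameters(), lr=5e-4); the COOPNAV, POINTNAV, and OBJECTNAV rows swap in their respective optimizers, learning rates, and step budgets.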