Bridging the Imitation Gap by Adaptive Insubordination

Authors: Luca Weihs, Unnat Jain, Iou-Jen Liu, Jordi Salvador, Svetlana Lazebnik, Aniruddha Kembhavi, Alex Schwing

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously compare ADVISOR to IL methods, RL methods, and popularly-adopted (but often ad hoc) IL & RL combinations. In particular, we evaluate 15 learning methods. We do this over thirteen tasks: realizations of Ex. 1 & Ex. 2, eight tasks of varying complexity within the fast, versatile MINIGRID environment [8, 9], Cooperative Navigation (COOPNAV) with reduced visible range in the multi-agent particle environment (MPE) [43, 37], Point Goal Navigation (POINTNAV) using the Gibson dataset in AIHABITAT [71, 54], and Object Goal Navigation (OBJECTNAV) in ROBOTHOR [14].
Researcher Affiliation | Collaboration | ¹Allen Institute for AI, ²University of Illinois at Urbana-Champaign; {lucaw, jordis, anik}@allenai.org, {uj2, iliu3, slazebni, aschwing}@illinois.edu
Pseudocode | Yes | In practice we train f(·; θ) and f^aux(·; θ) jointly using stochastic gradient descent, as summarized in Alg. A.1. (A hedged sketch of such joint training follows the table.)
Open Source Code | Yes | All code to reproduce our experiments will be made public under the Apache 2.0 license. See https://unnat.github.io/advisor/ for an up-to-date link to this code.
Open Datasets | Yes | We study the benefits of ADVISOR on thirteen tasks, including POISONEDDOORS from Ex. 1, a 2D lighthouse gridworld, a suite of tasks set within the MINIGRID environment [8, 9], Cooperative Navigation with limited range (COOPNAV) in the multi-agent particle environment (MPE) [43, 38], and two navigational tasks set in 3D, high-visual-fidelity simulators of real-world living environments (POINTNAV in AIHABITAT [54] and OBJECTNAV in ROBOTHOR [31, 14]).
Dataset Splits | No | The paper repeatedly refers to 'validation set performance' and 'validation reward', indicating that a validation set was used, but it does not explain how that set was split or generated from the experimental environments, which limits reproducibility.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions algorithms like PPO and MADDPG, and cites a PyTorch implementation repository, but it does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For all MINIGRID tasks, we use a 3-layer MLP policy with 64 units per layer and ReLU activations. The agent is trained using PPO [58] with an Adam optimizer [29], a learning rate of 5e-4, and a batch size of 256 for 1M steps. For the COOPNAV tasks, we use an MLP with two 64-unit layers and ReLU activations. The agent is trained using MADDPG [37] with an Adam optimizer [29], a learning rate of 1e-4, and a batch size of 256 for 1.5M steps. For POINTNAV, we use the architecture from [54] with a ResNet-50 visual encoder, two 512-unit LSTM layers, and a 512-unit policy. We use the Adam optimizer with a learning rate of 2.5e-4. For OBJECTNAV, we use a ResNet-18 visual encoder, two 512-unit LSTM layers, and a 512-unit policy. We use the Adam optimizer with a learning rate of 2.5e-4. For POINTNAV and OBJECTNAV, we train for 50M and 100M steps respectively. For all tasks, the discount factor is 0.99 and GAE [57] is used with λ = 0.95. (A hedged sketch of the reported optimization settings follows the table.)
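
The pseudocode row above states that f(·; θ) and f^aux(·; θ) are trained jointly by stochastic gradient descent (Alg. A.1). Below is a minimal PyTorch-style sketch of what such joint training can look like; the two-headed module, the exp(−α·gap) gating, and all names (TwoHeadPolicy, advisor_style_loss, alpha) are illustrative assumptions based on the paper's high-level description, not the authors' released implementation. For a deterministic expert, the cross-entropy of the auxiliary head against the expert action coincides with the KL-style imitation gap that gates IL vs. RL.

```python
# Hedged sketch: joint SGD on a main policy f(.; theta) and an auxiliary
# imitation policy f^aux(.; theta), loosely following the spirit of Alg. A.1.
# Class and function names here are illustrative, not from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadPolicy(nn.Module):
    """Shared 3-layer MLP trunk (64 units, ReLU) with two action heads."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.main_head = nn.Linear(hidden, n_actions)  # f(.; theta)
        self.aux_head = nn.Linear(hidden, n_actions)   # f^aux(.; theta)

    def forward(self, obs):
        h = self.trunk(obs)
        log_main = F.log_softmax(self.main_head(h), dim=-1)
        log_aux = F.log_softmax(self.aux_head(h), dim=-1)
        return log_main, log_aux

def advisor_style_loss(log_main, log_aux, expert_actions, rl_loss, alpha=4.0):
    """Per-state weighting of IL vs. RL by the auxiliary head's imitation gap.

    rl_loss is assumed to be a per-state policy-gradient loss (e.g. from PPO).
    """
    # Imitation (cross-entropy) losses for both heads against expert actions.
    il_main = F.nll_loss(log_main, expert_actions, reduction="none")
    il_aux = F.nll_loss(log_aux, expert_actions, reduction="none")
    # Gating weight: close to 1 where the aux head imitates the expert well,
    # close to 0 where it cannot (detached so the gate is not optimized).
    w = torch.exp(-alpha * il_aux.detach())
    # Weighted main-policy objective plus the aux head's own imitation loss.
    return (w * il_main + (1.0 - w) * rl_loss).mean() + il_aux.mean()
```

One SGD step then backpropagates this single scalar through both heads and the shared trunk, which is one plausible reading of "trained jointly".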
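
The experiment-setup row reports a discount factor of 0.99 and GAE with λ = 0.95. As a concrete reference, here is the standard GAE(λ) recursion with those values as defaults; this is the textbook computation from Schulman et al. [57], not the authors' code, and the function name gae_advantages is ours.

```python
# Hedged sketch: standard GAE(lambda) advantages with the reported
# gamma = 0.99 and lambda = 0.95 as defaults.
import torch

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a rollout of length T.

    rewards, dones: shape (T,); values: shape (T + 1,), including the
    bootstrap value of the state reached after the rollout.
    """
    T = rewards.shape[0]
    adv = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t].float()
        # One-step TD error, zeroing the bootstrap at episode boundaries.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * nonterminal * gae
        adv[t] = gae
    return adv
```

With the reported MINIGRID settings, these advantages would feed PPO updates on minibatches of 256 using torch.optim.Adam(policy.parameters(), lr=5e-4); the COOPNAV, POINTNAV, and OBJECTNAV rows swap in their respective optimizers, learning rates, and step budgets.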