Imitation Learning via Off-Policy Distribution Matching
Authors: Ilya Kostrikov, Ofir Nachum, Jonathan Tompson
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ValueDICE on a suite of popular imitation learning benchmarks, finding that it can achieve state-of-the-art sample efficiency and performance...We evaluate ValueDICE in a variety of settings, starting with a simple synthetic task before continuing to an evaluation on a suite of MuJoCo benchmarks. |
| Researcher Affiliation | Collaboration | Ilya Kostrikov, Ofir Nachum, Jonathan Tompson, Google Research {kostrikov, ofirnachum, tompson}@google.com...Also at NYU. |
| Pseudocode | Yes | Please see the appendix for a full pseudocode implementation of Value DICE. |
| Open Source Code | Yes | Code to reproduce our results is available at https://github.com/google-research/google-research/tree/master/value_dice. |
| Open Datasets | Yes | We evaluate the algorithms on the standard MuJoCo environments using expert demonstrations from Ho & Ermon (2016). |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits, but rather describes using expert demonstrations for learning and evaluating policies in a simulated environment. |
| Hardware Specification | No | The paper mentions 'networks with an MLP architecture' but provides no specific details about the hardware (e.g., CPU, GPU models, memory) used for experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and specific regularization techniques, but it does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | All algorithms use networks with an MLP architecture with 2 hidden layers and 256 hidden units. For the discriminator, critic, and ν networks we use the Adam optimizer with learning rate 10⁻³, while for the actor we use a learning rate of 10⁻⁵. For the discriminator and ν networks we use gradient penalties from Gulrajani et al. (2017). We also regularize the actor network with orthogonal regularization (Brock et al., 2018) with a coefficient of 10⁻⁴, and we perform 4 updates per environment step. We handle absorbing states of the environments similarly to Kostrikov et al. (2019). (A hedged configuration sketch of these settings follows the table.) |
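
The reported experiment setup maps directly onto a small training configuration. The sketch below collects those hyperparameters, assuming TensorFlow 2 / Keras; it is not the authors' released code (linked in the table above), and the class and function names, the ReLU activations, and the example output dimension are illustrative assumptions. The gradient-penalty and orthogonal-regularization terms are noted in comments rather than implemented.

```python
# A minimal, hypothetical sketch of the reported setup; assumes TensorFlow 2.
# Names, ReLU activations, and the example dimensions are illustrative
# assumptions, not taken from the authors' released implementation.
from dataclasses import dataclass

import tensorflow as tf


@dataclass
class ReportedSetup:
    hidden_layers: int = 2              # "2 hidden layers"
    hidden_units: int = 256             # "256 hidden units"
    nu_lr: float = 1e-3                 # Adam lr for discriminator / critic / nu networks
    actor_lr: float = 1e-5              # Adam lr for the actor
    orthogonal_reg_coef: float = 1e-4   # orthogonal regularization (Brock et al., 2018)
    updates_per_env_step: int = 4       # "4 updates per environment step"


def build_mlp(output_dim: int, cfg: ReportedSetup) -> tf.keras.Model:
    """2-hidden-layer, 256-unit MLP as described in the setup (activation assumed)."""
    model = tf.keras.Sequential()
    for _ in range(cfg.hidden_layers):
        model.add(tf.keras.layers.Dense(cfg.hidden_units, activation="relu"))
    model.add(tf.keras.layers.Dense(output_dim))
    return model


cfg = ReportedSetup()
# Output dimensions are examples only (e.g. a MuJoCo task with 6-dim actions).
nu_net = build_mlp(output_dim=1, cfg=cfg)     # nu(s, a) -> scalar
actor_net = build_mlp(output_dim=6, cfg=cfg)  # policy head (distribution params omitted)
# Gradient penalties (Gulrajani et al., 2017) on nu and the orthogonal
# regularizer on the actor would be added to the respective losses here.
nu_opt = tf.keras.optimizers.Adam(learning_rate=cfg.nu_lr)
actor_opt = tf.keras.optimizers.Adam(learning_rate=cfg.actor_lr)
```

Note the asymmetric learning rates (10⁻³ for the ν/critic networks versus 10⁻⁵ for the actor); this is the detail most easily lost to the PDF extraction, hence the explicit constants above.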