Disagreement-Regularized Imitation Learning

Authors: Kiante Brantley, Wen Sun, Mikael Henaff

ICLR 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our algorithm empirically across multiple pixel-based Atari environments and continuous control tasks, and show that it matches or significantly outperforms behavioral cloning and generative adversarial imitation learning." |
| Researcher Affiliation | Collaboration | Kianté Brantley (University of Maryland, kdbrant@cs.umd.edu); Wen Sun (Microsoft Research, sun.wen@microsoft.com); Mikael Henaff (Microsoft Research, mihenaff@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: Disagreement-Regularized Imitation Learning (DRIL) |
| Open Source Code | No | The paper mentions using open-source repositories for baselines (the stable-baselines repository and Kostrikov's PyTorch implementations) but does not link to, or state the availability of, an open-source release of the authors' own implementation. |
| Open Datasets | No | Expert trajectories were generated from pretrained PPO agents from the stable-baselines repository, and experiments were run on PyBullet and OpenAI Gym environments. The environments themselves are public, but the paper provides no concrete access (link, DOI, or dataset citation) to the generated expert demonstrations used in the experiments. |
| Dataset Splits | No | The paper notes "We stopped training once the validation error did not improve for 20 epochs" but does not specify split percentages or counts for training, validation, or testing. |
| Hardware Specification | No | The paper gives no hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using Adam for optimization and refers to a "PyTorch implementations of reinforcement learning algorithms" repository, but specifies no version numbers for Python, PyTorch, CUDA, or other key libraries. |
| Experiment Setup | Yes | Table 1 (hyperparameters for DRIL), Table 2 (hyperparameters for GAIL), and Table 3 (hyperparameters for the authors' method) provide specific values: learning rate, quantile cutoff, number of supervised updates, entropy coefficient, value-loss coefficient, number of steps, and number of parallel environments. |
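For context on the assessed method: DRIL trains an ensemble of behavioral-cloning policies on the expert demonstrations and penalizes the learner in states where the ensemble members disagree, using a quantile cutoff (one of the hyperparameters listed in Table 1) to clip the cost to ±1. A minimal sketch of that clipped disagreement cost, assuming a discrete action space (function and variable names here are illustrative, not from the paper's code):

```python
import numpy as np

def disagreement(ensemble_probs, action):
    """Variance, across ensemble members, of the probability each
    member assigns to `action` in the current state."""
    return np.var([probs[action] for probs in ensemble_probs])

def clipped_cost(ensemble_probs, action, q_cutoff):
    """Clipped cost in the spirit of DRIL's Algorithm 1: -1 where the
    BC ensemble agrees (low variance, near the expert's state
    distribution), +1 where it disagrees. The RL step then minimizes
    this cost, steering the policy back toward expert-covered states."""
    return -1.0 if disagreement(ensemble_probs, action) <= q_cutoff else 1.0
```

In the full algorithm this cost is combined with continued supervised behavioral-cloning updates; the quantile cutoff is computed from the empirical distribution of disagreement values on the demonstration data.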