Disagreement-Regularized Imitation Learning
Authors: Kianté Brantley, Wen Sun, Mikael Henaff
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm empirically across multiple pixel-based Atari environments and continuous control tasks, and show that it matches or significantly outperforms behavioral cloning and generative adversarial imitation learning. |
| Researcher Affiliation | Collaboration | Kianté Brantley (University of Maryland, kdbrant@cs.umd.edu); Wen Sun (Microsoft Research, sun.wen@microsoft.com); Mikael Henaff (Microsoft Research, mihenaff@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: Disagreement-Regularized Imitation Learning (DRIL); an illustrative sketch of its disagreement cost follows this table. |
| Open Source Code | No | The paper mentions using open-source repositories for baselines (stable baselines repository, Kostrikov's PyTorch implementations) but does not provide a link or statement about the authors' own implementation being open-sourced. |
| Open Datasets | No | The paper states that expert trajectories were generated from pretrained PPO agents in the stable baselines repository and that experiments were run on PyBullet and OpenAI Gym environments. However, it does not provide concrete access (a link, DOI, or citation) to the specific expert demonstration trajectories used in the experiments. The environments themselves are public, but the generated demonstration data is not explicitly made available. |
| Dataset Splits | No | The paper mentions "We stopped training once the validation error did not improve for 20 epochs" but does not specify split percentages or example counts for training, validation, or testing; a sketch of this early-stopping rule follows the table. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using Adam for optimization and refers to a 'Pytorch implementations of reinforcement learning algorithms' repository, but it does not specify exact version numbers for Python, PyTorch, CUDA, or other key software libraries used. |
| Experiment Setup | Yes | Table 1: Hyperparameters for DRIL; Table 2: Hyperparameters for GAIL; Table 3: Hyperparameters (our method). These tables provide specific hyperparameter values like learning rate, quantile cutoff, number of supervised updates, entropy coefficient, value loss coefficient, number of steps, and parallel environments. |
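
The Pseudocode row points to Algorithm 1. Below is a minimal sketch of the disagreement cost at the core of DRIL, written from the paper's description for a discrete-action setting; the NumPy implementation and the function names (`disagreement_cost`, `clipped_cost`) are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of DRIL's disagreement cost (Algorithm 1), assuming a
# discrete-action setting; NumPy and these function names are illustrative
# choices, not the authors' code.
import numpy as np

def disagreement_cost(ensemble_probs: np.ndarray) -> np.ndarray:
    """Raw cost C_U(s, a): the variance, across the ensemble of
    behavior-cloned policies, of the probability each assigns to action a
    in state s.

    ensemble_probs: shape (E, N), where entry [e, i] = pi_e(a_i | s_i) for
    ensemble member e and state-action pair i.
    """
    return ensemble_probs.var(axis=0)  # shape (N,)

def clipped_cost(costs: np.ndarray, demo_costs: np.ndarray, q: float) -> np.ndarray:
    """Clip raw costs to {-1, +1} around a cutoff taken as the q-th quantile
    of costs on the demonstration data (the "quantile cutoff" hyperparameter
    reported in Table 1)."""
    cutoff = np.quantile(demo_costs, q)
    return np.where(costs <= cutoff, -1.0, +1.0)
```

In the paper, an RL learner (PPO in the experiments) then minimizes this clipped cost on the states it visits, interleaved with supervised behavioral-cloning updates on the expert data; the sketch above covers only the cost computation.

The Dataset Splits row quotes an early-stopping rule (stop when the validation error has not improved for 20 epochs). A small sketch of that rule is shown below; `train_epoch`, `validation_error`, and `max_epochs` are hypothetical placeholders, since the paper does not describe its training loop at this level of detail.

```python
# Sketch of patience-based early stopping as quoted in the paper: stop once
# the validation error has not improved for 20 epochs. The callables are
# hypothetical placeholders, not the authors' code.
def train_with_early_stopping(train_epoch, validation_error,
                              patience: int = 20, max_epochs: int = 1000) -> float:
    best_error = float("inf")
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        train_epoch()                     # one pass over the training split
        err = validation_error()          # error on the held-out validation split
        if err < best_error:
            best_error = err
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                         # no improvement for `patience` epochs
    return best_error
```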
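The paper does not report which dataset splits these two loops operate on, which is why the Dataset Splits row above is marked "No" despite the quoted early-stopping rule.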