Hierarchical Imitation and Reinforcement Learning
Authors: Hoang Le, Nan Jiang, Alekh Agarwal, Miroslav Dudik, Yisong Yue, Hal Daumé III
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of our algorithms on two separate domains: (i) a simple but challenging maze navigation domain and (ii) the Atari game Montezuma's Revenge. |
| Researcher Affiliation | Collaboration | California Institute of Technology, Pasadena, CA; Microsoft Research, New York, NY; University of Maryland, College Park, MD. |
| Pseudocode | Yes | Algorithm 1 Hierarchical Behavioral Cloning (h-BC), Algorithm 2 Hierarchically Guided DAgger (hg-DAgger), Algorithm 3 Hierarchically Guided DAgger / Q-learning (hg-DAgger/Q); an illustrative h-BC sketch follows the table. |
| Open Source Code | Yes | Code and experimental setups are available at https://sites.google.com/view/hierarchical-il-rl |
| Open Datasets | No | The paper mentions 'Maze Navigation Domain' and 'Montezuma's Revenge' as experimental domains, but does not provide specific links, DOIs, repositories, or formal citations with author/year for accessing the datasets or their specific configurations for reproducibility. |
| Dataset Splits | No | The paper discusses training and testing, but does not provide specific details on train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., CPU/GPU models, memory) used for running the experiments. It only mentions 'neural network architectures' in general terms. |
| Software Dependencies | No | The paper mentions algorithms like 'DDQN' and 'Q-learning' but does not list specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow) required for reproducibility. |
| Experiment Setup | Yes | 'The subpolicies and meta-controller use similar neural network architectures and only differ in the number of action outputs. (Details of network architecture are provided in Appendix B.)' (Section 6.1) and 'Input: Annealed exploration probabilities ε_g > 0 for all g ∈ G' (Algorithm 3, Section 5). An illustrative annealing sketch follows the table. |
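
The pseudocode row above cites Algorithm 1 (h-BC). As a rough illustration of the hierarchical behavioral cloning structure that algorithm describes, the sketch below trains a meta-controller (state → subgoal) and one subpolicy per subgoal (state → primitive action) by supervised learning on hierarchically labeled expert demonstrations. The `HBC` class, the `(state, subgoal, action)` demo format, and the use of scikit-learn classifiers are assumptions made here for illustration, not the authors' released implementation.

```python
# A minimal sketch of hierarchical behavioral cloning (h-BC), assuming expert
# demonstrations are available as (state, subgoal, action) triples.
# Class and method names are illustrative, not from the paper's code release.
import numpy as np
from sklearn.linear_model import LogisticRegression


class HBC:
    def __init__(self, subgoals):
        self.meta = LogisticRegression(max_iter=1000)        # state -> subgoal
        self.subpolicies = {g: LogisticRegression(max_iter=1000)
                            for g in subgoals}                # state -> action, per subgoal

    def fit(self, demos):
        """demos: list of (state_features, subgoal, action) triples from an expert."""
        states = np.array([s for s, _, _ in demos])
        goals = np.array([g for _, g, _ in demos])
        self.meta.fit(states, goals)                          # clone the meta-controller
        for g, pi in self.subpolicies.items():
            idx = goals == g
            if idx.any():                                     # clone each subpolicy on its own slice
                actions = np.array([a for _, gg, a in demos if gg == g])
                pi.fit(states[idx], actions)

    def act(self, state):
        state = np.asarray(state).reshape(1, -1)
        g = self.meta.predict(state)[0]                       # pick a subgoal first
        return g, self.subpolicies[g].predict(state)[0]       # then a primitive action
```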
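Similarly, the 'annealed exploration probabilities ε_g > 0 for all g ∈ G' input to Algorithm 3 (hg-DAgger/Q) can be pictured as one independently decayed ε-greedy schedule per subgoal. The linear schedule, its parameter values, and the subgoal names below are illustrative assumptions, not values taken from the paper.

```python
# A minimal sketch of per-subgoal annealed exploration probabilities eps_g > 0,
# one schedule per subgoal g in G, for the low-level Q-learner.
# Schedule shape and constants are assumptions, not the paper's settings.
def annealed_epsilon(step, eps_start=1.0, eps_end=0.1, anneal_steps=100_000):
    """Linearly anneal exploration from eps_start to eps_end over anneal_steps."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


# Each subgoal keeps its own step counter, so epsilon_g decays independently
# as that subgoal's subpolicy accumulates experience.
subgoal_steps = {g: 0 for g in ["key", "door", "ladder"]}    # hypothetical subgoal names


def epsilon_for(g):
    subgoal_steps[g] += 1
    return annealed_epsilon(subgoal_steps[g])
```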