Hierarchical Imitation and Reinforcement Learning
Authors: Hoang Le, Nan Jiang, Alekh Agarwal, Miroslav Dudik, Yisong Yue, Hal Daumé III
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of our algorithms on two separate domains: (i) a simple but challenging maze navigation domain and (ii) the Atari game Montezuma's Revenge. |
| Researcher Affiliation | Collaboration | California Institute of Technology, Pasadena, CA; Microsoft Research, New York, NY; University of Maryland, College Park, MD. |
| Pseudocode | Yes | Algorithm 1 Hierarchical Behavioral Cloning (h-BC), Algorithm 2 Hierarchically Guided DAgger (hg-DAgger), Algorithm 3 Hierarchically Guided DAgger / Q-learning (hg-DAgger/Q); an illustrative h-BC sketch follows the table. |
| Open Source Code | Yes | Code and experimental setups are available at https://sites.google.com/view/hierarchical-il-rl |
| Open Datasets | No | The paper mentions 'Maze Navigation Domain' and 'Montezuma's Revenge' as experimental domains, but does not provide specific links, DOIs, repositories, or formal citations with author/year for accessing the datasets or their specific configurations for reproducibility. |
| Dataset Splits | No | The paper discusses training and testing, but does not provide specific details on train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., CPU/GPU models, memory) used for running the experiments. It only mentions 'neural network architectures' in general terms. |
| Software Dependencies | No | The paper mentions algorithms like 'DDQN' and 'Q-learning' but does not list specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow) required for reproducibility. |
| Experiment Setup | Yes | 'The subpolicies and meta-controller use similar neural network architectures and only differ in the number of action outputs. (Details of network architecture are provided in Appendix B.)' (Section 6.1) and 'Input: Annealed exploration probabilities ε_g > 0 for all g ∈ G' (Algorithm 3, Section 5). An illustrative annealing sketch follows the table. |
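
The pseudocode row above cites Algorithm 1 (h-BC). As a rough illustration of the hierarchical behavioral cloning structure that algorithm describes, the sketch below trains a meta-controller (state → subgoal) and one subpolicy per subgoal (state → primitive action) by supervised learning on hierarchically labeled expert demonstrations. The `HBC` class, the `(state, subgoal, action)` demo format, and the use of scikit-learn classifiers are assumptions made here for illustration, not the authors' released implementation.

```python
# A minimal sketch of hierarchical behavioral cloning (h-BC), assuming expert
# demonstrations are available as (state, subgoal, action) triples.
# Class and method names are illustrative, not from the paper's code release.
import numpy as np
from sklearn.linear_model import LogisticRegression


class HBC:
    def __init__(self, subgoals):
        self.meta = LogisticRegression(max_iter=1000)        # state -> subgoal
        self.subpolicies = {g: LogisticRegression(max_iter=1000)
                            for g in subgoals}                # state -> action, per subgoal

    def fit(self, demos):
        """demos: list of (state_features, subgoal, action) triples from an expert."""
        states = np.array([s for s, _, _ in demos])
        goals = np.array([g for _, g, _ in demos])
        self.meta.fit(states, goals)                          # clone the meta-controller
        for g, pi in self.subpolicies.items():
            idx = goals == g
            if idx.any():                                     # clone each subpolicy on its own slice
                actions = np.array([a for _, gg, a in demos if gg == g])
                pi.fit(states[idx], actions)

    def act(self, state):
        state = np.asarray(state).reshape(1, -1)
        g = self.meta.predict(state)[0]                       # pick a subgoal first
        return g, self.subpolicies[g].predict(state)[0]       # then a primitive action
```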
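Similarly, the 'annealed exploration probabilities ε_g > 0 for all g ∈ G' input to Algorithm 3 (hg-DAgger/Q) can be pictured as one independently decayed ε-greedy schedule per subgoal. The linear schedule, its parameter values, and the subgoal names below are illustrative assumptions, not values taken from the paper.

```python
# A minimal sketch of per-subgoal annealed exploration probabilities eps_g > 0,
# one schedule per subgoal g in G, for the low-level Q-learner.
# Schedule shape and constants are assumptions, not the paper's settings.
def annealed_epsilon(step, eps_start=1.0, eps_end=0.1, anneal_steps=100_000):
    """Linearly anneal exploration from eps_start to eps_end over anneal_steps."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


# Each subgoal keeps its own step counter, so epsilon_g decays independently
# as that subgoal's subpolicy accumulates experience.
subgoal_steps = {g: 0 for g in ["key", "door", "ladder"]}    # hypothetical subgoal names


def epsilon_for(g):
    subgoal_steps[g] += 1
    return annealed_epsilon(subgoal_steps[g])
```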