Hierarchical Imitation and Reinforcement Learning

Authors: Hoang Le, Nan Jiang, Alekh Agarwal, Miroslav Dudik, Yisong Yue, Hal Daumé III

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of our algorithms on two separate domains: (i) a simple but challenging maze navigation domain and (ii) the Atari game Montezuma's Revenge.
Researcher Affiliation | Collaboration | California Institute of Technology, Pasadena, CA; Microsoft Research, New York, NY; University of Maryland, College Park, MD.
Pseudocode | Yes | Algorithm 1: Hierarchical Behavioral Cloning (h-BC); Algorithm 2: Hierarchically Guided DAgger (hg-DAgger); Algorithm 3: Hierarchically Guided DAgger/Q-learning (hg-DAgger/Q).
Open Source Code | Yes | Code and experimental setups are available at https://sites.google.com/view/hierarchical-il-rl
Open Datasets | No | The paper uses the maze navigation domain and Montezuma's Revenge as experimental domains, but does not provide links, DOIs, repositories, or formal citations for accessing the datasets or their specific configurations.
Dataset Splits | No | The paper discusses training and testing, but does not report train/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., CPU/GPU models, memory); it only describes the neural network architectures in general terms.
Software Dependencies | No | The paper mentions algorithms such as DDQN and Q-learning, but does not list software dependencies with version numbers (e.g., Python version or library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | The subpolicies and meta-controller use similar neural network architectures and only differ in the number of action outputs; details of the network architecture are provided in Appendix B (Section 6.1). The inputs include annealed exploration probabilities ϵ_g > 0 for all g ∈ G (Algorithm 3, Section 5).
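To make the Experiment Setup row concrete, below is a minimal sketch (in PyTorch) of the configuration the paper describes: subpolicy and meta-controller networks that share the same architecture and differ only in the number of action outputs, together with a strictly positive, annealed exploration probability ϵ_g for each goal g ∈ G. The names (PolicyNet, make_networks, annealed_epsilon), the layer sizes, and the linear annealing schedule are illustrative assumptions, not the authors' released implementation; the actual architecture details are in the paper's Appendix B and linked code.

```python
# Sketch only: illustrates the setup described in the Experiment Setup row,
# not the authors' released code. Layer sizes and the annealing schedule are
# assumptions for illustration.
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Shared architecture; only the size of the output head changes."""

    def __init__(self, obs_dim: int, n_outputs: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # n_outputs = |A| for a subpolicy, |G| for the meta-controller
            nn.Linear(hidden, n_outputs),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def make_networks(obs_dim: int, n_primitive_actions: int, n_goals: int):
    # Meta-controller chooses a goal g in G; one subpolicy per goal chooses
    # primitive actions. All networks share the same body architecture.
    meta_controller = PolicyNet(obs_dim, n_goals)
    subpolicies = {g: PolicyNet(obs_dim, n_primitive_actions) for g in range(n_goals)}
    return meta_controller, subpolicies


def annealed_epsilon(step: int, eps_start: float = 1.0,
                     eps_end: float = 0.05, decay_steps: int = 50_000) -> float:
    # Linearly annealed exploration probability, kept strictly positive
    # (matching the requirement eps_g > 0).
    frac = min(step / decay_steps, 1.0)
    return max(eps_end, eps_start + frac * (eps_end - eps_start))
```

In this sketch, a separate schedule can be maintained per goal (e.g., calling annealed_epsilon with each goal's own step counter), which corresponds to the per-goal exploration probabilities listed as input to Algorithm 3.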