reproducibilityindex.ai

Lookahead-Bounded Q-learning

Authors: Ibrahim El Shar, Daniel Jiang

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical experiments on benchmark problems show that LBQL exhibits faster convergence and more robustness to hyperparameters when compared to standard Q-learning and several related techniques. Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.
Researcher Affiliation	Academia	Department of Industrial Engineering, University of Pittsburgh, PA, USA. Correspondence to: Ibrahim El Shar <ije8@pitt.edu>.
Pseudocode	Yes	Algorithm 1 Lookahead-Bounded Q-Learning
Open Source Code	Yes	We also open-source a Python package for LBQL that reproduces all experiments and figures presented in this paper (footnote 3: https://github.com/ibrahim-elshar/LBQL ICML2020).
Open Datasets	No	The paper refers to environments like 'Windy Gridworld' and 'Stormy Gridworld', stating that WG is a 'well-known variant' and SG is a 'new domain'. It also refers to 'synthetic problem[s]' for car-sharing. While these are problem setups, the paper does not provide concrete access information (link, DOI, or specific citation to a public data repository) for pre-existing public datasets used in training.
Dataset Splits	No	The paper describes numerical experiments and evaluations but does not explicitly provide specific dataset split information (e.g., percentages, sample counts, or references to predefined splits) for training, validation, or testing. In reinforcement learning, data is often generated through interaction rather than pre-split datasets.
Hardware Specification	No	The paper does not explicitly describe the specific hardware used for running the experiments, such as GPU/CPU models, memory, or cloud computing instance types.
Software Dependencies	No	The paper mentions providing a 'Python package' for LBQL, but it does not list specific version numbers for Python itself or any relevant software libraries or dependencies (e.g., PyTorch, TensorFlow, NumPy).
Experiment Setup	Yes	Detailed description of the environments, the parameters used for the five algorithms, and sensitivity analysis are deferred to Appendix D.
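
The "Dataset Splits" row notes that in reinforcement learning, data is usually generated through interaction rather than drawn from pre-split datasets. The sketch below illustrates that point with standard tabular Q-learning (the baseline named in the "Research Type" row, not LBQL itself) on a hypothetical five-state chain. The environment, the function names, and all hyperparameter values are assumptions made for illustration and are not taken from the paper or its package. A fixed seed is used so that a run can be repeated exactly.

```python
import random

# Hypothetical toy environment (an assumption for illustration, not from the paper):
# a chain of states 0..4 where state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition; reward 1.0 only when the last state is reached."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0, max_steps=100):
    """Standard tabular Q-learning; every sample comes from interaction with step()."""
    rng = random.Random(seed)  # local, seeded generator so runs are repeatable
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # One-step Q-learning target; no bootstrapping from a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(q_learning())` returns one action per non-terminal state. There is no training or test split anywhere in the loop, which is why the "Dataset Splits" variable fits this kind of experiment poorly. Recording the seed and the hyperparameters is the closest analogue to documenting a split.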