Lookahead-Bounded Q-learning

Authors: Ibrahim El Shar, Daniel Jiang

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments on benchmark problems show that LBQL exhibits faster convergence and more robustness to hyperparameters when compared to standard Q-learning and several related techniques. Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.
Researcher Affiliation | Academia | Department of Industrial Engineering, University of Pittsburgh, PA, USA. Correspondence to: Ibrahim El Shar <ije8@pitt.edu>.
Pseudocode | Yes | Algorithm 1: Lookahead-Bounded Q-Learning
Open Source Code | Yes | We also open-source a Python package for LBQL that reproduces all experiments and figures presented in this paper (footnote 3: https://github.com/ibrahim-elshar/LBQL_ICML2020).
Open Datasets | No | The paper uses environments such as Windy Gridworld (WG), described as a "well-known variant", and Stormy Gridworld (SG), described as a "new domain". It also uses a synthetic car-sharing problem. These are simulated problem setups rather than datasets, and the paper provides no concrete access information (link, DOI, or citation to a public data repository) for any pre-existing public dataset used in training.
Dataset Splits | No | The paper describes numerical experiments and evaluations but does not provide dataset split information (e.g., percentages, sample counts, or references to predefined splits) for training, validation, or testing. This is expected in reinforcement learning, where data is typically generated through interaction with the environment rather than drawn from a pre-split dataset.
Hardware Specification | No | The paper does not describe the hardware used to run the experiments, such as CPU/GPU models, memory, or cloud computing instance types.
Software Dependencies | No | The paper mentions a Python package for LBQL but does not list version numbers for Python itself or for any libraries it depends on (e.g., PyTorch, TensorFlow, NumPy).
Experiment Setup | Yes | Detailed description of the environments, the parameters used for the five algorithms, and sensitivity analysis are deferred to Appendix D.
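The Pseudocode row cites Algorithm 1 of the paper. The following is a rough illustration only and is not a transcription of Algorithm 1. It shows the general idea of a bounded Q-learning step: a standard tabular Q-learning update whose iterate is then clipped onto per-pair lower and upper bounds. The sketch assumes the bounds are already available as tables. In the paper they come from lookahead computations, and that part is not reproduced here. The function name `bounded_q_update` and its parameters are hypothetical.

```python
from typing import Dict, List

# Hypothetical tabular representation: state -> list of action values.
QTable = Dict[int, List[float]]


def bounded_q_update(
    q: QTable,
    lower: QTable,
    upper: QTable,
    s: int,
    a: int,
    r: float,
    s_next: int,
    done: bool,
    alpha: float = 0.1,
    gamma: float = 0.95,
) -> float:
    """One tabular Q-learning step, then projection onto [lower, upper].

    Illustrative sketch only: the bounds are assumed to be given, not
    estimated the way the paper's algorithm does.
    """
    # Standard Q-learning target; no bootstrapping from terminal states.
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    # Project (clip) the updated value into the assumed bound interval.
    q[s][a] = min(max(q[s][a], lower[s][a]), upper[s][a])
    return q[s][a]


if __name__ == "__main__":
    q = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    lower = {s: [0.0, 0.0] for s in q}
    upper = {s: [0.5, 0.5] for s in q}
    # The unclipped value would be 5.0; the upper bound caps it at 0.5.
    print(bounded_q_update(q, lower, upper, s=0, a=1, r=10.0,
                           s_next=1, done=False, alpha=0.5))  # 0.5
```

Clipping after each update is the simplest way to keep iterates inside an interval. If the bounds are loose, the step reduces to plain Q-learning.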