Lookahead-Bounded Q-learning
Authors: Ibrahim El Shar, Daniel Jiang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on benchmark problems show that LBQL exhibits faster convergence and more robustness to hyperparameters when compared to standard Q-learning and several related techniques. Our approach is particularly appealing in problems that require expensive simulations or real-world interactions. |
| Researcher Affiliation | Academia | ¹Department of Industrial Engineering, University of Pittsburgh, PA, USA. Correspondence to: Ibrahim El Shar <ije8@pitt.edu>. |
| Pseudocode | Yes | Algorithm 1 Lookahead-Bounded Q-Learning |
| Open Source Code | Yes | We also open-source a Python package³ for LBQL that reproduces all experiments and figures presented in this paper. ³https://github.com/ibrahim-elshar/LBQL ICML2020. |
| Open Datasets | No | The paper refers to environments like 'Windy Gridworld' and 'Stormy Gridworld', stating that WG is a 'well-known variant' and SG is a 'new domain'. It also refers to 'synthetic problem[s]' for car-sharing. While these are problem setups, the paper does not provide concrete access information (link, DOI, or specific citation to a public data repository) for pre-existing public datasets used in training. |
| Dataset Splits | No | The paper describes numerical experiments and evaluations but does not explicitly provide specific dataset split information (e.g., percentages, sample counts, or references to predefined splits) for training, validation, or testing. In reinforcement learning, data is often generated through interaction rather than pre-split datasets. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running the experiments, such as GPU/CPU models, memory, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions providing a 'Python package' for LBQL, but it does not list specific version numbers for Python itself or any relevant software libraries or dependencies (e.g., PyTorch, TensorFlow, NumPy). |
| Experiment Setup | Yes | Detailed description of the environments, the parameters used for the five algorithms, and sensitivity analysis are deferred to Appendix D. |
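
The Dataset Splits row notes that in reinforcement learning, data is generated through interaction rather than drawn from pre-split datasets. The sketch below illustrates that point with standard tabular Q-learning, the baseline the paper compares against, and not LBQL itself. The 5-state chain environment, the names `step` and `q_learning`, and all hyperparameter values are hypothetical choices made for illustration; none of them come from the paper or its package.

```python
import random

N_STATES = 5       # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic transition; reward 1 only on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:  # cap episode length
            # Transitions are produced by acting in the environment,
            # so there is no fixed train/validation/test split.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                # Break ties at random so the untrained agent still explores.
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Bootstrapped target; no future value beyond a terminal state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            steps += 1
    return q


if __name__ == "__main__":
    q_table = q_learning()
    greedy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
    print("greedy action per non-terminal state:", greedy)
```

Every sample the learner sees is a transition it generated itself, which is why the split-based reproducibility criteria above map poorly onto this setting; fixing the random seed, as done here, is the closer analogue.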