Tightening Exploration in Upper Confidence Reinforcement Learning
Authors: Hippolyte Bourel, Odalric Maillard, Mohammad Sadegh Talebi
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate, through numerical experiments in standard environments, that reducing exploration this way yields a substantial numerical improvement compared to UCRL2 and its variants. In this section we provide illustrative numerical experiments that show the benefit of UCRL3 over UCRL2 and some of its popular variants. |
| Researcher Affiliation | Academia | ¹SequeL, Inria Lille - Nord Europe, Villeneuve-d'Ascq, France. ²Department of Computer Science, University of Copenhagen, Copenhagen, Denmark. |
| Pseudocode | Yes | Algorithm 1 Extended Value Iteration (EVI), Algorithm 2 NOSS(f, Ŝ, C, κ), Algorithm 3 EVI-NOSS(p, c, C, Nmax, ε), Algorithm 4 UCRL3 with input parameter δ ∈ (0, 1) |
| Open Source Code | Yes | The full code and implementation details are made available to the community (see Appendix E for details). Our implementation of UCRL3 as well as all baseline algorithms (UCRL2, KL-UCRL, UCRL2B, PSRL) can be found at: https://github.com/hbourel/UCRL3. |
| Open Datasets | Yes | In the first set of experiments, we consider the S-state River Swim environment (corresponding to the MDP shown in Figure 4). We consider two frozen lake environments of respective sizes 7×7 and 9×11 as shown in Figure 5. |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments (e.g., River Swim, Frozen Lake) where learning occurs through continuous interaction, rather than using fixed train/validation/test dataset splits. No explicit split percentages or sample counts are provided. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states that the code is available on GitHub but does not specify software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | For all algorithms, we set δ = 0.05 and use the same tie-breaking rule. |
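The River Swim environment referenced above is a small tabular MDP that is straightforward to reconstruct. The sketch below builds its transition and reward matrices in NumPy; the specific probabilities and rewards follow one common parametrization from the literature and are an assumption here, since the paper's exact values are given in its Figure 4 rather than quoted in this table.

```python
import numpy as np

def river_swim(n_states=6):
    """Build a transition tensor P[a, s, s'] and reward matrix R[s, a]
    for an S-state RiverSwim MDP.  The probabilities and rewards below
    are one common parametrization and may differ from the exact values
    used in the UCRL3 experiments."""
    LEFT, RIGHT = 0, 1
    P = np.zeros((2, n_states, n_states))
    R = np.zeros((n_states, 2))
    for s in range(n_states):
        # LEFT: deterministic step toward state 0 (swim with the current)
        P[LEFT, s, max(s - 1, 0)] = 1.0
        # RIGHT: noisy step toward the rewarding right bank (swim against it)
        if s == 0:
            P[RIGHT, s, s] = 0.4
            P[RIGHT, s, s + 1] = 0.6
        elif s == n_states - 1:
            P[RIGHT, s, s] = 0.6
            P[RIGHT, s, s - 1] = 0.4
        else:
            P[RIGHT, s, s + 1] = 0.35
            P[RIGHT, s, s] = 0.6
            P[RIGHT, s, s - 1] = 0.05
    R[0, LEFT] = 0.005            # small certain reward on the left bank
    R[n_states - 1, RIGHT] = 1.0  # large reward on the right bank
    return P, R

P, R = river_swim(6)
# every (action, state) row of P must be a probability distribution
assert np.allclose(P.sum(axis=2), 1.0)
```

The left bank's small certain reward versus the right bank's large but hard-to-reach reward is what makes RiverSwim a standard stress test for exploration, which is why it is used to compare UCRL3 against UCRL2 and its variants.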