Tightening Exploration in Upper Confidence Reinforcement Learning

Authors: Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate, through numerical experiments in standard environments, that reducing exploration this way yields a substantial numerical improvement compared to UCRL2 and its variants. In this section we provide illustrative numerical experiments that show the benefit of UCRL3 over UCRL2 and some of its popular variants." (The regret criterion behind these comparisons is recalled after the table.)
Researcher Affiliation | Academia | "(1) SequeL, Inria Lille - Nord Europe, Villeneuve d'Ascq, France; (2) Department of Computer Science, University of Copenhagen, Copenhagen, Denmark."
Pseudocode | Yes | Algorithm 1: Extended Value Iteration (EVI); Algorithm 2: NOSS(f, Ŝ, C, κ); Algorithm 3: EVI-NOSS(p, c, C, Nmax, ε); Algorithm 4: UCRL3 with input parameter δ ∈ (0, 1). (A sketch of EVI's classical optimistic inner step follows the table.)
Open Source Code | Yes | "The full code and implementation details are made available to the community (see Appendix E for details). Our implementation of UCRL3 as well as all baseline algorithms (UCRL2, KL-UCRL, UCRL2B, PSRL) can be found at: https://github.com/hbourel/UCRL3."
Open Datasets | Yes | "In the first set of experiments, we consider the S-state RiverSwim environment (corresponding to the MDP shown in Figure 4). We consider two frozen lake environments of respective sizes 7x7 and 9x11, as shown in Figure 5." (A RiverSwim construction is sketched after the table.)
Dataset Splits | No | The paper describes experiments in reinforcement learning environments (e.g., RiverSwim, frozen lake) in which learning proceeds through continuous interaction rather than fixed train/validation/test splits; no split percentages or sample counts are given.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper points to the GitHub repository but does not list software dependencies with version numbers (e.g., Python or library versions).
Experiment Setup | Yes | "For all algorithms, we set δ = 0.05 and use the same tie-breaking rule." (An illustrative measurement loop with this δ follows below.)
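
For context on the comparisons quoted above: in this literature, algorithms are ranked by their cumulative regret, which UCRL-type analyses bound with probability at least 1 - δ. Writing g* for the optimal gain (long-run average reward) of the MDP and r_t for the reward collected at step t, the standard definition is:

```latex
% Cumulative regret after T steps; g^{\star} is the optimal gain of the MDP.
\mathrm{Regret}(T) = T\, g^{\star} - \sum_{t=1}^{T} r_t
```

A lower curve of Regret(T) versus T is what the quoted "substantial numerical improvement" refers to.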
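
The pseudocode row names EVI but the table cannot show it. As a reference point, below is a minimal sketch of the classical optimistic inner step of Extended Value Iteration as used by UCRL2 (Jaksch et al., 2010), which maximizes p·v over an L1 ball around the empirical transition estimate; UCRL3's EVI-NOSS tightens exactly this step with per-next-state confidence bounds. The function name and radius argument are illustrative, not taken from the paper's code.

```python
import numpy as np

def optimistic_transition(p_hat, d, v):
    """Maximize p . v over the L1 ball {p : ||p - p_hat||_1 <= d}.

    Classical inner step of Extended Value Iteration (Jaksch et al., 2010).
    UCRL3's EVI-NOSS replaces the single L1 ball with tighter
    per-next-state confidence bounds.
    """
    p = p_hat.copy()
    best = np.argmax(v)                      # most valuable next state
    p[best] = min(1.0, p_hat[best] + d / 2)  # push allowed mass toward it
    excess = p.sum() - 1.0
    for s in np.argsort(v):                  # drain mass from low-value states
        if excess <= 0:
            break
        cut = min(p[s], excess)
        p[s] -= cut
        excess -= cut
    return p
```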
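
The RiverSwim environment quoted in the datasets row is a standard S-state chain. Exact transition and reward parameters vary slightly across papers, so the values and boundary conventions below are one common choice, not necessarily the ones used in the paper.

```python
import numpy as np

def riverswim(S=6, p_fwd=0.35, p_stay=0.6, r_left=0.005, r_right=1.0):
    """S-state RiverSwim chain.

    Action 0 (LEFT) drifts deterministically downstream; action 1 (RIGHT)
    swims upstream and only sometimes succeeds. Returns the transition
    tensor P[s, a, s'] and the mean-reward matrix R[s, a].
    """
    P = np.zeros((S, 2, S))
    R = np.zeros((S, 2))
    for s in range(S):
        P[s, 0, max(s - 1, 0)] = 1.0                   # LEFT always succeeds
        P[s, 1, min(s + 1, S - 1)] += p_fwd            # RIGHT: move upstream
        P[s, 1, s] += p_stay                           # ... or hold position
        P[s, 1, max(s - 1, 0)] += 1 - p_fwd - p_stay   # ... or drift back
    R[0, 0] = r_left        # small reward for staying at the leftmost state
    R[S - 1, 1] = r_right   # large reward at the rightmost state
    return P, R
```

The exploration difficulty comes from this reward structure: a learner must repeatedly take the weak RIGHT action to discover the large reward at the far end of the chain.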
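
Finally, a sketch of how regret at δ = 0.05 could be measured on this chain. Everything below is illustrative: a uniform-random policy stands in for the learner so the snippet runs end-to-end (the released repository implements the actual UCRL3 agent), and optimal_gain is a small relative-value-iteration helper rather than code from the paper.

```python
import numpy as np

def optimal_gain(P, R, iters=5000):
    """Estimate the optimal average reward g* by relative value iteration."""
    v = np.zeros(P.shape[0])
    g = 0.0
    for _ in range(iters):
        Tv = (R + P @ v).max(axis=1)   # Bellman backup, shape (S,)
        g, v = Tv[0], Tv - Tv[0]       # renormalize to keep values bounded
    return g

rng = np.random.default_rng(0)
P, R = riverswim(S=6)                  # from the sketch above
S, A = R.shape
delta, T = 0.05, 200_000               # delta = 0.05, as in the quoted setup
g_star = optimal_gain(P, R)

s, total_reward = 0, 0.0
for t in range(T):
    a = rng.integers(A)                # placeholder policy; a learner such as
                                       # UCRL3 run with parameter delta would
                                       # choose `a` here
    total_reward += R[s, a]
    s = rng.choice(S, p=P[s, a])

print("Regret(T) ~", T * g_star - total_reward)
```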