Tightening Exploration in Upper Confidence Reinforcement Learning

Authors: Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate, through numerical experiments in standard environments, that reducing exploration this way yields a substantial numerical improvement compared to UCRL2 and its variants. In this section we provide illustrative numerical experiments that show the benefit of UCRL3 over UCRL2 and some of its popular variants." (The regret criterion behind these comparisons is recalled after the table.)
Researcher Affiliation | Academia | "(1) SequeL, Inria Lille - Nord Europe, Villeneuve d'Ascq, France; (2) Department of Computer Science, University of Copenhagen, Copenhagen, Denmark."
Pseudocode | Yes | Algorithm 1: Extended Value Iteration (EVI); Algorithm 2: NOSS(f, Ŝ, C, κ); Algorithm 3: EVI-NOSS(p, c, C, Nmax, ε); Algorithm 4: UCRL3 with input parameter δ ∈ (0, 1). (A sketch of EVI's classical optimistic inner step follows the table.)
Open Source Code | Yes | "The full code and implementation details are made available to the community (see Appendix E for details). Our implementation of UCRL3 as well as all baseline algorithms (UCRL2, KL-UCRL, UCRL2B, PSRL) can be found at: https://github.com/hbourel/UCRL3."
Open Datasets | Yes | "In the first set of experiments, we consider the S-state RiverSwim environment (corresponding to the MDP shown in Figure 4). We consider two frozen lake environments of respective sizes 7x7 and 9x11, as shown in Figure 5." (A RiverSwim construction is sketched after the table.)
Dataset Splits | No | The paper describes experiments in reinforcement learning environments (e.g., RiverSwim, frozen lake) in which learning proceeds through continuous interaction rather than fixed train/validation/test splits; no split percentages or sample counts are given.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper points to the GitHub repository but does not list software dependencies with version numbers (e.g., Python or library versions).
Experiment Setup | Yes | "For all algorithms, we set δ = 0.05 and use the same tie-breaking rule." (An illustrative measurement loop with this δ follows below.)
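
For context on the comparisons quoted above: in this literature, algorithms are ranked by their cumulative regret, which UCRL-type analyses bound with probability at least 1 - δ. Writing g* for the optimal gain (long-run average reward) of the MDP and r_t for the reward collected at step t, the standard definition is:

```latex
% Cumulative regret after T steps; g^{\star} is the optimal gain of the MDP.
\mathrm{Regret}(T) = T\, g^{\star} - \sum_{t=1}^{T} r_t
```

A lower curve of Regret(T) versus T is what the quoted "substantial numerical improvement" refers to.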
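
The pseudocode row names EVI but the table cannot show it. As a reference point, below is a minimal sketch of the classical optimistic inner step of Extended Value Iteration as used by UCRL2 (Jaksch et al., 2010), which maximizes p·v over an L1 ball around the empirical transition estimate; UCRL3's EVI-NOSS tightens exactly this step with per-next-state confidence bounds. The function name and radius argument are illustrative, not taken from the paper's code.

```python
import numpy as np

def optimistic_transition(p_hat, d, v):
    """Maximize p . v over the L1 ball {p : ||p - p_hat||_1 <= d}.

    Classical inner step of Extended Value Iteration (Jaksch et al., 2010).
    UCRL3's EVI-NOSS replaces the single L1 ball with tighter
    per-next-state confidence bounds.
    """
    p = p_hat.copy()
    best = np.argmax(v)                      # most valuable next state
    p[best] = min(1.0, p_hat[best] + d / 2)  # push allowed mass toward it
    excess = p.sum() - 1.0
    for s in np.argsort(v):                  # drain mass from low-value states
        if excess <= 0:
            break
        cut = min(p[s], excess)
        p[s] -= cut
        excess -= cut
    return p
```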
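
The RiverSwim environment quoted in the datasets row is a standard S-state chain. Exact transition and reward parameters vary slightly across papers, so the values and boundary conventions below are one common choice, not necessarily the ones used in the paper.

```python
import numpy as np

def riverswim(S=6, p_fwd=0.35, p_stay=0.6, r_left=0.005, r_right=1.0):
    """S-state RiverSwim chain.

    Action 0 (LEFT) drifts deterministically downstream; action 1 (RIGHT)
    swims upstream and only sometimes succeeds. Returns the transition
    tensor P[s, a, s'] and the mean-reward matrix R[s, a].
    """
    P = np.zeros((S, 2, S))
    R = np.zeros((S, 2))
    for s in range(S):
        P[s, 0, max(s - 1, 0)] = 1.0                   # LEFT always succeeds
        P[s, 1, min(s + 1, S - 1)] += p_fwd            # RIGHT: move upstream
        P[s, 1, s] += p_stay                           # ... or hold position
        P[s, 1, max(s - 1, 0)] += 1 - p_fwd - p_stay   # ... or drift back
    R[0, 0] = r_left        # small reward for staying at the leftmost state
    R[S - 1, 1] = r_right   # large reward at the rightmost state
    return P, R
```

The exploration difficulty comes from this reward structure: a learner must repeatedly take the weak RIGHT action to discover the large reward at the far end of the chain.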
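
Finally, a sketch of how regret at δ = 0.05 could be measured on this chain. Everything below is illustrative: a uniform-random policy stands in for the learner so the snippet runs end-to-end (the released repository implements the actual UCRL3 agent), and optimal_gain is a small relative-value-iteration helper rather than code from the paper.

```python
import numpy as np

def optimal_gain(P, R, iters=5000):
    """Estimate the optimal average reward g* by relative value iteration."""
    v = np.zeros(P.shape[0])
    g = 0.0
    for _ in range(iters):
        Tv = (R + P @ v).max(axis=1)   # Bellman backup, shape (S,)
        g, v = Tv[0], Tv - Tv[0]       # renormalize to keep values bounded
    return g

rng = np.random.default_rng(0)
P, R = riverswim(S=6)                  # from the sketch above
S, A = R.shape
delta, T = 0.05, 200_000               # delta = 0.05, as in the quoted setup
g_star = optimal_gain(P, R)

s, total_reward = 0, 0.0
for t in range(T):
    a = rng.integers(A)                # placeholder policy; a learner such as
                                       # UCRL3 run with parameter delta would
                                       # choose `a` here
    total_reward += R[s, a]
    s = rng.choice(S, p=P[s, a])

print("Regret(T) ~", T * g_star - total_reward)
```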