Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

Authors: Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we report numerical simulations supporting our theoretical findings and showing how SCAL significantly outperforms UCRL in MDPs with large diameter and small span."
Researcher Affiliation | Collaboration | "1 SequeL Team, INRIA Lille, France; 2 Facebook AI Research, Paris, France; 3 Montanuniversität Leoben, Austria."
Pseudocode | Yes | "Figure 1. The general structure of optimistic algorithms for RL." and "Figure 3. Algorithm SCOPT."
Open Source Code | Yes | "The code is available on GitHub."
Open Datasets | No | The paper uses a "simple but descriptive three-state domain" and specifies reward distributions (Bernoulli), but it does not provide concrete access information (a specific link, DOI, repository name, formal citation with authors/year, or reference to an established benchmark dataset) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide hardware details (exact GPU/CPU models, processor types and speeds, memory amounts, or other machine specifications) for running its experiments.
Software Dependencies | No | The paper states "The code is available on GitHub." but does not list specific ancillary software components with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | "In all the experiments, we noticed that perturbing the extended MDP was not necessary to ensure convergence of SCOPT and so we set ηk = 0. We also set γk = 0 to speed-up the execution of SCOPT (see stopping condition in Fig. 3)."
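The experiment-setup row above refers to SCOPT, the paper's span-constrained planning routine, run without the perturbation (ηk = 0) and with the stopping-condition slack set to zero (γk = 0). As a rough illustration only, the core idea of span-constrained value iteration can be sketched as below; the function name `scopt_sketch`, the toy MDP, and the exact clipping and stopping rules are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def span(v):
    """Span semi-norm: max(v) - min(v)."""
    return np.max(v) - np.min(v)

def scopt_sketch(P, r, c, epsilon=1e-6, max_iter=10_000):
    """Hypothetical sketch of span-constrained value iteration.

    P: transition tensor of shape (S, A, S); r: rewards of shape (S, A);
    c: upper bound on the span of the value vector. Stops when the span
    of the successive difference falls below epsilon, mimicking a
    stopping condition with no extra slack (the gamma_k = 0 setting).
    No perturbation of the MDP is applied (the eta_k = 0 setting).
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Bellman optimality backup: q[s, a] = r[s, a] + sum_s' P[s, a, s'] * v[s']
        q = r + P @ v            # shape (S, A)
        tv = q.max(axis=1)       # greedy backup, shape (S,)
        # Span truncation: clip values exceeding min(tv) + c so span(tv) <= c
        tv = np.minimum(tv, tv.min() + c)
        if span(tv - v) < epsilon:
            return tv
        v = tv
    return v
```

On a small aperiodic two-state, two-action MDP, the returned vector satisfies the span constraint by construction, since every iterate is clipped to lie within `c` of its minimum entry.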