Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Regret Minimization in MDPs with Options without Prior Knowledge

Authors: Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill

NeurIPS 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also report preliminary empirical results supporting the theoretical findings. ... In this section we compare the regret of FSUCRL to SUCRL and UCRL to empirically verify the impact of removing prior knowledge about options and estimating their structure through the irreducible MC transformation. ... Figure 3: (Left) Regret after 1.2 × 10^8 steps normalized w.r.t. UCRL for different option durations in a 20x20 grid-world. (Right) Evolution of the regret as T_n increases for a 14x14 four-rooms maze.
Researcher Affiliation	Collaboration	Ronan Fruit Sequel Team Inria Lille EMAIL Matteo Pirotta Sequel Team Inria Lille EMAIL Alessandro Lazaric Sequel Team Inria Lille EMAIL Emma Brunskill Stanford University EMAIL
Pseudocode	Yes	Figure 2: The general structure of FSUCRL. Input: Confidence δ ∈ ]0, 1[, r_max, S, A, O For episodes k = 1, 2, ... do ...
Open Source Code	No	The paper does not provide a link to open-source code or state that the code for the described methodology is available.
Open Datasets	No	The paper mentions "the toy domain presented in [14]" and "the classical 4-rooms maze [1]" but does not provide concrete access information (link, DOI, full author/year citation within the text, or specific repository) for these datasets.
Dataset Splits	No	The paper does not specify training, validation, or test dataset splits.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers.
Experiment Setup	No	The paper describes the general experimental settings (e.g., using Hoeffding confidence bounds) but does not provide concrete hyperparameter values or detailed training configurations.