Regret Minimization in MDPs with Options without Prior Knowledge
Authors: Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also report preliminary empirical results supporting the theoretical findings. ... In this section we compare the regret of FSUCRL to SUCRL and UCRL to empirically verify the impact of removing prior knowledge about options and estimating their structure through the irreducible MC transformation. ... Figure 3: (Left) Regret after 1.2 · 10^8 steps normalized w.r.t. UCRL for different option durations in a 20x20 grid-world. (Right) Evolution of the regret as T_n increases for a 14x14 four-rooms maze. |
| Researcher Affiliation | Collaboration | Ronan Fruit (Sequel Team, Inria Lille, ronan.fruit@inria.fr); Matteo Pirotta (Sequel Team, Inria Lille, matteo.pirotta@inria.fr); Alessandro Lazaric (Sequel Team, Inria Lille, alessandro.lazaric@inria.fr); Emma Brunskill (Stanford University, ebrun@cs.stanford.edu) |
| Pseudocode | Yes | Figure 2: The general structure of FSUCRL. Input: Confidence δ ∈ ]0, 1[, r_max, S, A, O. For episodes k = 1, 2, ... do ... |
| Open Source Code | No | The paper does not provide a link to open-source code or state that the code for the described methodology is available. |
| Open Datasets | No | The paper mentions "the toy domain presented in [14]" and "the classical 4-rooms maze [1]" but does not provide concrete access information (link, DOI, full author/year citation within the text, or specific repository) for these datasets. |
| Dataset Splits | No | The paper does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes the general experimental settings (e.g., using Hoeffding confidence bounds) but does not provide concrete hyperparameter values or detailed training configurations. |