Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
Authors: Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we report numerical simulations supporting our theoretical findings and showing how SCAL significantly outperforms UCRL in MDPs with large diameter and small span. |
| Researcher Affiliation | Collaboration | ¹SequeL Team, INRIA Lille, France; ²Facebook AI Research, Paris, France; ³Montanuniversität Leoben, Austria. |
| Pseudocode | Yes | Figure 1 (the general structure of optimistic algorithms for RL) and Figure 3 (Algorithm SCOPT). |
| Open Source Code | Yes | The code is available on GitHub. |
| Open Datasets | No | The paper uses a 'simple but descriptive three-state domain' and specifies reward distributions (Bernoulli) but does not provide concrete access information (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions that 'The code is available on GitHub.' but does not list specific ancillary software components with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | In all the experiments, we noticed that perturbing the extended MDP was not necessary to ensure convergence of SCOPT, so we set $\eta_k = 0$. We also set $\gamma_k = 0$ to speed up the execution of SCOPT (see the stopping condition in Fig. 3). |
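To make the experiment-setup row more concrete, here is a minimal sketch of span-truncated relative value iteration, the kind of procedure SCOPT builds on. It assumes a truncation step $\Gamma_c v(s) = \min\{v(s), \min_{s'} v(s') + c\}$ applied after the Bellman optimality operator, and a standard relative-value-iteration stopping rule $\mathrm{sp}(T v_n - v_n) < \varepsilon$. It omits the perturbation $\eta_k$ and the $\gamma_k$ term entirely (the experiments set both to 0). The function names `span`, `truncate`, and `span_constrained_vi` are hypothetical, and the exact operator and stopping condition of SCOPT are those in Fig. 3 of the paper, which this sketch does not reproduce.

```python
import numpy as np


def span(v):
    """Span seminorm sp(v) = max(v) - min(v)."""
    return float(np.max(v) - np.min(v))


def truncate(v, c):
    """Hypothetical helper: cap each entry at min(v) + c, so sp(result) <= c."""
    return np.minimum(v, np.min(v) + c)


def span_constrained_vi(P, R, c, eps=1e-6, ref_state=0, max_iter=100_000):
    """Sketch of relative value iteration with span truncation (no eta/gamma terms).

    P: transition tensor of shape (S, A, S), P[s, a, s'] = Pr(s' | s, a).
    R: expected rewards of shape (S, A).
    c: assumed bound on the span of the returned bias vector.
    Returns (gain estimate, bias vector v with sp(v) <= c).
    """
    v = np.zeros(R.shape[0])
    for _ in range(max_iter):
        lv = np.max(R + P @ v, axis=1)  # Bellman optimality operator L applied to v
        tv = truncate(lv, c)            # span truncation (assumed form of Gamma_c)
        diff = tv - v                   # T v - v; its span drives the stopping rule
        v_new = tv - tv[ref_state]      # relative VI: pin v(ref_state) to 0
        if span(diff) < eps:
            # Midpoint of the bounds min(Tv - v) <= gain <= max(Tv - v)
            gain = 0.5 * (np.max(diff) + np.min(diff))
            return gain, v_new
        v = v_new
    raise RuntimeError("span-truncated value iteration did not converge")
```

Subtracting `tv[ref_state]` keeps the iterates bounded without changing the span or the greedy policy, because both the Bellman operator and the truncation shift by the same constant when their input is shifted. In a 2-state MDP where staying in state 1 pays reward 1, a loose constraint such as `c=10.0` recovers gain 1, while a tight one such as `c=0.5` forces the returned bias to have span at most 0.5.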