Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Model-free Posterior Sampling via Learning Rate Randomization
Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical study shows that RandQL outperforms existing approaches on baseline exploration environments... 5 Experiments: In this section we present the experiments we conducted for tabular environments using rlberry library [Domingues et al., 2021a]. We also provide experiments in non-tabular environment in Appendix I. |
| Researcher Affiliation | Collaboration | 1 CMAP, École Polytechnique; 2 HSE University; 3 Duisburg-Essen University; 4 Google DeepMind; 5 Mohamed Bin Zayed University of AI, UAE; 6 IDEMIA; 7 ENS Lyon |
| Pseudocode | Yes | Algorithm 1: Tabular Staged-RandQL |
| Open Source Code | No | In this section we present the experiments we conducted for tabular environments using rlberry library [Domingues et al., 2021a]. |
| Open Datasets | No | Environment: We use a grid-world environment with 100 states (i, j) ∈ [10] × [10] and 4 actions (left, right, up and down)... The second one is a chain environment described by Osband et al. [2016] with L = 15 states and 2 actions (left or right)... We use a ball environment with the 2-dimensional unit Euclidean ball as state-space S = {s ∈ ℝ², ‖s‖₂ ≤ 1} and of horizon H = 30. |
| Dataset Splits | No | The paper refers to training an agent in an environment but does not provide specific dataset split information (e.g., percentages, sample counts for train/validation/test sets, or citations to predefined splits). |
| Hardware Specification | Yes | For all experiments we used 2 CPUs (Intel Xeon CPU 2.20GHz), and no GPU was used. |
| Software Dependencies | No | In this section we present the experiments we conducted for tabular environments using rlberry library [Domingues et al., 2021a]. |
| Experiment Setup | Yes | For these algorithms we used the same parameters: posterior inflation κ = 1.0, n0 = 1/S prior sample (same as PSRL, see below), ensemble size J = 10. For DQN and BootDQN we use as network a 2-layer multilayer perceptron (MLP) with hidden layer size equal to 64. |
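
The setup values quoted in the table can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class name `TabularSetup`, its field names, and the helper `in_unit_ball` are hypothetical, and applying n0 = 1/S to the chain environment is an assumption (the quoted text states these parameters only for "these algorithms" without naming every environment).

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class TabularSetup:
    """Hypothetical container for the hyperparameters quoted in the table."""
    n_states: int            # S, number of states
    n_actions: int           # number of actions
    kappa: float = 1.0       # posterior inflation (quoted value)
    ensemble_size: int = 10  # J (quoted value)

    @property
    def n0(self) -> float:
        # Prior sample count, quoted as n0 = 1/S.
        return 1.0 / self.n_states


# Grid-world: states (i, j) in [10] x [10], 4 actions (left, right, up, down).
grid_world = TabularSetup(n_states=10 * 10, n_actions=4)

# Chain: L = 15 states, 2 actions (left or right).
# Assumption: the same kappa, J, and n0 = 1/S rule apply here.
chain = TabularSetup(n_states=15, n_actions=2)


def in_unit_ball(s: tuple[float, float]) -> bool:
    """Check membership in the ball state space {s in R^2 : ||s||_2 <= 1}."""
    return math.hypot(s[0], s[1]) <= 1.0
```

For example, `grid_world.n0` evaluates 1/S for the 100-state grid-world, and `in_unit_ball((0.5, 0.5))` checks whether a point is a valid state of the ball environment.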