Model-free Posterior Sampling via Learning Rate Randomization

Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical study shows that RandQL outperforms existing approaches on baseline exploration environments." (Section 5, Experiments:) "In this section we present the experiments we conducted for tabular environments using the rlberry library [Domingues et al., 2021a]. We also provide experiments in a non-tabular environment in Appendix I."
Researcher Affiliation | Collaboration | 1 CMAP, École Polytechnique; 2 HSE University; 3 Duisburg-Essen University; 4 Google DeepMind; 5 Mohamed Bin Zayed University of AI, UAE; 6 IDEMIA; 7 ENS Lyon
Pseudocode | Yes | "Algorithm 1: Tabular Staged-RandQL"
Open Source Code | No | "In this section we present the experiments we conducted for tabular environments using the rlberry library [Domingues et al., 2021a]."
Open Datasets | No | "Environment: We use a grid-world environment with 100 states (i, j) ∈ [10] × [10] and 4 actions (left, right, up and down)... The second one is a chain environment described by Osband et al. [2016] with L = 15 states and 2 actions (left or right)... We use a ball environment with the 2-dimensional unit Euclidean ball as state-space S = {s ∈ ℝ² : ‖s‖₂ ≤ 1} and of horizon H = 30."
Dataset Splits | No | The paper refers to training an agent in an environment but does not provide dataset split information (e.g., percentages, sample counts for train/validation/test sets, or citations to predefined splits).
Hardware Specification | Yes | "For all experiments we used 2 CPUs (Intel Xeon CPU 2.20GHz), and no GPU was used."
Software Dependencies | No | "In this section we present the experiments we conducted for tabular environments using the rlberry library [Domingues et al., 2021a]."
Experiment Setup | Yes | "For these algorithms we used the same parameters: posterior inflation κ = 1.0, n0 = 1/S prior samples (same as PSRL, see below), ensemble size J = 10. For DQN and Boot DQN we use as network a 2-layer multilayer perceptron (MLP) with hidden layer size equal to 64."
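The paper's titular idea, model-free posterior sampling induced by randomizing the Q-learning step size, can be illustrated with a short sketch. This is not the paper's Algorithm 1; the `Beta(H, n + 1)` parameters and the target computation below are illustrative placeholders showing the mechanism (a random learning rate in place of a deterministic one):

```python
import random

def randomized_q_update(q, target, n, H=10, rng=random.Random(0)):
    """One tabular Q-learning step where a deterministic learning rate
    (e.g. 1 / (n + 1)) is replaced by a Beta-distributed random one, so
    that repeated independent updates behave like posterior samples.
    The Beta(H, n + 1) choice here is illustrative, not the paper's."""
    lr = rng.betavariate(H, n + 1)  # random step size in (0, 1)
    return (1 - lr) * q + lr * target  # convex combination of old value and target

# Example: an ensemble of J = 10 randomized Q-estimates for one (s, a) pair,
# matching the quoted ensemble size J = 10 from the experiment setup.
rng = random.Random(42)
ensemble = [randomized_q_update(q=0.0, target=1.0, n=5, rng=rng) for _ in range(10)]
```

Because each ensemble member draws its own learning rate, the J estimates spread out and the spread shrinks as the visit count n grows, which is the exploration signal the method relies on.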
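The quoted setup specifies a 2-layer MLP with hidden size 64 for the DQN and Boot DQN baselines. A minimal pure-Python sketch of such a Q-network follows; the input dimension, action count, and weight initialization are placeholder assumptions, not taken from the paper:

```python
import math
import random

def init_mlp(in_dim, hidden=64, n_actions=4, seed=0):
    """Initialize a 2-layer MLP (one hidden layer of size 64, as quoted).
    The uniform init bound is an assumption; the paper does not specify it."""
    rng = random.Random(seed)
    def layer(n_in, n_out):
        bound = 1.0 / math.sqrt(n_in)
        w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        return w, b
    return layer(in_dim, hidden), layer(hidden, n_actions)

def forward(params, state):
    """Return one Q-value per action for a given state vector."""
    (w1, b1), (w2, b2) = params
    h = [max(0.0, sum(wi * si for wi, si in zip(row, state)) + bi)  # ReLU hidden layer
         for row, bi in zip(w1, b1)]
    return [sum(wi * hi for wi, hi in zip(row, h)) + bi for row, bi in zip(w2, b2)]

# Hypothetical usage: a 2-dimensional state (e.g. the ball environment) and 4 actions.
params = init_mlp(in_dim=2, hidden=64, n_actions=4)
q_values = forward(params, [0.1, -0.3])
```

A Boot DQN baseline would keep an ensemble of such networks, one per bootstrap head, consistent with the quoted ensemble size J = 10.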