Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Posterior Sampling for Deep Reinforcement Learning
Authors: Remo Sasso, Michelangelo Conserva, Paulo Rauber
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling while being competitive with a state-of-the-art (model-based) reinforcement learning method, both in sample efficiency and computational efficiency. |
| Researcher Affiliation | Academia | School of Electronic Engineering and Computer Science, Queen Mary University of London, United Kingdom. Correspondence to: Remo Sasso <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 summarizes PSDRL (forward models are sampled every m time steps instead of episodically). |
| Open Source Code | Yes | The source code for replicating all experiments is available as supplementary material. ... The implementation for PSDRL can be found at https://github.com/remosasso/PSDRL. |
| Open Datasets | Yes | We provide an experimental comparison between PSDRL and other algorithms on 55 Atari 2600 games that are commonly used in the literature (Mnih et al., 2015). |
| Dataset Splits | No | The paper does not specify train/validation/test splits (percentages, sample counts, or references to predefined splits) in the usual supervised-learning sense; it describes evaluation episodes and environment steps instead. |
| Hardware Specification | Yes | We make use of an NVIDIA A100 GPU for training. |
| Software Dependencies | No | The paper mentions training with the 'Adam optimizer (Kingma & Ba, 2015)' but does not provide specific version numbers for software libraries, programming languages, or other tools used (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | Table 3 presents the hyperparameters for PSDRL, the search sets used for grid search, and the resulting values used for the experiments. |
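The Pseudocode row notes that PSDRL samples a forward model every m time steps instead of once per episode. The toy sketch below illustrates only that scheduling idea and is not PSDRL itself. It uses a hypothetical Beta-Bernoulli bandit in which a Beta posterior stands in for PSDRL's posterior over deep forward models. The function name `thompson_every_m` and all of its parameters are assumptions made for illustration.

```python
import math
import random


def thompson_every_m(true_probs, steps, m, seed=0):
    """Hypothetical toy: Beta-Bernoulli Thompson sampling in which the
    posterior sample (the "model") is redrawn only every m steps.
    This illustrates periodic resampling; it is NOT the PSDRL algorithm."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # Beta(1, 1) prior; alpha counts successes + 1
    beta = [1.0] * k   # beta counts failures + 1
    sampled = []
    resamples = 0
    total_reward = 0
    for t in range(steps):
        if t % m == 0:
            # Draw one plausible model (arm means) from the current posterior.
            sampled = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            resamples += 1
        # Act greedily with respect to the sampled model.
        arm = max(range(k), key=lambda i: sampled[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        # The posterior is updated every step; only the sample is held fixed.
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return total_reward, resamples


if __name__ == "__main__":
    reward, n = thompson_every_m([0.05, 0.95], steps=1000, m=100)
    print(reward, n, math.ceil(1000 / 100))  # n equals ceil(steps / m)
```

With `m=1`, this reduces to standard per-step Thompson sampling. Larger values of `m` hold each sampled model fixed for longer, which roughly mimics committing to a sampled model for a stretch of interaction without waiting for an episode to end.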