Posterior Sampling for Deep Reinforcement Learning

Authors: Remo Sasso, Michelangelo Conserva, Paulo Rauber

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling while being competitive with a state-of-the-art (model-based) reinforcement learning method, both in sample efficiency and computational efficiency." |
| Researcher Affiliation | Academia | "School of Electronic Engineering and Computer Science, Queen Mary University of London, United Kingdom. Correspondence to: Remo Sasso <r.sasso@qmul.ac.uk>." |
| Pseudocode | Yes | "Algorithm 1 summarizes PSDRL (forward models are sampled every m time steps instead of episodically)." (A hedged sketch of this loop follows the table.) |
| Open Source Code | Yes | "The source code for replicating all experiments is available as supplementary material. ... The implementation for PSDRL can be found at https://github.com/remosasso/PSDRL." |
| Open Datasets | Yes | "We provide an experimental comparison between PSDRL and other algorithms on 55 Atari 2600 games that are commonly used in the literature (Mnih et al., 2015)." |
| Dataset Splits | No | The paper does not provide explicit train/validation/test splits (percentages, sample counts, or references to predefined splits) in the usual supervised-learning sense; it instead reports evaluation episodes and environment steps. |
| Hardware Specification | Yes | "We make use of an NVIDIA A100 GPU for training." |
| Software Dependencies | No | The paper mentions training with the "Adam optimizer (Kingma & Ba, 2015)" but does not give version numbers for software libraries, programming languages, or other tools (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | "Table 3 presents the hyperparameters for PSDRL, the search sets used for grid search, and the resulting values used for the experiments." (A generic grid-search sketch follows below.) |
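
The pseudocode row above references Algorithm 1, in which a forward model is drawn from the posterior every m time steps rather than once per episode. Below is a minimal, hedged sketch of that control loop, not the authors' implementation: the `env` interface and the `sample_model`, `plan_action`, and `update_posterior` callables are hypothetical placeholders supplied by the caller.

```python
def psdrl_control_loop(env, sample_model, plan_action, update_posterior,
                       total_steps, resample_every_m):
    """Sketch of a PSDRL-style loop: act greedily with respect to a forward model
    sampled from the posterior, and resample that model every `resample_every_m`
    time steps instead of once per episode.

    `sample_model`, `plan_action`, and `update_posterior` are hypothetical
    callables supplied by the caller; `env` follows a Gym-like reset/step API.
    """
    obs = env.reset()
    model = sample_model()  # draw one forward model from the current posterior
    for t in range(total_steps):
        action = plan_action(model, obs)  # greedy action under the sampled model
        next_obs, reward, done, info = env.step(action)
        update_posterior(obs, action, reward, next_obs)  # incorporate the new transition
        obs = env.reset() if done else next_obs
        if (t + 1) % resample_every_m == 0:
            model = sample_model()  # periodic resampling (every m steps, not episodic)
```

The periodic resampling is the point of the sketch: classical posterior sampling (PSRL) would only draw a new model at episode boundaries, whereas the quoted line states that PSDRL resamples every m time steps.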
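
The experiment-setup row points to Table 3 for the hyperparameter grid search; the table itself is not reproduced here. The sketch below shows a generic exhaustive grid search of the kind described, with hypothetical parameter names and a placeholder `evaluate` function standing in for training and evaluating PSDRL under one configuration.

```python
from itertools import product

def grid_search(search_space, evaluate):
    """Evaluate every combination in `search_space` (a dict mapping hyperparameter
    names to candidate values) and return the best-scoring configuration.
    `evaluate` is a placeholder for training and evaluating an agent with one config."""
    keys = list(search_space)
    best_config, best_score = None, float("-inf")
    for values in product(*(search_space[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Hypothetical search sets; the actual hyperparameters and values are in Table 3 of the paper.
example_space = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "model_resample_interval_m": [50, 100, 200],
}
```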