Posterior Sampling for Deep Reinforcement Learning
Authors: Remo Sasso, Michelangelo Conserva, Paulo Rauber
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling while being competitive with a state-of-the-art (model-based) reinforcement learning method, both in sample efficiency and computational efficiency. |
| Researcher Affiliation | Academia | 1School of Electronic Engineering and Computer Science, Queen Mary University of London, United Kingdom. Correspondence to: Remo Sasso <r.sasso@qmul.ac.uk>. |
| Pseudocode | Yes | Algorithm 1 summarizes PSDRL (forward models are sampled every m time steps instead of episodically). |
| Open Source Code | Yes | The source code for replicating all experiments is available as supplementary material. ... The implementation for PSDRL can be found at https://github.com/remosasso/PSDRL. |
| Open Datasets | Yes | We provide an experimental comparison between PSDRL and other algorithms on 55 Atari 2600 games that are commonly used in the literature (Mnih et al., 2015). |
| Dataset Splits | No | The paper does not provide explicit train/validation/test dataset splits (percentages, sample counts, or references to predefined splits) in the typical supervised-learning sense; it instead describes evaluation episodes and environment steps. |
| Hardware Specification | Yes | We make use of an NVIDIA A100 GPU for training. |
| Software Dependencies | No | The paper mentions training with the 'Adam optimizer (Kingma & Ba, 2015)' but does not provide specific version numbers for software libraries, programming languages, or other tools used (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | Table 3 presents the hyperparameters for PSDRL, the search sets used for grid search, and the resulting values used for the experiments. |
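The Pseudocode row notes that PSDRL's Algorithm 1 samples forward models every m time steps rather than once per episode. A minimal illustrative sketch of that loop structure is below; all class, method, and environment names here are hypothetical stand-ins, not the paper's actual implementation (which draws neural forward models from an approximate posterior and plans with them).

```python
class PSDRLSketch:
    """Illustrative posterior-sampling RL loop (hypothetical names).

    The one detail taken from the table above: the sampled model is
    refreshed every `resample_every` time steps, not episodically.
    """

    def __init__(self, resample_every):
        self.m = resample_every
        self.model = None
        self.resamples = 0  # bookkeeping for demonstration only

    def sample_model(self):
        # Placeholder: in PSDRL this would draw a forward model from
        # an (approximate) posterior over environment dynamics.
        self.resamples += 1
        return lambda state, action: (state, 0.0)

    def act(self, state):
        # Placeholder: PSDRL acts greedily with respect to a value
        # function computed under the currently sampled model.
        return 0

    def run(self, env, total_steps):
        state = env.reset()
        for t in range(total_steps):
            if t % self.m == 0:
                # Resample every m steps instead of per episode.
                self.model = self.sample_model()
            state, reward, done = env.step(self.act(state))
            if done:
                state = env.reset()


class ToyEnv:
    """Trivial stand-in environment for the sketch."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, False  # next state, reward, done
```

For example, running `PSDRLSketch(resample_every=5).run(ToyEnv(), 10)` resamples the model twice (at steps 0 and 5), matching the every-m-steps schedule the paper describes.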