Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters

Authors: Alberto Maria Metelli, Amarildo Likmeta, Marcello Restelli

NeurIPS 2019

Reproducibility assessment (variable, result, and supporting LLM response):

- Research Type: Experimental. "Finally, we present an experimental campaign to show the effectiveness of WQL on finite problems, compared to several RL algorithms, some of which are specifically designed for exploration, along with some preliminary results on Atari games."
- Researcher Affiliation: Academia. Alberto Maria Metelli, Amarildo Likmeta, and Marcello Restelli are all with DEIB, Politecnico di Milano, Milan, Italy (albertomaria.metelli@polimi.it, amarildo.likmeta@polimi.it, marcello.restelli@polimi.it).
- Pseudocode: Yes. Algorithm 1: Wasserstein Q-Learning; Algorithm 2: Particle DQN.
- Open Source Code: Yes. "The implementation of the proposed algorithms can be found at https://github.com/albertometelli/wql."
- Open Datasets: Yes. "We evaluate WQL on a set of RL tasks designed to emphasize exploration: the Taxi problem [15], the Chain [15], the River Swim [42], and the Six Arms [42]... Finally, we provide some preliminary results on the application of WQL to deep architectures (Section 7.2)."
- Dataset Splits: No. The paper uses standard RL environments (e.g., Taxi, Atari games) and discusses online/offline returns, but it does not specify explicit numerical training/validation/test splits (e.g., percentages or sample counts) needed for reproduction.
- Hardware Specification: No. The paper does not describe the hardware used for the experiments, such as CPU or GPU models or memory specifications.
- Software Dependencies: No. The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
- Experiment Setup: No. The paper mentions hyperparameters such as the step-size schedule (αt) and the particle initialization interval, and states that "implementation details are reported in Appendix C", but the main text does not provide concrete numerical values for these hyperparameters or detailed training configurations.
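As context for the title's key operation: for one-dimensional distributions represented by equally weighted particle sets (the representation used by particle-based methods like WQL), the 2-Wasserstein barycenter can be computed in closed form by sorting each particle set and averaging the sorted particles position-wise. The sketch below illustrates only this generic 1D equal-weight case; the function name `w2_barycenter` is a hypothetical helper, not part of the paper's released code.

```python
import numpy as np

def w2_barycenter(particle_sets, weights=None):
    """2-Wasserstein barycenter of 1D equal-weight particle distributions.

    Each row of `particle_sets` holds the particles of one distribution.
    For 1D uniform particle distributions, the W2 barycenter is obtained
    by sorting each set and averaging the sorted particles (i.e., the
    empirical quantiles) position-wise, optionally with weights.
    """
    # Sort each particle set so that matching positions align quantiles.
    sets = np.sort(np.asarray(particle_sets, dtype=float), axis=1)
    if weights is None:
        weights = np.full(len(sets), 1.0 / len(sets))
    # Weighted average of aligned quantiles gives the barycenter particles.
    return np.average(sets, axis=0, weights=weights)

# Example: the barycenter of {0, 1, 2} and {2, 3, 4} is {1, 2, 3}.
bary = w2_barycenter([[2, 1, 0], [2, 3, 4]])
```

This closed form holds only in one dimension with equal particle counts; general Wasserstein barycenters require an optimal-transport solver.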