Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters

Authors: Alberto Maria Metelli, Amarildo Likmeta, Marcello Restelli

NeurIPS 2019

Reproducibility assessment (variable, result, and supporting LLM response):

- Research Type: Experimental. "Finally, we present an experimental campaign to show the effectiveness of WQL on finite problems, compared to several RL algorithms, some of which are specifically designed for exploration, along with some preliminary results on Atari games."
- Researcher Affiliation: Academia. Alberto Maria Metelli, Amarildo Likmeta, and Marcello Restelli are all with DEIB, Politecnico di Milano, Milan, Italy (albertomaria.metelli@polimi.it, amarildo.likmeta@polimi.it, marcello.restelli@polimi.it).
- Pseudocode: Yes. Algorithm 1: Wasserstein Q-Learning; Algorithm 2: Particle DQN.
- Open Source Code: Yes. "The implementation of the proposed algorithms can be found at https://github.com/albertometelli/wql."
- Open Datasets: Yes. "We evaluate WQL on a set of RL tasks designed to emphasize exploration: the Taxi problem [15], the Chain [15], the River Swim [42], and the Six Arms [42]... Finally, we provide some preliminary results on the application of WQL to deep architectures (Section 7.2)."
- Dataset Splits: No. The paper uses standard RL environments (e.g., Taxi, Atari games) and discusses online/offline returns, but it does not specify explicit numerical training/validation/test splits (e.g., percentages or sample counts) needed for reproduction.
- Hardware Specification: No. The paper does not describe the hardware used for the experiments, such as CPU or GPU models or memory specifications.
- Software Dependencies: No. The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
- Experiment Setup: No. The paper mentions hyperparameters such as the step-size schedule (αt) and the particle initialization interval, and states that "implementation details are reported in Appendix C", but the main text does not provide concrete numerical values for these hyperparameters or detailed training configurations.
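As context for the title's key operation: for one-dimensional distributions represented by equally weighted particle sets (the representation used by particle-based methods like WQL), the 2-Wasserstein barycenter can be computed in closed form by sorting each particle set and averaging the sorted particles position-wise. The sketch below illustrates only this generic 1D equal-weight case; the function name `w2_barycenter` is a hypothetical helper, not part of the paper's released code.

```python
import numpy as np

def w2_barycenter(particle_sets, weights=None):
    """2-Wasserstein barycenter of 1D equal-weight particle distributions.

    Each row of `particle_sets` holds the particles of one distribution.
    For 1D uniform particle distributions, the W2 barycenter is obtained
    by sorting each set and averaging the sorted particles (i.e., the
    empirical quantiles) position-wise, optionally with weights.
    """
    # Sort each particle set so that matching positions align quantiles.
    sets = np.sort(np.asarray(particle_sets, dtype=float), axis=1)
    if weights is None:
        weights = np.full(len(sets), 1.0 / len(sets))
    # Weighted average of aligned quantiles gives the barycenter particles.
    return np.average(sets, axis=0, weights=weights)

# Example: the barycenter of {0, 1, 2} and {2, 3, 4} is {1, 2, 3}.
bary = w2_barycenter([[2, 1, 0], [2, 3, 4]])
```

This closed form holds only in one dimension with equal particle counts; general Wasserstein barycenters require an optimal-transport solver.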