Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters

Authors: Alberto Maria Metelli, Amarildo Likmeta, Marcello Restelli

NeurIPS 2019

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
"Finally, we present an experimental campaign to show the effectiveness of WQL on finite problems, compared to several RL algorithms, some of which are specifically designed for exploration, along with some preliminary results on Atari games."

Researcher Affiliation | Academia
"Alberto Maria Metelli, DEIB, Politecnico di Milano, Milan, Italy, EMAIL; Amarildo Likmeta, DEIB, Politecnico di Milano, Milan, Italy, EMAIL; Marcello Restelli, DEIB, Politecnico di Milano, Milan, Italy, EMAIL"

Pseudocode | Yes
"Algorithm 1: Wasserstein Q-Learning. Algorithm 2: Particle DQN."

Open Source Code | Yes
"The implementation of the proposed algorithms can be found at https://github.com/albertometelli/wql."

Open Datasets | Yes
"We evaluate WQL on a set of RL tasks designed to emphasize exploration: the Taxi problem [15], the Chain [15], the River Swim [42], and the Six Arms [42]... Finally, we provide some preliminary results on the application of WQL to deep architectures (Section 7.2)."
Dataset Splits | No
The paper uses standard RL environments (e.g., Taxi, Atari games) and discusses online/offline returns, but it does not specify explicit numerical training/validation/test splits (e.g., percentages or sample counts) that would be needed for reproduction.

Hardware Specification | No
The paper does not describe the hardware used to run the experiments, such as CPU or GPU models or memory specifications.

Software Dependencies | No
The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).

Experiment Setup | No
The paper mentions hyperparameters such as the step-size schedule (αt) and the particle initialization interval, and states that "implementation details are reported in Appendix C", but the main text does not provide concrete numerical values for these hyperparameters or detailed training configurations.
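To make the "Pseudocode" row concrete: the following is a minimal, illustrative sketch (not the authors' implementation) of the kind of particle-based update that Wasserstein Q-Learning builds on. It assumes the Q-value posterior for a state-action pair is represented by a set of 1-D particles; for uniform 1-D particle sets, the Wasserstein-2 barycenter of the current distribution and the Bellman target distribution reduces to an element-wise convex combination of the sorted particles. The function name, argument shapes, and step size are hypothetical; the actual update rule is specified in Algorithm 1 of the paper.

```python
import numpy as np

def wql_particle_update(particles, reward, gamma, alpha, next_particles):
    """Illustrative W2-barycenter update of a particle-based Q posterior.

    particles:      1-D array, current particles for Q(s, a)
    reward:         observed scalar reward r
    gamma:          discount factor
    alpha:          step size (barycenter weight on the target)
    next_particles: 1-D array, particles of the bootstrap target at s'
    """
    # Sort both particle sets: in 1-D, the W2 barycenter of uniform
    # particle distributions pairs particles by rank (quantiles).
    current = np.sort(np.asarray(particles, dtype=float))
    target = np.sort(reward + gamma * np.asarray(next_particles, dtype=float))
    # Element-wise convex combination = W2 barycenter with weights
    # (1 - alpha, alpha) on (current, target).
    return (1.0 - alpha) * current + alpha * target
```

For example, with particles [1.0, 0.0], reward 1.0, gamma 0.9, alpha 0.5, and next-state particles [2.0, 0.0], the sorted current set [0.0, 1.0] is averaged rank-by-rank with the sorted target [1.0, 2.8], giving [0.5, 1.9].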