The Value Function Polytope in Reinforcement Learning

Authors: Robert Dadashi, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms. Our experiments use the two-state, two-action MDP depicted elsewhere in this paper (details in Appendix A).
Researcher Affiliation | Collaboration | 1 Google Brain, 2 Mila, Université de Montréal, 3 Department of Computing Science, University of Alberta.
Pseudocode | No | The paper describes algorithms mathematically and textually, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | No explicit statement or link to open-source code is provided.
Open Datasets | No | The paper uses custom-defined Markov Decision Processes (MDPs) for its experiments, detailed in Appendix A, rather than publicly available datasets with access information.
Dataset Splits | No | The paper describes the MDP setup for analysis, but does not provide training, validation, or test dataset splits in terms of percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For the cross-entropy method (CEM), the paper states: 'We use N = 500, K = 50, an initial covariance of 0.1I, where I is the identity matrix of size 2, and a constant noise of 0.05I.'
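The table above pins down the only quantitative experimental details quoted from the paper: a two-state, two-action MDP (specified in the paper's Appendix A) and the CEM settings N = 500, K = 50, initial covariance 0.1I (I the 2x2 identity), and constant noise 0.05I. The sketch below shows one way those settings could be exercised; the transition probabilities, rewards, discount factor, the logit policy parameterization, and the choice of scoring each sampled policy by the sum of its state values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical two-state, two-action MDP. The paper's actual MDP is given in its
# Appendix A; these transition probabilities and rewards are placeholders.
gamma = 0.9
P = np.array([[[0.7, 0.3], [0.2, 0.8]],    # P[s, a, s']
              [[0.99, 0.01], [0.99, 0.01]]])
r = np.array([[-0.45, -0.1],               # r[s, a]
              [0.5, 0.5]])

def value_of_policy(theta):
    """V^pi for a policy with one logit per state: pi(a=0|s) = sigmoid(theta[s])."""
    p0 = 1.0 / (1.0 + np.exp(-theta))
    pi = np.stack([p0, 1.0 - p0], axis=1)           # shape (2 states, 2 actions)
    P_pi = np.einsum('sa,sap->sp', pi, P)           # state-to-state transitions under pi
    r_pi = np.einsum('sa,sa->s', pi, r)             # expected per-state reward under pi
    return np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)   # V = (I - gamma P_pi)^-1 r_pi

# Cross-entropy method with the settings quoted from the paper:
# N = 500 samples, K = 50 elites, initial covariance 0.1 I, constant added noise 0.05 I.
N, K = 500, 50
mean = np.zeros(2)
cov = 0.1 * np.eye(2)
noise = 0.05 * np.eye(2)
rng = np.random.default_rng(0)

for _ in range(50):
    samples = rng.multivariate_normal(mean, cov, size=N)
    # Assumed objective: sum of state values of each sampled policy.
    scores = np.array([value_of_policy(s).sum() for s in samples])
    elites = samples[np.argsort(scores)[-K:]]        # keep the K best samples
    mean = elites.mean(axis=0)
    cov = np.cov(elites, rowvar=False) + noise       # refit covariance, add constant noise

print("final mean parameters:", mean)
print("final value function:", value_of_policy(mean))
```

The constant noise term matters here: without it the elite covariance can collapse after a few iterations, which is presumably why the paper keeps a fixed 0.05I floor on the sampling covariance.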