The Value Function Polytope in Reinforcement Learning

Authors: Robert Dadashi, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms. Our experiments use the two-state, two-action MDP depicted elsewhere in this paper (details in Appendix A).
Researcher Affiliation | Collaboration | 1 Google Brain, 2 Mila, Université de Montréal, 3 Department of Computing Science, University of Alberta.
Pseudocode | No | The paper describes algorithms mathematically and textually, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | No explicit statement or link to open-source code is provided.
Open Datasets | No | The paper uses custom-defined Markov Decision Processes (MDPs) for its experiments, detailed in Appendix A, rather than publicly available datasets with access information.
Dataset Splits | No | The paper describes the MDP setup for analysis, but does not provide training, validation, or test dataset splits in terms of percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For the cross-entropy method (CEM), the paper states: 'We use N = 500, K = 50, an initial covariance of 0.1I, where I is the identity matrix of size 2, and a constant noise of 0.05I.'
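The table above pins down the only quantitative experimental details quoted from the paper: a two-state, two-action MDP (specified in the paper's Appendix A) and the CEM settings N = 500, K = 50, initial covariance 0.1I (I the 2x2 identity), and constant noise 0.05I. The sketch below shows one way those settings could be exercised; the transition probabilities, rewards, discount factor, the logit policy parameterization, and the choice of scoring each sampled policy by the sum of its state values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical two-state, two-action MDP. The paper's actual MDP is given in its
# Appendix A; these transition probabilities and rewards are placeholders.
gamma = 0.9
P = np.array([[[0.7, 0.3], [0.2, 0.8]],    # P[s, a, s']
              [[0.99, 0.01], [0.99, 0.01]]])
r = np.array([[-0.45, -0.1],               # r[s, a]
              [0.5, 0.5]])

def value_of_policy(theta):
    """V^pi for a policy with one logit per state: pi(a=0|s) = sigmoid(theta[s])."""
    p0 = 1.0 / (1.0 + np.exp(-theta))
    pi = np.stack([p0, 1.0 - p0], axis=1)           # shape (2 states, 2 actions)
    P_pi = np.einsum('sa,sap->sp', pi, P)           # state-to-state transitions under pi
    r_pi = np.einsum('sa,sa->s', pi, r)             # expected per-state reward under pi
    return np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)   # V = (I - gamma P_pi)^-1 r_pi

# Cross-entropy method with the settings quoted from the paper:
# N = 500 samples, K = 50 elites, initial covariance 0.1 I, constant added noise 0.05 I.
N, K = 500, 50
mean = np.zeros(2)
cov = 0.1 * np.eye(2)
noise = 0.05 * np.eye(2)
rng = np.random.default_rng(0)

for _ in range(50):
    samples = rng.multivariate_normal(mean, cov, size=N)
    # Assumed objective: sum of state values of each sampled policy.
    scores = np.array([value_of_policy(s).sum() for s in samples])
    elites = samples[np.argsort(scores)[-K:]]        # keep the K best samples
    mean = elites.mean(axis=0)
    cov = np.cov(elites, rowvar=False) + noise       # refit covariance, add constant noise

print("final mean parameters:", mean)
print("final value function:", value_of_policy(mean))
```

The constant noise term matters here: without it the elite covariance can collapse after a few iterations, which is presumably why the paper keeps a fixed 0.05I floor on the sampling covariance.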