The Value Function Polytope in Reinforcement Learning
Authors: Robert Dadashi, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms. Our experiments use the two-state, two-action MDP depicted elsewhere in this paper (details in Appendix A); see the value-function sketch after this table. |
| Researcher Affiliation | Collaboration | 1) Google Brain; 2) Mila, Université de Montréal; 3) Department of Computing Science, University of Alberta. |
| Pseudocode | No | The paper describes algorithms mathematically and textually, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | No explicit statement or link to open-source code is provided. |
| Open Datasets | No | The paper uses custom-defined Markov Decision Processes (MDPs) for experiments, detailed in Appendix A, rather than publicly available datasets with access information. |
| Dataset Splits | No | The paper describes the MDP setup for analysis, but does not provide specific training, validation, or test dataset splits in terms of percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For the cross-entropy method (CEM), the paper states: 'We use N = 500, K = 50, an initial covariance of 0.1I, where I is the identity matrix of size 2, and a constant noise of 0.05I.' A hedged CEM sketch using these settings follows this table. |
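The paper's experiments revolve around the value functions of a small MDP. The sketch below is an illustration only: it evaluates policies of a generic two-state, two-action MDP with placeholder random transitions, rewards, and discount factor (not the MDP specified in the paper's Appendix A) and traces out the set of attainable value functions, which is the polytope the paper studies.

```python
import numpy as np

def value_function(P, r, pi, gamma=0.9):
    """Exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi.

    P: transitions with shape (S, A, S'); r: rewards with shape (S, A);
    pi: policy with shape (S, A). The MDP and gamma used below are
    placeholders, not the ones from the paper's Appendix A.
    """
    S = P.shape[0]
    P_pi = np.einsum('sa,sap->sp', pi, P)   # state-to-state transitions under pi
    r_pi = np.einsum('sa,sa->s', pi, r)     # expected per-state reward under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

# Trace out the set of attainable value functions for a two-state,
# two-action MDP by sweeping the probability of action 0 in each state.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(2, 2))   # placeholder transitions
r = rng.uniform(-1, 1, size=(2, 2))          # placeholder rewards
grid = np.linspace(0.0, 1.0, 51)
polytope = np.array([
    value_function(P, r, np.array([[p0, 1 - p0], [p1, 1 - p1]]))
    for p0 in grid for p1 in grid
])  # each row is (V(s0), V(s1)); scatter-plotting these points shows the polytope
```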
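The quoted CEM hyperparameters can be read as a concrete configuration. The sketch below is a minimal cross-entropy method loop using N = 500, K = 50, a 0.1·I initial covariance, and a 0.05·I constant noise term as quoted; the objective function, the Gaussian search distribution over two policy parameters, the initial mean, and the number of iterations are assumptions not stated in the quote.

```python
import numpy as np

def cem(objective, dim=2, iters=50, N=500, K=50,
        init_cov_scale=0.1, noise_scale=0.05, seed=0):
    """Cross-entropy method sketch with the hyperparameters quoted above.

    Only N, K, the 0.1*I initial covariance, and the 0.05*I constant noise
    come from the paper; `objective`, `iters`, the zero initial mean, and
    the Gaussian parameterization are assumptions.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)                      # assumed initial mean
    cov = init_cov_scale * np.eye(dim)        # 0.1 * I (quoted)
    noise = noise_scale * np.eye(dim)         # 0.05 * I (quoted)

    for _ in range(iters):
        # Sample N candidate parameter vectors from the current Gaussian.
        samples = rng.multivariate_normal(mean, cov, size=N)
        scores = np.array([objective(x) for x in samples])
        # Keep the K highest-scoring candidates (elites).
        elites = samples[np.argsort(scores)[-K:]]
        # Refit the Gaussian to the elites and add the constant noise term.
        mean = elites.mean(axis=0)
        cov = np.cov(elites, rowvar=False) + noise
    return mean

# Usage example with a toy objective (placeholder, not the paper's return):
best = cem(lambda x: -np.sum((x - 0.5) ** 2))
```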