Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Gambler's Problem and Beyond
Authors: Baoxiang Wang, Shuai Li, Jiajin Li, Siu On Chan
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We analytically investigate a deceptively simple problem, the Gambler's problem, introduced in the reinforcement learning textbook by Sutton & Barto (2018), on Example 4.3, Chapter 4, page 84. The problem setting is natural and simple enough that little discussion was given in the book apart from an algorithmic solution by value iteration. A close inspection will however show that the problem, as a representative of the entire family of Markov decision processes (MDP), involves a level of complexity and curiosity uncharted in years of reinforcement learning research. |
| Researcher Affiliation | Academia | Baoxiang Wang, Department of Computer Science and Engineering, The Chinese University of Hong Kong; Shuai Li, John Hopcroft Center for Computer Science, Shanghai Jiao Tong University; Jiajin Li, Department of SEEM, The Chinese University of Hong Kong; Siu On Chan, Department of Computer Science and Engineering, The Chinese University of Hong Kong |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not provide concrete access to source code for its own methodology. It references a third-party open-source implementation used for illustrative plots, but that implementation is not the authors' own work. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical analysis of a problem, not on training models with datasets. Thus, there is no dataset explicitly mentioned as publicly available for training. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets. Therefore, there are no training/validation/test splits to specify. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments that would require specific hardware. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments that would require specific software dependencies with version numbers. It refers to an 'open source implementation (Zhang, 2019)' for plots, but this is not their own software dependency for conducting their research methodology. |
| Experiment Setup | No | The paper is theoretical and does not involve empirical experiments. Therefore, no experimental setup details like hyperparameters or system-level training settings are provided. |