The Gambler's Problem and Beyond
Authors: Baoxiang Wang, Shuai Li, Jiajin Li, Siu On Chan
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We analytically investigate a deceptively simple problem, the Gambler's problem, introduced in the reinforcement learning textbook by Sutton & Barto (2018) as Example 4.3 (Chapter 4, page 84). The problem setting is natural and simple enough that the book discusses little beyond an algorithmic solution by value iteration. A close inspection, however, shows that the problem, as a representative of the entire family of Markov decision processes (MDPs), involves a level of complexity and curiosity uncharted in years of reinforcement learning research. |
| Researcher Affiliation | Academia | Baoxiang Wang, Department of Computer Science and Engineering, The Chinese University of Hong Kong (bxwang@cse.cuhk.edu.hk); Shuai Li, John Hopcroft Center for Computer Science, Shanghai Jiao Tong University (shuaili8@sjtu.edu.cn); Jiajin Li, Department of SEEM, The Chinese University of Hong Kong (jjli@se.cuhk.edu.hk); Siu On Chan, Department of Computer Science and Engineering, The Chinese University of Hong Kong (siuon@cse.cuhk.edu.hk) |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not provide concrete access to source code for its own methodology. It cites a third-party open-source implementation for its illustrative plots, but releases no code of its own. |
| Open Datasets | No | The paper is theoretical: it analyzes a mathematical problem rather than training models on data, so it does not mention any publicly available training dataset. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets. Therefore, there are no training/validation/test splits to specify. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments that would require specific hardware. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and conducts no experiments that would require versioned software dependencies. It cites an 'open source implementation (Zhang, 2019)' for its plots, but this is third-party code rather than a dependency of the authors' own methodology. |
| Experiment Setup | No | The paper is theoretical and does not involve empirical experiments. Therefore, no experimental setup details like hyperparameters or system-level training settings are provided. |
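
The Research Type entry notes that the textbook treats the Gambler's problem mainly through value iteration. Below is a minimal sketch of that textbook-style solution. It is not code from the paper, which releases none. It assumes the standard setup of Sutton & Barto's Example 4.3:

- States are capital levels 1 to 99, and 0 and 100 are terminal.
- A coin lands heads with probability `p_heads`. The gambler wins the stake on heads and loses it on tails.
- The reward is +1 only on reaching 100, with no discounting.

Under this setup, V(s) is the probability of reaching the goal from capital s. The function name `gamblers_value_iteration`, the tolerance `theta`, and the restriction to positive stakes are illustrative assumptions.

```python
import numpy as np


def gamblers_value_iteration(p_heads=0.4, goal=100, theta=1e-9):
    """Sketch of value iteration for the Gambler's problem (assumed setup).

    States 0 and `goal` are terminal. Reaching `goal` pays +1 and there is
    no discounting, so values[s] estimates the probability of winning from s.
    Stakes are restricted to 1..min(s, goal - s) (an illustrative choice).
    """
    values = np.zeros(goal + 1)
    values[goal] = 1.0  # the +1 reward is encoded as the terminal value

    def stake_returns(s):
        # Expected value of each stake a: heads -> s + a, tails -> s - a.
        stakes = np.arange(1, min(s, goal - s) + 1)
        returns = p_heads * values[s + stakes] + (1 - p_heads) * values[s - stakes]
        return stakes, returns

    while True:
        delta = 0.0
        for s in range(1, goal):
            _, returns = stake_returns(s)
            best = returns.max()
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place sweep: later states see this update
        if delta < theta:
            break

    # Greedy policy. Returns are rounded before argmax, so near-ties resolve
    # to the smallest such stake (one of possibly several optimal stakes).
    policy = np.zeros(goal + 1, dtype=int)
    for s in range(1, goal):
        stakes, returns = stake_returns(s)
        policy[s] = stakes[np.argmax(np.round(returns, 8))]
    return values, policy


values, policy = gamblers_value_iteration(p_heads=0.4)
print(values[25], values[50], values[75])
```

With `p_heads = 0.4`, the printed values should be close to 0.16, 0.4 and 0.64. These are the win probabilities of bold play (always staking min(s, goal − s)), which is known to be optimal when the coin is unfavourable. Ties between stakes are common in this problem, so the reported policy depends on the tie-breaking rule.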