The Gambler's Problem and Beyond

Authors: Baoxiang Wang, Shuai Li, Jiajin Li, Siu On Chan

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We analytically investigate a deceptively simple problem, the Gambler's problem, introduced in the reinforcement learning textbook by Sutton & Barto (2018), on Example 4.3, Chapter 4, page 84. The problem setting is natural and simple enough that little discussion was given in the book apart from an algorithmic solution by value iteration. A close inspection will however show that the problem, as a representative of the entire family of Markov decision processes (MDP), involves a level of complexity and curiosity uncharted in years of reinforcement learning research.
Researcher Affiliation | Academia | Baoxiang Wang, Department of Computer Science and Engineering, The Chinese University of Hong Kong, bxwang@cse.cuhk.edu.hk; Shuai Li, John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, shuaili8@sjtu.edu.cn; Jiajin Li, Department of SEEM, The Chinese University of Hong Kong, jjli@se.cuhk.edu.hk; Siu On Chan, Department of Computer Science and Engineering, The Chinese University of Hong Kong, siuon@cse.cuhk.edu.hk
Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper does not provide concrete access to source code for its own methodology. It references a third-party open-source implementation for illustrative plots, but not for its own work.
Open Datasets | No | The paper is theoretical and focuses on mathematical analysis of a problem, not on training models with datasets. Thus, no publicly available training dataset is mentioned.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets. Therefore, there are no training/validation/test splits to specify.
Hardware Specification | No | The paper is theoretical and does not conduct experiments that would require specific hardware. Therefore, no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not conduct experiments that would require specific software dependencies with version numbers. It refers to an 'open source implementation (Zhang, 2019)' for plots, but this is not a software dependency of the authors' own research methodology.
Experiment Setup | No | The paper is theoretical and does not involve empirical experiments. Therefore, no experimental setup details such as hyperparameters or system-level training settings are provided.
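
The Research Type entry above mentions that the textbook treats the Gambler's problem with value iteration. As a rough illustration only, the following is a minimal sketch of that textbook procedure, not code from the paper and not the open-source implementation it cites. The function name `gambler_value_iteration` and the parameters `p_heads`, `goal`, `tol`, and `max_sweeps` are hypothetical names chosen here; the defaults (goal of 100, heads probability 0.4) follow the usual textbook setup and are assumptions.

```python
def gambler_value_iteration(p_heads=0.4, goal=100, tol=1e-12, max_sweeps=10_000):
    """Estimate the probability of reaching `goal` from each capital level.

    Hypothetical helper written for illustration. The reward of +1 for
    reaching the goal is encoded by fixing values[goal] = 1; capital 0
    (ruin) keeps the value 0. The problem is treated as undiscounted.
    """
    values = [0.0] * (goal + 1)
    values[goal] = 1.0

    for _ in range(max_sweeps):
        delta = 0.0
        # Sweep the non-terminal states, updating in place.
        for capital in range(1, goal):
            # A stake may not exceed the capital or overshoot the goal.
            max_stake = min(capital, goal - capital)
            best = max(
                p_heads * values[capital + stake]
                + (1.0 - p_heads) * values[capital - stake]
                for stake in range(1, max_stake + 1)
            )
            delta = max(delta, abs(best - values[capital]))
            values[capital] = best
        # Stop once a full sweep changes no value by more than `tol`.
        if delta < tol:
            break
    return values


if __name__ == "__main__":
    v = gambler_value_iteration()
    for capital in (25, 50, 75):
        print(capital, round(v[capital], 6))
```

Under these assumptions, the value at capital 50 should come out close to 0.4, since staking everything there wins with probability 0.4. The sketch only computes the value function; it does not attempt to reproduce the paper's analytical results.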