Evaluating the Performance of Reinforcement Learning Algorithms
Authors: Scott Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip Thomas
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this method by evaluating a broad class of reinforcement learning algorithms on standard benchmark tasks. |
| Researcher Affiliation | Academia | College of Information and Computer Sciences, University of Massachusetts, MA, USA. |
| Pseudocode | Yes | We provide pseudocode in Appendix C and source code in the repository. |
| Open Source Code | Yes | Source code for this paper can be found at https://github.com/ScottJordan/EvaluationOfRLAlgs. |
| Open Datasets | Yes | These algorithms are evaluated on 15 environments, eight discrete MDPs, half with stochastic transition dynamics, and seven continuous state environments: Cart-Pole (Florian, 2007), Mountain Car (Sutton & Barto, 1998), Acrobot (Sutton, 1995), and four variations of the pinball environment (Konidaris & Barto, 2009; Geramifard et al., 2015). |
| Dataset Splits | No | The paper describes a 'tuning phase' and a 'testing phase' for algorithms and refers to 'trials', but it does not specify a conventional train/validation/test split of a dataset. |
| Hardware Specification | No | The paper mentions 'high performance computing equipment' in the acknowledgements, but no specific hardware details (e.g., GPU/CPU models, memory) are provided for the experiments. |
| Software Dependencies | No | The paper mentions 'Julia (Bezanson et al., 2017) or C++, where we have noticed approximately two orders of magnitude faster execution than similar Python implementations', but it does not provide specific version numbers for any of these programming languages or relevant libraries. |
| Experiment Setup | Yes | For the continuous state environments, each algorithm employs linear function approximation using the Fourier basis (Konidaris et al., 2011) with a randomly sampled order. See Appendix E for full details of each algorithm. For further details about the experiment see Appendix F. |
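
The Experiment Setup row mentions linear function approximation with the Fourier basis (Konidaris et al., 2011) and a randomly sampled order. The following is a minimal Python sketch of that idea, not the authors' implementation (which the table says is written in Julia or C++). The function names, the state bounds, and the order range `1..5` are hypothetical choices for illustration only; the paper's actual sampling distribution is described in its appendices.

```python
import itertools
import math
import random


def make_fourier_basis(order, low, high):
    """Build a feature function for the full Fourier basis of the given order.

    `low` and `high` are per-dimension state bounds (assumed known).
    """
    # Every integer coefficient vector c in {0, ..., order}^d.
    coeffs = list(itertools.product(range(order + 1), repeat=len(low)))

    def features(state):
        # Rescale each state dimension to [0, 1].
        x = [(s - lo) / (hi - lo) for s, lo, hi in zip(state, low, high)]
        # One feature per coefficient vector: cos(pi * c . x).
        return [
            math.cos(math.pi * sum(ci * xi for ci, xi in zip(c, x)))
            for c in coeffs
        ]

    return features


def linear_value(weights, phi):
    """Linear function approximation: the dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, phi))


# Hypothetical usage: sample an order, then evaluate a two-dimensional state.
rng = random.Random(0)
order = rng.randint(1, 5)  # assumed range, not taken from the paper
phi = make_fourier_basis(order, low=[-1.0, -1.0], high=[1.0, 1.0])
weights = [0.0] * (order + 1) ** 2  # one weight per basis function
value = linear_value(weights, phi([0.25, -0.5]))
```

An order-n basis over d state dimensions has (n + 1)^d features, so the sampled order directly controls the capacity of the approximator.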