Evaluating the Performance of Reinforcement Learning Algorithms

Authors: Scott Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip Thomas

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate this method by evaluating a broad class of reinforcement learning algorithms on standard benchmark tasks.
Researcher Affiliation | Academia | College of Information and Computer Sciences, University of Massachusetts, MA, USA.
Pseudocode | Yes | We provide pseudocode in Appendix C and source code in the repository.
Open Source Code | Yes | Source code for this paper can be found at https://github.com/ScottJordan/EvaluationOfRLAlgs.
Open Datasets | Yes | These algorithms are evaluated on 15 environments: eight discrete MDPs, half with stochastic transition dynamics, and seven continuous state environments: Cart-Pole (Florian, 2007), Mountain Car (Sutton & Barto, 1998), Acrobot (Sutton, 1995), and four variations of the pinball environment (Konidaris & Barto, 2009; Geramifard et al., 2015).
Dataset Splits | No | The paper discusses a 'tuning phase' and a 'testing phase' for algorithms and refers to 'trials', but it does not specify a train/validation/test split of a dataset in the traditional model-training sense.
Hardware Specification | No | The paper mentions 'high performance computing equipment' in the acknowledgements, but no specific hardware details (e.g., GPU/CPU models, memory) are provided for the experiments.
Software Dependencies | No | The paper mentions 'Julia (Bezanson et al., 2017) or C++, where we have noticed approximately two orders of magnitude faster execution than similar Python implementations', but it does not provide specific version numbers for these programming languages or for any libraries.
Experiment Setup | Yes | For the continuous state environments, each algorithm employs linear function approximation using the Fourier basis (Konidaris et al., 2011) with a randomly sampled order. See Appendix E for full details of each algorithm and Appendix F for further details about the experiment. (An illustrative Fourier-basis sketch follows the table.)
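
For context on the last row: the Fourier basis (Konidaris et al., 2011) maps a state, normalized to [0, 1] per dimension, to cosine features over all integer coefficient vectors up to the chosen order, and the algorithms then learn linear weights over those features. Below is a minimal sketch, assuming a hypothetical helper name and bounds handling; it is illustrative only and is not the authors' implementation, which the paper says is written in Julia or C++.

```python
import itertools
import numpy as np

def fourier_basis(state, order, low, high):
    """Illustrative Fourier basis features (Konidaris et al., 2011).

    state: raw state vector; low/high: per-dimension bounds used to
    normalize the state into [0, 1]; order: basis order n, yielding
    (n + 1)**d features for a d-dimensional state.
    """
    state = np.asarray(state, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    s = (state - low) / (high - low)  # normalize each dimension to [0, 1]
    # All coefficient vectors c with entries in {0, ..., n}.
    coeffs = np.array(list(itertools.product(range(order + 1), repeat=s.size)))
    # phi_i(s) = cos(pi * c_i . s)
    return np.cos(np.pi * coeffs @ s)

# Linear function approximation then uses per-action weights w, e.g.
# q(s, a) = w[a] @ fourier_basis(s, order, low, high); a "randomly sampled
# order" would amount to something like order = rng.integers(1, 10) per trial.
```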