A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Authors: Francisco Garcia, Philip S. Thomas
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed framework. [Section 6, Empirical Results] In this section we present experiments for discrete and continuous control tasks. |
| Researcher Affiliation | Academia | Francisco M. Garcia and Philip S. Thomas College of Information and Computer Sciences University of Massachusetts Amherst Amherst, MA, USA {fmgarcia,pthomas}@cs.umass.edu |
| Pseudocode | Yes | Pseudocode for the implementations used in our framework using REINFORCE and PPO is shown in Appendix C. |
| Open Source Code | Yes | Code used for this paper can be found at https://github.com/fmaxgarcia/Meta-MDP |
| Open Datasets | Yes | Implementations used for the discrete-case pole-balancing task and all continuous control problems were taken from the OpenAI Gym and Roboschool benchmarks [2]. For the driving-task experiments we used a simulator implemented in Unity by Tawn Kramer from the Donkey Car community (footnote: the Unity simulator for the self-driving task can be found at https://github.com/tawnkramer/sdsandbox). (An environment-loading sketch follows the table.) |
| Dataset Splits | No | The paper refers to 'training tasks' and 'testing tasks' but does not specify explicit training, validation, and test dataset splits with percentages or counts for any single dataset. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Open AI Gym', 'Roboschool', and 'Unity' as software used but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In our experiments we set the initial value of ε to 0.8, and decreased it by a factor of 0.995 every episode. Both policies, π and µ, were trained using REINFORCE: π for I = 1,000 episodes and µ for 500 iterations. (A minimal sketch of this schedule follows the table.) |
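
The benchmark sources quoted in the Open Datasets row are standard OpenAI Gym and Roboschool environments. As a point of reference only, the snippet below shows how such an environment is typically instantiated and rolled out; it assumes the classic (pre-0.26) Gym `reset`/`step` API and is not taken from the authors' repository, whose exact dependency versions are not reported in the paper.

```python
# Illustration only (not from the paper): instantiating and rolling out one
# of the referenced benchmark environments with a random policy.
import gym               # classic OpenAI Gym API (pre-0.26 reset/step)
import roboschool        # noqa: F401 -- importing registers Roboschool* env IDs

env = gym.make("CartPole-v1")            # discrete pole-balancing task
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
env.close()
```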
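
To make the quoted experiment-setup numbers concrete, the sketch below implements the two-level REINFORCE schedule they describe: ε starts at 0.8 and is multiplied by 0.995 each episode, the task policy π is trained for I = 1,000 episodes per task, and the advisor µ for 500 iterations. The chain MDP, the tabular softmax policies, and the way the advisor's update aggregates an entire task lifetime into one meta-episode are illustrative assumptions, not the authors' implementation; their actual REINFORCE and PPO variants are given in Appendix C and the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal, self-contained sketch (NOT the authors' code) of the quoted
# schedule: epsilon = 0.8 * 0.995**episode, task policy pi trained with
# REINFORCE for 1,000 episodes per task, advisor mu for 500 iterations.
N_STATES, N_ACTIONS, HORIZON = 5, 2, 20
GAMMA, LR = 0.99, 0.05
EPS_INIT, EPS_DECAY = 0.8, 0.995
TASK_EPISODES, ADVISOR_ITERS = 1_000, 500   # values quoted from the paper


class SoftmaxPolicy:
    """Tabular softmax policy with a vanilla REINFORCE update (toy stand-in)."""

    def __init__(self):
        self.theta = np.zeros((N_STATES, N_ACTIONS))

    def probs(self, s):
        z = np.exp(self.theta[s] - self.theta[s].max())
        return z / z.sum()

    def act(self, s):
        return rng.choice(N_ACTIONS, p=self.probs(s))

    def reinforce(self, trajectory):
        """One REINFORCE update from a list of (state, action, reward)."""
        g = 0.0
        for s, a, r in reversed(trajectory):
            g = r + GAMMA * g                   # return-to-go
            grad = -self.probs(s)
            grad[a] += 1.0                      # d log pi(a|s) / d theta[s]
            self.theta[s] += LR * g * grad


def chain_step(s, a):
    """Toy chain MDP: action 1 moves right; reward 1 at the final state."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == N_STATES - 1)


def run_episode(pi, mu, eps):
    """Roll out pi, substituting the advisor mu with probability eps."""
    s, traj = 0, []
    for _ in range(HORIZON):
        actor = mu if rng.random() < eps else pi
        a = actor.act(s)
        s_next, r = chain_step(s, a)
        traj.append((s, a, r))
        s = s_next
    return traj


mu = SoftmaxPolicy()                            # exploration advisor
for _ in range(ADVISOR_ITERS):
    pi = SoftmaxPolicy()                        # fresh task policy each iteration
    advisor_traj = []
    for episode in range(TASK_EPISODES):
        eps = EPS_INIT * (EPS_DECAY ** episode)
        traj = run_episode(pi, mu, eps)
        pi.reinforce(traj)                      # inner loop: task-policy update
        advisor_traj.extend(traj)
    # Simplifying assumption: the advisor is updated on the whole task
    # lifetime treated as one meta-episode of the meta-MDP.
    mu.reinforce(advisor_traj)
```

With the quoted loop sizes the toy run takes a while in pure Python; scaling TASK_EPISODES and ADVISOR_ITERS down gives a quick smoke test without changing the structure of the schedule.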