Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Authors: Francisco Garcia, Philip S. Thomas
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed framework. 6 Empirical Results In this section we present experiments for discrete and continuous control tasks. |
| Researcher Affiliation | Academia | Francisco M. Garcia and Philip S. Thomas College of Information and Computer Sciences University of Massachusetts Amherst Amherst, MA, USA EMAIL |
| Pseudocode | Yes | Pseudocode for the implementations used in our framework using REINFORCE and PPO are shown in Appendix C. |
| Open Source Code | Yes | Code used for this paper can be found at https://github.com/fmaxgarcia/Meta-MDP |
| Open Datasets | Yes | Implementations used for the discrete case pole-balancing and all continuous control problems were taken from Open AI Gym, Roboschool benchmarks [2]. For the driving task experiments we used a simulator implemented in Unity by Tawn Kramer from the Donkey Car community. (Footnote 1: The Unity simulator for the self-driving task can be found at https://github.com/tawnkramer/sdsandbox) |
| Dataset Splits | No | The paper refers to 'training tasks' and 'testing tasks' but does not specify explicit training, validation, and test dataset splits with percentages or counts for any single dataset. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Open AI Gym', 'Roboschool', and 'Unity' as software used but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In our experiments we set the initial value of ϵ to 0.8, and decreased it by a factor of 0.995 every episode. Both policies, π and µ, were trained using REINFORCE: π for I = 1,000 episodes and µ for 500 iterations. |
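
The decay schedule quoted in the Experiment Setup row (initial value 0.8, multiplied by 0.995 every episode) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `decayed_value` is hypothetical, and reading the decayed quantity as an exploration probability is an assumption based on the quoted text.

```python
def decayed_value(episode: int, initial: float = 0.8, factor: float = 0.995) -> float:
    """Return the value after `episode` multiplicative decay steps.

    Episode 0 yields the initial value; each later episode multiplies
    the previous value by `factor`. The defaults mirror the numbers
    quoted in the table (0.8 and 0.995); the name is hypothetical.
    """
    if episode < 0:
        raise ValueError("episode must be non-negative")
    return initial * factor ** episode


if __name__ == "__main__":
    # Print the schedule at a few episodes to see how quickly it shrinks.
    for episode in (0, 100, 500, 1000):
        print(episode, round(decayed_value(episode), 4))
```

Under this schedule the value roughly halves about every 138 episodes, since 0.995 raised to the 138th power is close to 0.5.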