Optimistic Exploration in Reinforcement Learning Using Symbolic Model Estimates
Authors: Sarath Sreedharan, Michael Katz
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform our evaluation in four different domains. [...] Table 1 presents the comparison of our method against Q learning for the planning benchmarks. |
| Researcher Affiliation | Collaboration | Sarath Sreedharan, Department of Computer Science, Colorado State University, ssreedh3@colostate.edu; Michael Katz, IBM T.J. Watson Research Center, michael.katz1@ibm.com |
| Pseudocode | Yes | Algorithm 1 Iteratively refine the model until a goal reaching trace is found (a hedged sketch of this loop appears below the table) |
| Open Source Code | Yes | The code can be found at https://github.com/sarathsreedharan/Model Learner. |
| Open Datasets | Yes | For the RL domain, we looked at two variants of minigrid problem. One was the version introduced by [26] (henceforth referred to as Minigrid-Parl) and the other being a simplified version of the original minigrid testbed [8]. (Citation [8]: Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid, 2018.) |
| Dataset Splits | No | The paper discusses evaluation using Q-learning episodes and sample counts but does not specify explicit training, validation, or test dataset splits or percentages. |
| Hardware Specification | Yes | All experiments were on a laptop running Mac OS v 11.06, with 2 GHz Quad-Core Intel Core i5 and 16 GB 3733 MHz LPDDR4X. We did not use CUDA in any of the experiments. |
| Software Dependencies | No | The paper mentions software like 'Simple RL framework' and 'FI-diverse-agl planner' but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | For all planning based instances we set the time limit to 10 minutes, while for the minigrid instances we extended the time limit to 30 minutes. [...] For all the RL baselines we used a discount factor of γ. For Q learning and R max, we used a maximum of 1000000 episodes with 200 steps per episode. (A hedged Q-learning configuration sketch using this budget appears below the table.) |
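The Algorithm 1 caption quoted in the Pseudocode row suggests a plan-execute-refine loop. The sketch below is a minimal guess at the shape of such a loop based only on that caption; the `planner`, `executor`, and `refiner` callables and their signatures are assumptions for illustration, not the paper's actual interfaces.

```python
# Minimal sketch of the loop suggested by the Algorithm 1 caption
# ("iteratively refine the model until a goal reaching trace is found").
# The planner/executor/refiner callables are hypothetical placeholders,
# not the paper's actual API.

def refine_until_goal_trace(model, planner, executor, refiner, max_rounds=100):
    """Plan with the current symbolic model estimate, execute the plan,
    and refine the model from the observed trace until a plan actually
    reaches the goal (or the round budget is exhausted)."""
    for _ in range(max_rounds):
        plan = planner(model)                 # plan on the current (optimistic) model estimate
        trace, reached_goal = executor(plan)  # run the plan against the real environment
        if reached_goal:
            return trace, model               # goal-reaching trace found
        model = refiner(model, trace)         # update the estimate from the failed trace
    return None, model                        # budget exhausted without reaching the goal
```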
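The Experiment Setup row reports a Q-learning baseline budget of up to 1,000,000 episodes with 200 steps per episode; the value of the discount factor γ is not given in the excerpt. The snippet below is a generic tabular Q-learning sketch using that budget, not the paper's implementation (which the Software Dependencies row indicates was built on the Simple RL framework); the `env` interface and the `gamma`, `alpha`, and `epsilon` defaults are assumptions.

```python
import random
from collections import defaultdict

def tabular_q_learning(env, gamma, alpha=0.1, epsilon=0.1,
                       max_episodes=1_000_000, max_steps=200):
    """Generic tabular Q-learning with the budget reported in the paper
    (up to 1,000,000 episodes, 200 steps per episode). `env` is assumed to
    expose reset() -> state, step(action) -> (next_state, reward, done),
    and a discrete action list env.actions; the alpha and epsilon values
    are assumptions, as the excerpt does not state them."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return, default 0.0
    for _ in range(max_episodes):
        state = env.reset()
        for _ in range(max_steps):
            # epsilon-greedy action selection over the tabular estimates
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # one-step TD update toward the greedy bootstrap target
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
            if done:
                break
    return Q
```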