reproducibilityindex.ai

The Expected-Length Model of Options

Authors: David Abel, John Winder, Marie desJardins, Michael Littman

IJCAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We now explore the utility of ELM through experiments. The main hypothesis we investigate is how ELM compares to MTM for learning and exploiting option models in SSPs. Figures 2, 3, and 4 present performance curves with 95% confidence intervals for the domains that we discuss shortly in more detail.
Researcher Affiliation	Academia	Brown University; University of Maryland, Baltimore County; Simmons University
Pseudocode	No	The paper does not contain any sections, figures, or blocks explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository.
Open Datasets	No	The paper mentions common reinforcement learning domains like Four Rooms and Taxi, and cites their original papers (e.g., Sutton et al. [1999] for Four Rooms), but does not provide specific links, DOIs, repositories, or explicit instructions for accessing the exact experimental datasets generated within these domains for reproduction.
Dataset Splits	No	The paper describes a reinforcement learning setup where agents learn through episodes. It does not mention explicit training, validation, or test dataset splits in terms of percentages or sample counts, as data is generated dynamically during interaction.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU or CPU models, processor types, or cloud computing instance specifications used for running the experiments.
Software Dependencies	No	The paper mentions methods and algorithms like R-MAX and value iteration, but does not list specific software libraries, frameworks, or solvers along with their version numbers required for reproduction.
Experiment Setup	Yes	We set m = 5 for the confidence parameter in R-MAX. Across all MDPs, γ = 0.99, and all transitions are stochastic with probability 4/5 of an action succeeding, otherwise transitioning with probability 1/5 to a different adjacent state (as if another action had been selected).
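The stochastic transitions and the R-MAX parameter reported in the Experiment Setup row can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code, since the paper releases none. The following details are hypothetical and are not given in the source: the four-action grid, the uniform split of the 1/5 slip probability over the three other actions, staying in place when a move would leave the grid, and the names `action_distribution`, `step`, and `RMaxCounts`.

```python
from collections import defaultdict
import random

M_KNOWN = 5      # R-MAX confidence parameter m, as reported in the setup
P_SUCCESS = 0.8  # probability 4/5 that the selected action executes as intended

# Hypothetical 4-connected grid action set; the source does not list one.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def action_distribution(intended):
    """Return {action: probability}: the intended action gets P_SUCCESS and the
    remaining 1/5 is split uniformly over the other actions (an assumption)."""
    others = [a for a in ACTIONS if a != intended]
    dist = {intended: P_SUCCESS}
    for a in others:
        dist[a] = (1.0 - P_SUCCESS) / len(others)
    return dist


def step(state, intended, width, height, rng=random):
    """Sample a next state on a width x height grid. A move that would leave
    the grid keeps the agent in place (an assumption, not stated in the source)."""
    dist = action_distribution(intended)
    actual = rng.choices(list(dist), weights=list(dist.values()))[0]
    dx, dy = ACTIONS[actual]
    x, y = state[0] + dx, state[1] + dy
    if 0 <= x < width and 0 <= y < height:
        return (x, y)
    return state


class RMaxCounts:
    """Hypothetical visit counter: a (state, action) pair counts as 'known'
    after m visits. Before that, R-MAX treats the pair optimistically."""

    def __init__(self, m=M_KNOWN):
        self.m = m
        self.counts = defaultdict(int)

    def update(self, state, action):
        self.counts[(state, action)] += 1

    def is_known(self, state, action):
        return self.counts[(state, action)] >= self.m
```

For example, after five calls to `update((0, 0), "up")`, the call `is_known((0, 0), "up")` returns `True`. Only then would an R-MAX planner stop using the optimistic value for that pair and use the empirical model.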