The Expected-Length Model of Options
Authors: David Abel, John Winder, Marie desJardins, Michael Littman
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now explore the utility of ELM through experiments. The main hypothesis we investigate is how ELM compares to MTM for learning and exploiting option models in SSPs. Figures 2, 3, and 4 present performance curves with 95% confidence intervals for the domains that we discuss shortly in more detail. |
| Researcher Affiliation | Academia | (1) Brown University; (2) University of Maryland, Baltimore County; (3) Simmons University |
| Pseudocode | No | The paper does not contain any sections, figures, or blocks explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper mentions common reinforcement learning domains like Four Rooms and Taxi, and cites their original papers (e.g., Sutton et al. [1999] for Four Rooms), but does not provide specific links, DOIs, repositories, or explicit instructions for accessing the exact experimental datasets generated within these domains for reproduction. |
| Dataset Splits | No | The paper describes a reinforcement learning setup where agents learn through episodes. It does not mention explicit training, validation, or test dataset splits in terms of percentages or sample counts, as data is generated dynamically during interaction. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, processor types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions methods and algorithms like R-MAX and value iteration, but does not list specific software libraries, frameworks, or solvers along with their version numbers required for reproduction. |
| Experiment Setup | Yes | We set m = 5 for the confidence parameter in R-MAX. Across all MDPs, γ = 0.99, and all transitions are stochastic with probability 4/5 of an action succeeding, otherwise transitioning with probability 1/5 to a different adjacent state (as if another action had been selected). |
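
The Experiment Setup row describes a "slip" transition model, which can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the four-action grid, the 5x5 bounds, the function names, and the assumption that a slip picks uniformly among the other actions are all hypothetical choices made here. Only m = 5, γ = 0.99, and the 4/5 success probability come from the paper. The `is_known` helper reflects the usual R-MAX convention that a pair counts as known after m visits.

```python
import random

# Values reported in the Experiment Setup row.
GAMMA = 0.99          # discount factor
SUCCESS_PROB = 4 / 5  # probability that the chosen action is executed
RMAX_M = 5            # R-MAX parameter m

# Hypothetical four-action grid world; names and layout are illustrative only.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def slip_distribution(action):
    """Distribution over the action that is actually executed.

    Assumption: the remaining 1/5 of probability mass is spread uniformly
    over the other actions; the paper only says "as if another action had
    been selected".
    """
    others = [a for a in ACTIONS if a != action]
    dist = {a: (1 - SUCCESS_PROB) / len(others) for a in others}
    dist[action] = SUCCESS_PROB
    return dist


def step(state, action, rng, width=5, height=5):
    """Sample a next state on a bounded grid (assumed 5x5, no interior walls)."""
    dist = slip_distribution(action)
    executed = rng.choices(list(dist), weights=list(dist.values()))[0]
    dx, dy = ACTIONS[executed]
    x, y = state
    # Moves that would leave the grid keep the agent at the boundary.
    return (min(max(x + dx, 0), width - 1), min(max(y + dy, 0), height - 1))


def is_known(visit_count, m=RMAX_M):
    """Usual R-MAX convention: a pair is 'known' once it has m visits."""
    return visit_count >= m


if __name__ == "__main__":
    rng = random.Random(0)
    state = (2, 2)
    for _ in range(5):
        state = step(state, "up", rng)
        print(state)
```

Keeping the slip distribution separate from the sampling step makes the stochastic part easy to check independently of any particular domain layout.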