Hierarchy Through Composition with Multitask LMDPs

Authors: Andrew M. Saxe, Adam C. Earle, Benjamin Rosman

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments
Researcher Affiliation | Academia | 1 Center for Brain Science, Harvard University; 2 School of Computer Science and Applied Mathematics, University of the Witwatersrand; 3 Council for Scientific and Industrial Research, South Africa.
Pseudocode | Yes | We now describe the execution model (see Supplementary Material for pseudocode listing).
Open Source Code | No | The paper mentions 'see Supplementary Material for pseudocode listing' but does not explicitly state that source code for the methodology is openly available or provide a link.
Open Datasets | Yes | To illustrate the operation of our scheme, we apply it to a 2D grid-world rooms domain (Fig. 3(a)). This domain is shown in Fig. 5, and is taken from previous experiments in transfer learning (Fernández & Veloso, 2006).
Dataset Splits | No | The paper describes how the agent is initialized and trajectory lengths are capped, but it does not specify dataset splits for training, validation, and testing with percentages or sample counts.
Hardware Specification | No | The paper does not mention any specific hardware (e.g., GPU/CPU models, processor types, memory) used for running its experiments.
Software Dependencies | No | The paper mentions learning algorithms like Q-learning but does not provide specific software names with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | In the options framework the agent's action space is augmented with the full set of option policies. The initialization set for these options is the full state space, so that any option may be executed at any time. The termination condition is defined such that the option terminates only when it reaches its goal state. To minimize the action space for the options agent, we remove the primitive actions, reducing the learning problem to simply choosing the single correct option from each state. The options learning problem is solved using Q-learning with sigmoidal learning rate decrease and ϵ-greedy exploitation. These parameters were optimized on a coarse grid to yield the fastest learning curves.
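
The Experiment Setup row describes a tabular Q-learning baseline whose only actions are goal-directed options. The sketch below illustrates that setup: ϵ-greedy selection over options (primitive actions removed), each option terminating only at its goal state, and a sigmoidally decreasing learning rate. Everything concrete here is an assumption for illustration, not the authors' code: the option names in OPTIONS, the execute_option stand-in for the rooms domain (its step counts, -1-per-step cost, and goal bonus), and the exact constants in sigmoid_lr.

import math
import random
from collections import defaultdict

# Toy stand-in for the rooms domain: states/options are hypothetical names,
# path lengths and rewards are illustrative assumptions only.
OPTIONS = ["door_AB", "door_BC", "door_CD", "task_goal"]

def execute_option(state, option):
    """Run an option to termination.

    As in the setup above, every option may be invoked from any state and
    terminates only at its own goal state. Returns (next_state, reward, steps);
    the duration and -1-per-step cost are assumed values.
    """
    steps = 5                                   # assumed option duration
    reward = -1.0 * steps                       # assumed per-step cost
    if option == "task_goal":
        reward += 10.0                          # assumed goal bonus
    return option, reward, steps                # option ends at its goal state

def sigmoid_lr(t, lr0=0.5, midpoint=200.0, scale=50.0):
    """Sigmoidally decreasing learning rate (exact schedule is assumed)."""
    return lr0 / (1.0 + math.exp((t - midpoint) / scale))

def q_learning_over_options(episodes=500, gamma=0.95, epsilon=0.1):
    """Tabular epsilon-greedy Q-learning where the action set is the options."""
    Q = defaultdict(float)                      # Q[(state, option)]
    t = 0
    for _ in range(episodes):
        state = "start"
        for _ in range(50):                     # cap on option invocations
            # epsilon-greedy over options; primitive actions are removed
            if random.random() < epsilon:
                option = random.choice(OPTIONS)
            else:
                option = max(OPTIONS, key=lambda o: Q[(state, o)])
            next_state, reward, steps = execute_option(state, option)
            # SMDP-style Q-learning backup, discounting by the option's duration
            alpha = sigmoid_lr(t)
            best_next = max(Q[(next_state, o)] for o in OPTIONS)
            target = reward + (gamma ** steps) * best_next
            Q[(state, option)] += alpha * (target - Q[(state, option)])
            t += 1
            state = next_state
            if state == "task_goal":
                break
    return Q

if __name__ == "__main__":
    Q = q_learning_over_options()
    print({k: round(v, 2) for k, v in Q.items() if k[0] == "start"})

Because every option runs to its goal before control returns to the learner, the effective decision problem is small, which is why the paper can tune the learning-rate and exploration parameters on a coarse grid for the fastest learning curves.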