Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Temporal Abstraction in Reinforcement Learning with the Successor Representation
Authors: Marlos C. Machado, André Barreto, Doina Precup, Michael Bowling
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation focuses on options discovered for temporally-extended exploration and on the use of the successor representation to combine them. Our results shed light on important design decisions involved in the definition of options and demonstrate the synergy of different methods based on the successor representation, such as eigenoptions and the option keyboard. ... We perform numerical simulations to assess how effective options discovered by different methods are in capturing environment properties. |
| Researcher Affiliation | Collaboration | Marlos C. Machado (EMAIL): DeepMind; Alberta Machine Intelligence Institute (Amii); Department of Computing Science, University of Alberta, Edmonton, AB, Canada. André Barreto (EMAIL): DeepMind, London, United Kingdom. Doina Precup (EMAIL): DeepMind; Quebec AI Institute (Mila); School of Computer Science, McGill University, Montreal, QC, Canada. Michael Bowling (EMAIL): DeepMind; Alberta Machine Intelligence Institute (Amii); Department of Computing Science, University of Alberta, Edmonton, AB, Canada. |
| Pseudocode | Yes | Algorithm 1 depicts an implementation of the SR. ... Algorithm 2, on the next page, depicts the pseudo-code for CEO. ... See Algorithm 3 for a presentation of this discussion in pseudo-code. ... Algorithm 4: OK-Eigenoptions ... Algorithms 5 and 6 summarize eigenoption discovery. ... Algorithms 7 and 8, in Appendix C, summarize the presentation of covering options when computed both in closed form and online. |
| Open Source Code | No | The text does not explicitly state that the authors' implementation code is open-sourced or provide a link to a code repository for the methodology described in this paper. |
| Open Datasets | Yes | We use the four-room domain (Sutton et al., 1999), which we implemented with Gym-Minigrid (Chevalier-Boisvert et al., 2018). |
| Dataset Splits | No | The paper describes experiments conducted in simulated environments (e.g., 'four-room domain', 'open-room gridworld') where agents interact dynamically. It does not mention pre-collected datasets with explicit training, validation, or test splits, as data is generated through interaction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing the four-room domain with Gym-Minigrid, but it does not specify version numbers for this or any other software dependencies used in their experiments. |
| Experiment Setup | Yes | The Q-learning parameters we use are α = 0.1, γ = 0.9, and ε = 0.05. We use η = α_o = 0.1, γ_SR = γ_o = 0.99, and we sample options with 5% probability (p_option), which is similar to what we did in Section 6.4, where options were potentially sampled only in the exploration step of Q-learning with ε-greedy (ε = 0.05). We pass over D 100 times when learning the SR, and 1,000 times when learning the option policy, leveraging the off-policy aspect of our problem formulation. |
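The paper's Algorithm 1 is reported to depict a tabular implementation of the successor representation (SR). As a point of reference, the SR of a fixed policy can be learned with a TD(0) update of the form ψ(s) ← ψ(s) + η [1_s + γ_SR ψ(s') − ψ(s)]. The sketch below is not the authors' code; the chain environment, function name `learn_sr`, and the number of passes are illustrative assumptions, with η = 0.1 and γ_SR = 0.99 taken from the reported setup.

```python
import numpy as np

def learn_sr(transitions, n_states, eta=0.1, gamma_sr=0.99, passes=100):
    """Tabular successor representation learned by TD(0).

    transitions: list of (s, s') pairs collected under a fixed policy.
    Update: psi[s] += eta * (one_hot(s) + gamma_sr * psi[s'] - psi[s]).
    """
    psi = np.zeros((n_states, n_states))
    identity = np.eye(n_states)
    for _ in range(passes):
        for s, s_next in transitions:
            target = identity[s] + gamma_sr * psi[s_next]
            psi[s] += eta * (target - psi[s])
    return psi

# Deterministic 4-state chain: 0 -> 1 -> 2 -> 3, with 3 self-looping.
trans = [(0, 1), (1, 2), (2, 3), (3, 3)]
psi = learn_sr(trans, n_states=4)
```

After repeated passes over the transitions, each row ψ(s) approaches the expected discounted future occupancy of every state when starting from s, which is the quantity the paper's option-discovery methods (e.g., eigenoptions) build on.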
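For concreteness, the Q-learning configuration quoted above (α = 0.1, γ = 0.9, ε-greedy with ε = 0.05) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy chain environment, the episode count, and the random tie-breaking of greedy actions are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state chain: action 1 moves right, action 0 moves left.
# Reward 1 for reaching the rightmost (terminal) state.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == GOAL), s_next == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.05):
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:  # epsilon-greedy exploration
                a = int(rng.integers(N_ACTIONS))
            else:  # greedy action, ties broken at random
                a = int(rng.choice(np.flatnonzero(q[s] == q[s].max())))
            s_next, r, done = step(s, a)
            q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])
            s = s_next
    return q

q = q_learning()
```

In the paper's option-based variant, the exploration step additionally samples an option (with probability p_option = 0.05) instead of a primitive action; the sketch above covers only the primitive-action baseline.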