Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Induction and Exploitation of Subgoal Automata for Reinforcement Learning

Authors: Daniel Furelos-Blanco, Mark Law, Anders Jonsson, Krysia Broda, Alessandra Russo

JAIR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate ISA in several gridworld and continuous state space problems using different RL algorithms that leverage the automaton structures. We provide an in-depth empirical analysis of the automaton learning performance in terms of the traces, the symmetry breaking and specific restrictions imposed on the final learnable automaton.
Researcher Affiliation | Academia | Daniel Furelos-Blanco (EMAIL), Mark Law (EMAIL), Department of Computing, Imperial College London, London, SW7 2AZ, United Kingdom; Anders Jonsson (EMAIL), Department of Information and Communication Technologies, Universitat Pompeu Fabra, Roc Boronat 138, 08018 Barcelona, Spain; Krysia Broda (EMAIL), Alessandra Russo (EMAIL), Department of Computing, Imperial College London, London, SW7 2AZ, United Kingdom
Pseudocode | Yes | Algorithm 1: ISA Algorithm
Open Source Code | Yes | The code is available at https://github.com/ertsiger/induction-subgoal-automata-rl.
Open Datasets | Yes | Our analysis focuses on evaluating how the behavior of the RL agent and the task being learned affect automaton learning and vice versa. Firstly, we describe the main characteristics of our evaluation methodology. Secondly, we make a thorough analysis of the performance of our approach using the Office World (Toro Icarte et al., 2018), Craft World (Andreas et al., 2017) and Water World (Toro Icarte et al., 2018) domains.
Dataset Splits | No | The sets of POMDPs used in these experiments are not handcrafted, but randomly generated (e.g., placing observables randomly in a grid).
Hardware Specification | Yes | All experiments ran on 3.40 GHz Intel Core i7-6700 processors.
Software Dependencies | No | We use ILASP2 to learn the automata with a 2 hour timeout for each automaton learning task. ... we use a Double DQN (DDQN, van Hasselt et al., 2016) to approximate the Q-functions in both HRL and QRM. ... We train the neural networks using the Adam optimizer (Kingma & Ba, 2015) with α = 1 × 10⁻⁵. No specific software versions for libraries or frameworks (e.g., Python, TensorFlow, PyTorch) are provided.
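For reference, the quoted learning rate α = 1 × 10⁻⁵ is the only Adam hyperparameter the excerpt gives. The sketch below shows a single Adam update (Kingma & Ba, 2015) in plain Python; β₁, β₂ and the numerical-stability constant are the standard defaults from the Adam paper, an assumption here, and this is not the authors' implementation.

```python
import math

ALPHA = 1e-5                          # learning rate quoted in the paper
BETA1, BETA2, EPS = 0.9, 0.999, 1e-8  # standard Adam defaults (assumption)

def adam_step(theta, grad, m, v, t):
    """One Adam update over parallel lists of parameters and gradients.

    theta: parameters, grad: gradients, m/v: first/second moment
    estimates, t: 1-based step counter (for bias correction).
    Returns the updated (theta, m, v).
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(theta, grad, m, v):
        mi = BETA1 * mi + (1 - BETA1) * g          # first moment
        vi = BETA2 * vi + (1 - BETA2) * g * g      # second moment
        m_hat = mi / (1 - BETA1 ** t)              # bias correction
        v_hat = vi / (1 - BETA2 ** t)
        new_theta.append(p - ALPHA * m_hat / (math.sqrt(v_hat) + EPS))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

At step t = 1 the bias-corrected update reduces to roughly a step of size α in the gradient's direction, which is why a tiny α such as 1e-5 yields very conservative per-step changes.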
Experiment Setup | Yes | Table 1: Parameters used in the Office World experiments: learning rate (α) = 0.1; exploration rate (ϵ) = 0.1; discount factor (γ) = 0.99; number of episodes = 10,000; maximum episode length (N) = 250; number of disjuncts (κ) = 1.
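To illustrate how the Table 1 hyperparameters are typically consumed, here is a minimal sketch of an ε-greedy tabular Q-learning update with α = 0.1, ϵ = 0.1 and γ = 0.99. The state/action interface is hypothetical and this is a generic update rule, not the authors' ISA implementation.

```python
import random

# Hyperparameters quoted from Table 1 (Office World experiments).
ALPHA = 0.1    # learning rate (α)
EPSILON = 0.1  # exploration rate (ϵ)
GAMMA = 0.99   # discount factor (γ)

def epsilon_greedy(q, state, actions, epsilon=EPSILON):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def q_update(q, state, action, reward, next_state, actions, done):
    """One tabular Q-learning update with the parameters above."""
    target = reward
    if not done:
        target += GAMMA * max(q.get((next_state, a), 0.0) for a in actions)
    key = (state, action)
    q[key] = q.get(key, 0.0) + ALPHA * (target - q.get(key, 0.0))
```

An episode would call `epsilon_greedy` to act and `q_update` after each transition, terminating after at most N = 250 steps and repeating for 10,000 episodes per the quoted setup.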