Induction of Subgoal Automata for Reinforcement Learning

Authors: Daniel Furelos-Blanco, Mark Law, Alessandra Russo, Krysia Broda, Anders Jonsson

AAAI 2020, pp. 3890-3897 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate ISA in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automaton, and the impact that the number of observable events has on the learner's performance.
Researcher Affiliation | Collaboration | Daniel Furelos-Blanco¹, Mark Law¹, Alessandra Russo¹, Krysia Broda¹, Anders Jonsson²; ¹Imperial College London, United Kingdom; ²Universitat Pompeu Fabra, Barcelona, Spain. {d.furelos-blanco18, mark.law09, a.russo, k.broda}@imperial.ac.uk, anders.jonsson@upf.edu
Pseudocode | Yes | Algorithm 1: ISA algorithm for a single task. (A hedged sketch of this interleaved loop appears after the table.)
Open Source Code | Yes | Code: github.com/ertsiger/induction-subgoal-automata-rl
Open Datasets | No | The paper mentions using the OFFICEWORLD environment (Toro Icarte et al. 2018) and a set of 100 randomly generated grids. However, it does not provide concrete access information (link, DOI, or specific repository) for the grids or the environment itself; it cites the paper describing OFFICEWORLD, but not a dataset artifact.
Dataset Splits | No | The paper mentions running episodes and testing on random grids but does not specify explicit training, validation, or test splits. It states: "ISA receives a set of 100 randomly generated grids. One episode is run per grid in sequential order until reaching 20,000 episodes for each grid." and "We execute 10 runs for each setting." This does not describe an explicit train/validation/test partitioning.
Hardware Specification | Yes | All experiments ran on 3.40 GHz Intel® Core™ i7-6700 processors.
Software Dependencies | No | The paper mentions using ILASP and Q-learning but does not provide version numbers for any software dependency. For example, it states "Tabular Q-learning is used to learn the Q-function at each automaton state with parameters α = 0.1, ϵ = 0.1, and γ = 0.99" and "We use ILASP to learn the automata...", but gives no versions for these or other libraries.
Experiment Setup | Yes | Tabular Q-learning is used to learn the Q-function at each automaton state with parameters α = 0.1, ϵ = 0.1, and γ = 0.99. The agent's state is its position. ISA receives a set of 100 randomly generated grids. One episode is run per grid in sequential order until reaching 20,000 episodes for each grid. The maximum episode length is 100 steps. (A minimal setup sketch appears after the table.)
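
The paper's Algorithm 1 is not reproduced on this page. As a loose orientation only, the interleaving it describes (run RL episodes, and re-invoke the automaton learner whenever an observed label trace contradicts the current hypothesis automaton) might look like the sketch below. Every name here (run_episode, learn_automaton, automaton.accepts) is an illustrative placeholder, not the authors' Algorithm 1 or the repository's API.

```python
# Hedged sketch of the interleaved loop described by Algorithm 1 (ISA, single task).
# All interfaces below are assumed placeholders, not the authors' code.

def isa_single_task(env, num_episodes, learn_automaton, run_episode):
    counterexamples = []  # label traces the current hypothesis misclassifies
    automaton = learn_automaton(counterexamples)  # start from a trivial hypothesis
    for _ in range(num_episodes):
        # Run one RL episode, collecting the trace of observed high-level
        # events (labels) and whether the goal was actually reached.
        trace, goal_reached = run_episode(env, automaton)
        # If the automaton's verdict on the trace disagrees with the
        # environment's outcome, the trace is a counterexample and the
        # automaton is re-learned (the paper uses ILASP for this step).
        if automaton.accepts(trace) != goal_reached:
            counterexamples.append((trace, goal_reached))
            automaton = learn_automaton(counterexamples)
    return automaton
```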
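
For concreteness, the reported setup (tabular Q-learning over (automaton state, grid position) pairs with α = 0.1, ϵ = 0.1, γ = 0.99, 100-step episodes, cycling through 100 grids in sequential order) corresponds to a loop along these lines. The Grid/automaton interfaces (reset, step, labels, transition, actions) are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import defaultdict

ALPHA, EPSILON, GAMMA = 0.1, 0.1, 0.99  # parameters reported in the paper
MAX_STEPS = 100                         # maximum episode length
EPISODES_PER_GRID = 20000               # episodes run per grid
NUM_GRIDS = 100                         # randomly generated grids

# One scalar Q-value per ((automaton state, position), action) key.
Q = defaultdict(float)

def epsilon_greedy(state, actions):
    # Explore with probability EPSILON, otherwise act greedily w.r.t. Q.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def run_episode(grid, automaton):
    pos, u = grid.reset(), automaton.initial_state()
    for _ in range(MAX_STEPS):
        action = epsilon_greedy((u, pos), grid.actions)
        next_pos, reward, done = grid.step(action)
        # Advance the automaton on the events observed at the new position.
        next_u = automaton.transition(u, grid.labels(next_pos))
        # Standard tabular Q-learning update on the augmented state.
        best_next = max(Q[((next_u, next_pos), a)] for a in grid.actions)
        td_target = reward + (0.0 if done else GAMMA * best_next)
        Q[((u, pos), action)] += ALPHA * (td_target - Q[((u, pos), action)])
        pos, u = next_pos, next_u
        if done:
            break

# Schedule from the paper: one episode per grid in sequential order,
# repeated until each grid has run 20,000 episodes.
# for _ in range(EPISODES_PER_GRID):
#     for grid in grids:          # grids: the 100 generated instances
#         run_episode(grid, automaton)
```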