Induction of Subgoal Automata for Reinforcement Learning

Authors: Daniel Furelos-Blanco, Mark Law, Alessandra Russo, Krysia Broda, Anders Jonsson

AAAI 2020, pp. 3890-3897 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate ISA in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automaton, and the impact that the number of observable events has on the learner's performance.
Researcher Affiliation | Collaboration | Daniel Furelos-Blanco¹, Mark Law¹, Alessandra Russo¹, Krysia Broda¹, Anders Jonsson²; ¹Imperial College London, United Kingdom; ²Universitat Pompeu Fabra, Barcelona, Spain. {d.furelos-blanco18, mark.law09, a.russo, k.broda}@imperial.ac.uk, anders.jonsson@upf.edu
Pseudocode | Yes | Algorithm 1: ISA algorithm for a single task. (A hedged sketch of this interleaved loop appears after the table.)
Open Source Code | Yes | Code: github.com/ertsiger/induction-subgoal-automata-rl
Open Datasets | No | The paper mentions using the OFFICEWORLD environment (Toro Icarte et al. 2018) and a set of 100 randomly generated grids. However, it does not provide concrete access information (link, DOI, or specific repository) for the grids or the environment itself; it cites the paper describing OFFICEWORLD, but not a dataset artifact.
Dataset Splits | No | The paper mentions running episodes and testing on random grids but does not specify explicit training, validation, or test splits. It states: "ISA receives a set of 100 randomly generated grids. One episode is run per grid in sequential order until reaching 20,000 episodes for each grid." and "We execute 10 runs for each setting." This does not describe an explicit train/validation/test partitioning.
Hardware Specification | Yes | All experiments ran on 3.40 GHz Intel® Core™ i7-6700 processors.
Software Dependencies | No | The paper mentions using ILASP and Q-learning but does not provide version numbers for any software dependency. For example, it states "Tabular Q-learning is used to learn the Q-function at each automaton state with parameters α = 0.1, ϵ = 0.1, and γ = 0.99" and "We use ILASP to learn the automata...", but gives no versions for these or other libraries.
Experiment Setup | Yes | Tabular Q-learning is used to learn the Q-function at each automaton state with parameters α = 0.1, ϵ = 0.1, and γ = 0.99. The agent's state is its position. ISA receives a set of 100 randomly generated grids. One episode is run per grid in sequential order until reaching 20,000 episodes for each grid. The maximum episode length is 100 steps. (A minimal setup sketch appears after the table.)
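
The paper's Algorithm 1 is not reproduced on this page. As a loose orientation only, the interleaving it describes (run RL episodes, and re-invoke the automaton learner whenever an observed label trace contradicts the current hypothesis automaton) might look like the sketch below. Every name here (run_episode, learn_automaton, automaton.accepts) is an illustrative placeholder, not the authors' Algorithm 1 or the repository's API.

```python
# Hedged sketch of the interleaved loop described by Algorithm 1 (ISA, single task).
# All interfaces below are assumed placeholders, not the authors' code.

def isa_single_task(env, num_episodes, learn_automaton, run_episode):
    counterexamples = []  # label traces the current hypothesis misclassifies
    automaton = learn_automaton(counterexamples)  # start from a trivial hypothesis
    for _ in range(num_episodes):
        # Run one RL episode, collecting the trace of observed high-level
        # events (labels) and whether the goal was actually reached.
        trace, goal_reached = run_episode(env, automaton)
        # If the automaton's verdict on the trace disagrees with the
        # environment's outcome, the trace is a counterexample and the
        # automaton is re-learned (the paper uses ILASP for this step).
        if automaton.accepts(trace) != goal_reached:
            counterexamples.append((trace, goal_reached))
            automaton = learn_automaton(counterexamples)
    return automaton
```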
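
For concreteness, the reported setup (tabular Q-learning over (automaton state, grid position) pairs with α = 0.1, ϵ = 0.1, γ = 0.99, 100-step episodes, cycling through 100 grids in sequential order) corresponds to a loop along these lines. The Grid/automaton interfaces (reset, step, labels, transition, actions) are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import defaultdict

ALPHA, EPSILON, GAMMA = 0.1, 0.1, 0.99  # parameters reported in the paper
MAX_STEPS = 100                         # maximum episode length
EPISODES_PER_GRID = 20000               # episodes run per grid
NUM_GRIDS = 100                         # randomly generated grids

# One scalar Q-value per ((automaton state, position), action) key.
Q = defaultdict(float)

def epsilon_greedy(state, actions):
    # Explore with probability EPSILON, otherwise act greedily w.r.t. Q.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def run_episode(grid, automaton):
    pos, u = grid.reset(), automaton.initial_state()
    for _ in range(MAX_STEPS):
        action = epsilon_greedy((u, pos), grid.actions)
        next_pos, reward, done = grid.step(action)
        # Advance the automaton on the events observed at the new position.
        next_u = automaton.transition(u, grid.labels(next_pos))
        # Standard tabular Q-learning update on the augmented state.
        best_next = max(Q[((next_u, next_pos), a)] for a in grid.actions)
        td_target = reward + (0.0 if done else GAMMA * best_next)
        Q[((u, pos), action)] += ALPHA * (td_target - Q[((u, pos), action)])
        pos, u = next_pos, next_u
        if done:
            break

# Schedule from the paper: one episode per grid in sequential order,
# repeated until each grid has run 20,000 episodes.
# for _ in range(EPISODES_PER_GRID):
#     for grid in grids:          # grids: the 100 generated instances
#         run_episode(grid, automaton)
```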