Induction of Subgoal Automata for Reinforcement Learning
Authors: Daniel Furelos-Blanco, Mark Law, Alessandra Russo, Krysia Broda, Anders Jonsson (pp. 3890-3897)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ISA in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automaton, and the impact that the number of observable events has on the learner's performance. |
| Researcher Affiliation | Collaboration | Daniel Furelos-Blanco¹, Mark Law¹, Alessandra Russo¹, Krysia Broda¹, Anders Jonsson²; ¹Imperial College London, United Kingdom; ²Universitat Pompeu Fabra, Barcelona, Spain; {d.furelos-blanco18, mark.law09, a.russo, k.broda}@imperial.ac.uk, anders.jonsson@upf.edu |
| Pseudocode | Yes | Algorithm 1 ISA algorithm for a single task |
| Open Source Code | Yes | Code: github.com/ertsiger/induction-subgoal-automata-rl. |
| Open Datasets | No | The paper mentions using the OFFICEWORLD environment (Toro Icarte et al. 2018) and a set of 100 randomly generated grids. However, it does not provide concrete access information (link, DOI, specific repository, or citation with author/year in brackets for the dataset itself) for these grids or the environment. It cites the paper describing OFFICEWORLD, but not the dataset itself. |
| Dataset Splits | No | The paper mentions running episodes and testing on random grids but does not specify explicit training, validation, or test splits. It states: "ISA receives a set of 100 randomly generated grids. One episode is run per grid in sequential order until reaching 20,000 episodes for each grid." and "We execute 10 runs for each setting." This does not describe explicit data partitioning for train/val/test sets. |
| Hardware Specification | Yes | All experiments ran on 3.40GHz Intel® Core™ i7-6700 processors. |
| Software Dependencies | No | The paper mentions using ILASP and Q-learning, but does not provide specific version numbers for any software dependencies. For example, it says "Tabular Q-learning is used to learn the Q-function at each automaton state with parameters α = 0.1, ϵ = 0.1, and γ = 0.99." and "We use ILASP to learn the automata..." but no versions for these or other libraries. |
| Experiment Setup | Yes | Tabular Q-learning is used to learn the Q-function at each automaton state with parameters α = 0.1, ϵ = 0.1, and γ = 0.99. The agent's state is its position. ISA receives a set of 100 randomly generated grids. One episode is run per grid in sequential order until reaching 20,000 episodes for each grid. The maximum episode length is 100 steps. |
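The experiment setup above (one tabular Q-function per automaton state, with α = 0.1, ϵ = 0.1, γ = 0.99) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the action set, state names, and helper functions are assumptions, and only the update rule and hyperparameters come from the paper's reported setup.

```python
import random
from collections import defaultdict

# Hyperparameters as reported in the paper's experiment setup.
ALPHA, EPSILON, GAMMA = 0.1, 0.1, 0.99

# Illustrative gridworld action set (an assumption, not from the paper).
ACTIONS = ["up", "down", "left", "right"]

# One Q-table per automaton state u: Q[u] maps (position, action) -> value.
Q = defaultdict(lambda: defaultdict(float))

def epsilon_greedy(u, s):
    """With probability epsilon pick a random action, else the greedy one
    for the Q-table of the current automaton state u."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[u][(s, a)])

def q_update(u, s, a, reward, u_next, s_next, done):
    """Standard tabular Q-learning backup, indexed by automaton state:
    the bootstrap target uses the Q-table of the next automaton state."""
    target = reward
    if not done:
        target += GAMMA * max(Q[u_next][(s_next, b)] for b in ACTIONS)
    Q[u][(s, a)] += ALPHA * (target - Q[u][(s, a)])
```

For example, a terminal transition with reward 1.0 moves the corresponding Q-value from 0 to α · 1.0 = 0.1, since there is no bootstrap term when `done` is true.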