Creating Multi-Level Skill Hierarchies in Reinforcement Learning
Authors: Joshua B. Evans, Özgür Şimşek
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyse the Louvain skill hierarchy in the six environments depicted in Figure 1: Rooms, Grid, Maze [23], Office, Taxi [36], and Towers of Hanoi. |
| Researcher Affiliation | Academia | Joshua B. Evans, Department of Computer Science, University of Bath, Bath, United Kingdom (jbe25@bath.ac.uk); Özgür Şimşek, Department of Computer Science, University of Bath, Bath, United Kingdom (o.simsek@bath.ac.uk) |
| Pseudocode | Yes | Pseudocode for the Louvain algorithm is presented in Section H of the supplementary material. |
| Open Source Code | No | The paper does not include an explicit statement about releasing the source code, or a link to a code repository, for the proposed methodology. |
| Open Datasets | Yes | We analyse the Louvain skill hierarchy in the six environments depicted in Figure 1: Rooms, Grid, Maze [23], Office, Taxi [36], and Towers of Hanoi. We describe each of these environments fully in Section A of the supplementary material. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations). |
| Hardware Specification | No | The paper's acknowledgements state only that "This research made use of Hex, the GPU Cloud in the Department of Computer Science at the University of Bath"; no specific hardware details (e.g., GPU model, CPU, or memory) are given. |
| Software Dependencies | No | The paper discusses algorithms and learning methods but does not name the specific libraries or frameworks used in its implementation, nor their version numbers. |
| Experiment Setup | Yes | When generating partitions of the state transition graph using the Louvain algorithm, we used a resolution parameter of ρ = 0.05, unless stated otherwise. When converting the output of the Louvain algorithm into a concrete skill hierarchy, we discarded all levels of the cluster hierarchy where the mean number of nodes per cluster was less than 4. For all methods used in our comparisons, we generated options using the complete state transition graph and learned their policies offline using macro-Q learning [37]. We trained all hierarchical agents using macro-Q learning and intra-option learning [38]. The primitive agent was trained using Q-learning [39]. (Illustrative sketches of the partition-generation step and the macro-Q update follow this table.) |
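
To make the reported setup concrete, here is a minimal sketch of the partition-generation step, assuming the state transition graph is available as a NetworkX graph and using NetworkX's built-in Louvain implementation. This is not the authors' code (their pseudocode is in Section H of the supplementary material); it simply applies the reported resolution ρ = 0.05 and discards hierarchy levels whose mean cluster size is below 4.

```python
# Minimal sketch, not the authors' implementation. Assumes the state
# transition graph has already been built as a NetworkX graph.
import networkx as nx
from networkx.algorithms.community import louvain_partitions

RESOLUTION = 0.05          # rho, as reported in the experiment setup
MIN_MEAN_CLUSTER_SIZE = 4  # levels with a smaller mean cluster size are discarded

def louvain_skill_levels(graph: nx.Graph, seed: int = 0):
    """Generate the Louvain cluster hierarchy of `graph`, keeping only the
    levels whose mean number of nodes per cluster is at least 4."""
    levels = louvain_partitions(graph, resolution=RESOLUTION, seed=seed)
    return [
        partition                 # each level is a list of sets of states
        for partition in levels
        if graph.number_of_nodes() / len(partition) >= MIN_MEAN_CLUSTER_SIZE
    ]
```

Each retained level would then be converted into skills for moving between clusters, as described in the paper.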
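
The macro-Q update [37] used to train the option policies can likewise be sketched in tabular form. The learning rate, discount factor, and all names below are illustrative assumptions, not values reported in the paper.

```python
# Minimal tabular macro-Q sketch; alpha and gamma are assumed, not reported.
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumption)
GAMMA = 0.99  # discount factor (assumption)

Q = defaultdict(float)  # maps (state, option) -> estimated value

def macro_q_update(s, option, cumulative_reward, k, s_next, options):
    """One macro-Q backup after `option` ran for k primitive steps from s,
    accruing `cumulative_reward` (discounted within the option) and ending
    in s_next."""
    target = cumulative_reward + (GAMMA ** k) * max(
        Q[(s_next, o)] for o in options
    )
    Q[(s, option)] += ALPHA * (target - Q[(s, option)])
```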