Creating Multi-Level Skill Hierarchies in Reinforcement Learning

Authors: Joshua B. Evans, Özgür Şimşek

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyse the Louvain skill hierarchy in the six environments depicted in Figure 1: Rooms, Grid, Maze [23], Office, Taxi [36], and Towers of Hanoi.
Researcher Affiliation | Academia | Joshua B. Evans, Department of Computer Science, University of Bath, Bath, United Kingdom, jbe25@bath.ac.uk; Özgür Şimşek, Department of Computer Science, University of Bath, Bath, United Kingdom, o.simsek@bath.ac.uk
Pseudocode | Yes | Pseudocode for the Louvain algorithm is presented in Section H of the supplementary material.
Open Source Code | No | The paper does not include an explicit statement about releasing the source code or a link to a code repository for the proposed methodology.
Open Datasets | Yes | We analyse the Louvain skill hierarchy in the six environments depicted in Figure 1: Rooms, Grid, Maze [23], Office, Taxi [36], and Towers of Hanoi. We describe each of these environments fully in Section A of the supplementary material.
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations).
Hardware Specification | No | This research made use of Hex, the GPU Cloud in the Department of Computer Science at the University of Bath.
Software Dependencies | No | The paper discusses algorithms and learning methods but does not provide specific software names with version numbers for the libraries or frameworks used in the implementation.
Experiment Setup | Yes | When generating partitions of the state transition graph using the Louvain algorithm, we used a resolution parameter of ρ = 0.05, unless stated otherwise. When converting the output of the Louvain algorithm into a concrete skill hierarchy, we discarded all levels of the cluster hierarchy where the mean number of nodes per cluster was less than 4. For all methods used in our comparisons, we generated options using the complete state transition graph and learned their policies offline using macro-Q learning [37]. We trained all hierarchical agents using macro-Q learning and intra-option learning [38]. The primitive agent was trained using Q-Learning [39].
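To make the quoted experiment setup more concrete, the sketch below illustrates the hierarchy-generation step in Python. It uses NetworkX's built-in Louvain routine (`networkx.community.louvain_partitions`) as a stand-in for the authors' own implementation, generates one partition per Louvain level with a resolution parameter of 0.05, and discards levels whose mean number of nodes per cluster is below 4, mirroring the description above. The function name, the toy transition graph, and the seed are illustrative assumptions, not details from the paper. Note that lower resolution values, such as ρ = 0.05, favour fewer and larger clusters at each level.

```python
# Minimal sketch of the hierarchy-generation step described in the Experiment Setup row.
# Assumes NetworkX >= 3.0; the paper uses its own Louvain implementation, so this is
# only an illustrative stand-in, not the authors' code.
import networkx as nx


def louvain_level_partitions(state_transition_graph, resolution=0.05, min_mean_cluster_size=4):
    """Yield Louvain partitions level by level, keeping only levels whose mean
    number of nodes per cluster is at least `min_mean_cluster_size` (here, 4)."""
    kept_levels = []
    n_nodes = state_transition_graph.number_of_nodes()
    for partition in nx.community.louvain_partitions(
        state_transition_graph, resolution=resolution, seed=0
    ):
        mean_cluster_size = n_nodes / len(partition)
        if mean_cluster_size >= min_mean_cluster_size:
            kept_levels.append(partition)
    return kept_levels


# Toy example (hypothetical): two densely connected "rooms" joined by a single doorway edge.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 0),   # room A
                  (4, 5), (5, 6), (6, 7), (7, 4),   # room B
                  (3, 4)])                           # doorway between rooms
levels = louvain_level_partitions(G, resolution=0.05)
for i, partition in enumerate(levels):
    print(f"level {i}: {[sorted(cluster) for cluster in partition]}")
```

Each retained level of the partition hierarchy would then correspond to one layer of skills in the multi-level hierarchy, with skill policies learned offline (macro-Q learning in the paper); that learning step is not shown here.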