Creating Multi-Level Skill Hierarchies in Reinforcement Learning

Authors: Joshua B. Evans, Özgür Şimşek

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyse the Louvain skill hierarchy in the six environments depicted in Figure 1: Rooms, Grid, Maze [23], Office, Taxi [36], and Towers of Hanoi.
Researcher Affiliation | Academia | Joshua B. Evans, Department of Computer Science, University of Bath, Bath, United Kingdom, jbe25@bath.ac.uk; Özgür Şimşek, Department of Computer Science, University of Bath, Bath, United Kingdom, o.simsek@bath.ac.uk
Pseudocode | Yes | Pseudocode for the Louvain algorithm is presented in Section H of the supplementary material.
Open Source Code | No | The paper does not include an explicit statement about releasing the source code or a link to a code repository for the proposed methodology.
Open Datasets | Yes | We analyse the Louvain skill hierarchy in the six environments depicted in Figure 1: Rooms, Grid, Maze [23], Office, Taxi [36], and Towers of Hanoi. We describe each of these environments fully in Section A of the supplementary material.
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations).
Hardware Specification | No | This research made use of Hex, the GPU Cloud in the Department of Computer Science at the University of Bath.
Software Dependencies | No | The paper discusses algorithms and learning methods but does not provide specific software names with version numbers for the libraries or frameworks used in the implementation.
Experiment Setup | Yes | When generating partitions of the state transition graph using the Louvain algorithm, we used a resolution parameter of ρ = 0.05, unless stated otherwise. When converting the output of the Louvain algorithm into a concrete skill hierarchy, we discarded all levels of the cluster hierarchy where the mean number of nodes per cluster was less than 4. For all methods used in our comparisons, we generated options using the complete state transition graph and learned their policies offline using macro-Q learning [37]. We trained all hierarchical agents using macro-Q learning and intra-option learning [38]. The primitive agent was trained using Q-Learning [39].
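To make the quoted experiment setup more concrete, the sketch below illustrates the hierarchy-generation step in Python. It uses NetworkX's built-in Louvain routine (`networkx.community.louvain_partitions`) as a stand-in for the authors' own implementation, generates one partition per Louvain level with a resolution parameter of 0.05, and discards levels whose mean number of nodes per cluster is below 4, mirroring the description above. The function name, the toy transition graph, and the seed are illustrative assumptions, not details from the paper. Note that lower resolution values, such as ρ = 0.05, favour fewer and larger clusters at each level.

```python
# Minimal sketch of the hierarchy-generation step described in the Experiment Setup row.
# Assumes NetworkX >= 3.0; the paper uses its own Louvain implementation, so this is
# only an illustrative stand-in, not the authors' code.
import networkx as nx


def louvain_level_partitions(state_transition_graph, resolution=0.05, min_mean_cluster_size=4):
    """Yield Louvain partitions level by level, keeping only levels whose mean
    number of nodes per cluster is at least `min_mean_cluster_size` (here, 4)."""
    kept_levels = []
    n_nodes = state_transition_graph.number_of_nodes()
    for partition in nx.community.louvain_partitions(
        state_transition_graph, resolution=resolution, seed=0
    ):
        mean_cluster_size = n_nodes / len(partition)
        if mean_cluster_size >= min_mean_cluster_size:
            kept_levels.append(partition)
    return kept_levels


# Toy example (hypothetical): two densely connected "rooms" joined by a single doorway edge.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 0),   # room A
                  (4, 5), (5, 6), (6, 7), (7, 4),   # room B
                  (3, 4)])                           # doorway between rooms
levels = louvain_level_partitions(G, resolution=0.05)
for i, partition in enumerate(levels):
    print(f"level {i}: {[sorted(cluster) for cluster in partition]}")
```

Each retained level of the partition hierarchy would then correspond to one layer of skills in the multi-level hierarchy, with skill policies learned offline (macro-Q learning in the paper); that learning step is not shown here.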