Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Creating Multi-Level Skill Hierarchies in Reinforcement Learning

Authors: Joshua B. Evans, Özgür Şimşek

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyse the Louvain skill hierarchy in the six environments depicted in Figure 1: Rooms, Grid, Maze [23], Office, Taxi [36], and Towers of Hanoi.
Researcher Affiliation | Academia | Joshua B. Evans, Department of Computer Science, University of Bath, Bath, United Kingdom, EMAIL; Özgür Şimşek, Department of Computer Science, University of Bath, Bath, United Kingdom, EMAIL
Pseudocode | Yes | Pseudocode for the Louvain algorithm is presented in Section H of the supplementary material.
Open Source Code | No | The paper does not include an explicit statement about releasing the source code, or a link to a code repository, for its proposed methodology.
Open Datasets | Yes | We analyse the Louvain skill hierarchy in the six environments depicted in Figure 1: Rooms, Grid, Maze [23], Office, Taxi [36], and Towers of Hanoi. We describe each of these environments fully in Section A of the supplementary material.
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification | No | This research made use of Hex, the GPU Cloud in the Department of Computer Science at the University of Bath.
Software Dependencies | No | The paper discusses algorithms and learning methods but does not name specific software libraries or frameworks with version numbers used in its implementation.
Experiment Setup | Yes | When generating partitions of the state transition graph using the Louvain algorithm, we used a resolution parameter of ρ = 0.05, unless stated otherwise. When converting the output of the Louvain algorithm into a concrete skill hierarchy, we discarded all levels of the cluster hierarchy where the mean number of nodes per cluster was less than 4. For all methods used in our comparisons, we generated options using the complete state transition graph and learned their policies offline using macro-Q learning [37]. We trained all hierarchical agents using macro-Q learning and intra-option learning [38]. The primitive agent was trained using Q-Learning [39].
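The hierarchy-generation and level-filtering rule quoted above (Louvain at resolution ρ = 0.05, discarding levels whose mean cluster size is below 4) can be sketched briefly. The paper does not name its software stack, so the snippet below is a hypothetical illustration using NetworkX's Louvain implementation as a stand-in; the function name, graph, and seed are illustrative, not from the paper.

```python
import networkx as nx

def filtered_louvain_levels(G, resolution=0.05, min_mean_cluster_size=4, seed=0):
    """Illustrative sketch (not the authors' code): compute the Louvain
    cluster hierarchy of a state transition graph and keep only levels
    whose mean number of nodes per cluster is at least the threshold."""
    # louvain_partitions yields one partition per hierarchy level,
    # from finest to coarsest.
    levels = list(nx.community.louvain_partitions(G, resolution=resolution, seed=seed))
    n = G.number_of_nodes()
    # Discard levels where mean nodes per cluster < min_mean_cluster_size.
    return [p for p in levels if n / len(p) >= min_mean_cluster_size]

# Example on a toy graph standing in for a state transition graph.
G = nx.barbell_graph(8, 2)
levels = filtered_louvain_levels(G)
```

Each retained level is a list of node sets that could then be converted into one tier of skills in the hierarchy.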