Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Sample Complexity of Goal-Conditioned Hierarchical Reinforcement Learning

Authors: Arnaud Robert, Ciara Pike-Burke, Aldo A. Faisal

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our theoretical findings by investigating the sample complexity of the proposed hierarchical algorithm on a spectrum of tasks (hierarchical n-rooms, Gymnasium's Taxi).
Researcher Affiliation | Academia | Arnaud Robert, Brain & Behaviour Lab, Dept. of Computing, Imperial College London, UK, EMAIL; Ciara Pike-Burke, Dept. of Mathematics, Imperial College London, UK, EMAIL; A. Aldo Faisal, Brain & Behaviour Lab, Depts. of Computing & Bioengineering, Imperial College London, UK, Chair in Digital Health & Data Science, University of Bayreuth, Germany, EMAIL
Pseudocode | Yes | Algorithm 1: Stationary Hierarchical Q-learning (SHQL)
Open Source Code | Yes | All required code to reproduce the experiments is made available online [1]. [1] A. A. Faisal A. Robert, C. Pike-Burke. Code and resources for running Hierarchical Stationary Q-learning: doi.org/10.6084/m9.figshare.24291172.
Open Datasets | Yes | We empirically validate our theoretical findings by investigating the sample complexity of the proposed hierarchical algorithm on a spectrum of tasks (hierarchical n-rooms, Gymnasium's Taxi).
Dataset Splits | No | The paper describes training RL agents through interaction with environments (n-rooms, Gymnasium Taxi) but does not provide explicit training/validation/test dataset splits with percentages or sample counts in the way typically seen for static datasets in supervised learning.
Hardware Specification | Yes | Experiments were run on a 12th Gen Intel Core i7 with 16GB of RAM, to train the agents on the largest maze considered takes 7 minutes.
Software Dependencies | No | The paper mentions the Gymnasium's Taxi environment but does not specify versions for any programming languages or software libraries used for implementation (e.g., Python, PyTorch, TensorFlow, scikit-learn).
Experiment Setup | Yes | The hyperparameters considered are the initial exploration rate ϵ ∈ [0.1, 0.3, …, 0.7, 0.9] and the decay rate δ ∈ [1 × 10⁻³, 1 × 10⁻⁴, …, 1 × 10⁻⁷]. Note that we only considered the following decay function ϵk+1 = ϵk δk+1, where k denotes the current episode number.
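
The hyperparameter search quoted in the Experiment Setup row can be sketched as a Cartesian product over initial exploration rates and decay rates. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `hyperparameter_grid` and the dictionary keys are hypothetical, the decay rates are read as negative powers of ten, and the lists hold only the values legible in the row, with the elided entries left out rather than guessed. The decay update itself is not implemented because the row does not state it unambiguously.

```python
from itertools import product

# Only the values legible in the Experiment Setup row.
# Elided entries are deliberately omitted, not guessed.
INITIAL_EPSILONS = [0.1, 0.3, 0.7, 0.9]
# Assumption: the decay rates are negative powers of ten.
DECAY_RATES = [1e-3, 1e-4, 1e-7]


def hyperparameter_grid(epsilons, deltas):
    """Return every (initial epsilon, decay rate) pair as a dict.

    Hypothetical helper for illustration; the key names are assumptions.
    """
    return [{"epsilon_0": e, "delta": d} for e, d in product(epsilons, deltas)]


configs = hyperparameter_grid(INITIAL_EPSILONS, DECAY_RATES)
print(len(configs))  # 4 initial rates x 3 decay rates = 12 configurations
```

Each entry in `configs` would then be passed to one training run; adding the elided grid values once they are known only requires extending the two lists.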