Sample Complexity of Goal-Conditioned Hierarchical Reinforcement Learning

Authors: Arnaud Robert, Ciara Pike-Burke, Aldo A. Faisal

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our theoretical findings by investigating the sample complexity of the proposed hierarchical algorithm on a spectrum of tasks (hierarchical n-rooms, Gymnasium's Taxi).
Researcher Affiliation | Academia | Arnaud Robert, Brain & Behaviour Lab, Dept. of Computing, Imperial College London, UK (a.robert20@imperial.ac.uk); Ciara Pike-Burke, Dept. of Mathematics, Imperial College London, UK (c.pike-burke@imperial.ac.uk); A. Aldo Faisal, Brain & Behaviour Lab, Depts. of Computing & Bioengineering, Imperial College London, UK, and Chair in Digital Health & Data Science, University of Bayreuth, Germany (a.faisal@imperial.ac.uk)
Pseudocode | Yes | Algorithm 1: Stationary Hierarchical Q-learning (SHQL)
Open Source Code | Yes | All required code to reproduce the experiments is made available online [1]. [1] A. Robert, C. Pike-Burke, A. A. Faisal. Code and resources for running Hierarchical Stationary Q-learning: doi.org/10.6084/m9.figshare.24291172.
Open Datasets | Yes | We empirically validate our theoretical findings by investigating the sample complexity of the proposed hierarchical algorithm on a spectrum of tasks (hierarchical n-rooms, Gymnasium's Taxi).
Dataset Splits | No | The paper describes training RL agents through interaction with environments (n-rooms, Gymnasium Taxi) but does not provide explicit training/validation/test dataset splits with percentages or sample counts of the kind typically reported for static datasets in supervised learning.
Hardware Specification | Yes | Experiments were run on a 12th Gen Intel Core i7 with 16GB of RAM; training the agents on the largest maze considered takes 7 minutes.
Software Dependencies | No | The paper mentions the Gymnasium Taxi environment but does not specify versions for any programming languages or software libraries used for implementation (e.g., Python, PyTorch, TensorFlow, scikit-learn). See the environment sketch below the table.
Experiment Setup | Yes | The hyperparameters considered are the initial exploration rate ϵ ∈ {0.1, 0.3, 0.5, 0.7, 0.9} and the decay rate δ ∈ {1×10^-3, 1×10^-4, ..., 1×10^-7}. Note that we only considered the following decay function: ϵ_{k+1} = ϵ_k - δ(k+1), where k denotes the current episode number. See the decay-schedule sketch below the table.
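
The paper names the Gymnasium Taxi environment but pins no library versions. The following is a minimal sketch of how that environment is typically instantiated, assuming the standard "Taxi-v3" registration ID in a current Gymnasium release; it is not taken from the authors' code.

```python
# Minimal sketch (not from the authors' code): instantiating the Gymnasium
# Taxi environment referenced in the experiments. "Taxi-v3" is the standard
# registered ID in current Gymnasium releases; since the paper does not pin
# a version, exact behaviour may vary across releases.
import gymnasium as gym

env = gym.make("Taxi-v3")
obs, info = env.reset(seed=0)

terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # placeholder random policy
    obs, reward, terminated, truncated, info = env.step(action)

env.close()
```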
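
The Experiment Setup row describes a per-episode decay of the exploration rate. The sketch below assumes the reconstructed rule ϵ_{k+1} = ϵ_k - δ(k+1), clipped at zero; the exact functional form is inferred from a garbled excerpt and may differ from the authors' implementation, and the intermediate decay rates 1e-5 and 1e-6 are assumed to fill the ellipsis in the reported grid.

```python
# Sketch of the exploration-rate schedule described in the Experiment Setup
# row. Assumption: the decay rule is eps_{k+1} = max(0, eps_k - delta*(k+1)),
# reconstructed from the paper's garbled formula; it may differ from the
# authors' actual implementation.
def epsilon_schedule(eps0: float, delta: float, num_episodes: int) -> list[float]:
    """Exploration rate used at each episode k = 0, ..., num_episodes - 1."""
    eps, schedule = eps0, []
    for k in range(num_episodes):
        schedule.append(eps)
        eps = max(0.0, eps - delta * (k + 1))  # anneal after episode k
    return schedule

# Grids reported in the paper; the 1e-5 and 1e-6 entries expand the ellipsis.
initial_rates = [0.1, 0.3, 0.5, 0.7, 0.9]
decay_rates = [1e-3, 1e-4, 1e-5, 1e-6, 1e-7]

# Example: the fastest decay rate anneals eps0 = 0.9 over the first episodes.
print(epsilon_schedule(0.9, 1e-3, 5))  # roughly [0.9, 0.899, 0.897, 0.894, 0.890]
```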