Sample Complexity of Goal-Conditioned Hierarchical Reinforcement Learning
Authors: Arnaud Robert, Ciara Pike-Burke, Aldo A. Faisal
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our theoretical findings by investigating the sample complexity of the proposed hierarchical algorithm on a spectrum of tasks (hierarchical n-rooms, Gymnasium's Taxi). |
| Researcher Affiliation | Academia | Arnaud Robert, Brain & Behaviour Lab, Dept. of Computing, Imperial College London, UK (a.robert20@imperial.ac.uk); Ciara Pike-Burke, Dept. of Mathematics, Imperial College London, UK (c.pike-burke@imperial.ac.uk); A. Aldo Faisal, Brain & Behaviour Lab, Depts. of Computing & Bioengineering, Imperial College London, UK, and Chair in Digital Health & Data Science, University of Bayreuth, Germany (a.faisal@imperial.ac.uk) |
| Pseudocode | Yes | Algorithm 1: Stationary Hierarchical Q-learning (SHQL) |
| Open Source Code | Yes | All required code to reproduce the experiments is made available online [1]. [1] A. Robert, C. Pike-Burke, A. A. Faisal. Code and resources for running Hierarchical Stationary Q-learning: doi.org/10.6084/m9.figshare.24291172. |
| Open Datasets | Yes | We empirically validate our theoretical findings by investigating the sample complexity of the proposed hierarchical algorithm on a spectrum of tasks (hierarchical n-rooms, Gymnasium's Taxi). |
| Dataset Splits | No | The paper describes training RL agents through interaction with environments (n-rooms, Gymnasium Taxi) but does not provide explicit training/validation/test dataset splits with percentages or sample counts in the way typically seen for static datasets in supervised learning. |
| Hardware Specification | Yes | Experiments were run on a 12th Gen Intel Core i7 with 16GB of RAM; training the agents on the largest maze considered takes 7 minutes. |
| Software Dependencies | No | The paper mentions the Gymnasium s Taxi environment but does not specify versions for any programming languages or software libraries used for implementation (e.g., Python, PyTorch, TensorFlow, scikit-learn). |
| Experiment Setup | Yes | The hyperparameters considered are the initial exploration rate ϵ ∈ [0.1, 0.3, …, 0.7, 0.9] and the decay rate δ ∈ [1×10⁻³, 1×10⁻⁴, …, 1×10⁻⁷]. Note that we only considered the following decay function ϵ_{k+1} = ϵ_k − δ_{k+1}, where k denotes the current episode number. |
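The exploration-decay schedule quoted above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the subtractive update with a constant decay rate per episode, and all function and variable names, are assumptions reconstructed from the quoted description.

```python
# Sketch of a linearly decaying epsilon-greedy exploration schedule.
# Assumption: eps is reduced by a constant decay rate `delta` each
# episode and clipped at zero; the paper's exact rule may differ.

def decay_schedule(eps0: float, delta: float, episodes: int) -> list[float]:
    """Return the exploration rate used at each episode."""
    rates = []
    eps = eps0
    for _ in range(episodes):
        rates.append(eps)
        eps = max(0.0, eps - delta)  # never let epsilon go negative
    return rates

# Example with values from the reported grid: eps0 = 0.9, delta = 1e-3.
rates = decay_schedule(0.9, 1e-3, 1000)
```

With these example values the exploration rate falls from 0.9 toward zero over roughly 900 episodes, which is consistent with the small decay rates (10⁻³ to 10⁻⁷) in the reported hyperparameter grid.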