Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Flexible Option Learning
Authors: Martin Klissarov, Doina Precup
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify the merits of our approach on a wide variety of domains, ranging from gridworlds using tabular representations [Sutton et al., 1999b, Bacon et al., 2016], control with linear function approximation [Moore, 1991], continuous control [Todorov et al., 2012, Brockman et al., 2016] and vision-based navigation [Chevalier-Boisvert, 2018]. |
| Researcher Affiliation | Collaboration | Martin Klissarov Mila, McGill University EMAIL Doina Precup Mila, McGill University and DeepMind EMAIL |
| Pseudocode | Yes | We do so by instantiating these update rules within the option-critic (OC) algorithm which we present in Algorithm 1 in appendix A.1 and name it Multi-updates Option Critic (MOC) algorithm. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | We first evaluate our approach in the classic Four Rooms domain [Sutton et al., 1999b] [...] classic Mountain Car domain [Moore, 1991, Sutton and Barto, 2018] [...] MuJoCo domain [Todorov et al., 2012] [...] MiniWorld [Chevalier-Boisvert, 2018]. |
| Dataset Splits | No | The paper mentions that hyperparameters are available in the appendix, implying some form of tuning, but does not explicitly describe any train/validation/test dataset splits or methodologies for data partitioning. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as CPU or GPU models, memory, or types of computing clusters used for running the experiments. |
| Software Dependencies | No | The paper mentions using and building upon algorithms like A2C and PPO, and references a PyTorch implementation, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | All hyperparameters are available in the appendix. |