The Option-Critic Architecture

Authors: Pierre-Luc Bacon, Jean Harb, Doina Precup

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | 'Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework.' |
| Researcher Affiliation | Academia | Reasoning and Learning Lab, School of Computer Science, McGill University. {pbacon, jharb, dprecup}@cs.mcgill.ca |
| Pseudocode | Yes | Algorithm 1: Option-critic with tabular intra-option Q-learning (see the sketch after this table). |
| Open Source Code | No | No explicit statement or link providing access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | Four-rooms domain (Sutton, Precup, and Singh 1999), Pinball domain (Konidaris and Barto 2009), Arcade Learning Environment (ALE) (Bellemare et al. 2013). |
| Dataset Splits | No | The paper does not provide percentages or sample counts for training, validation, and test splits; it only mentions using an 'ϵ-greedy policy over options with ϵ = 0.05 during the test phase'. |
| Hardware Specification | No | The paper describes the neural network architecture used but gives no details on the hardware (e.g., CPU or GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper names methods such as RMSProp and Boltzmann policies but does not give version numbers for any libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | 'The learning rates were set to 0.01 for the critic and 0.001 for both the intra and termination gradients. We used an epsilon-greedy policy over options with ϵ = 0.01.' and 'We fixed the learning rate for the intra-option policies and termination gradient to 0.00025 and used RMSProp for the critic.' (See the option-selection sketch below.) |
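For readers reproducing Algorithm 1, here is a minimal sketch of the tabular intra-option Q-learning critic update. All names (Q_U, Q_Omega, beta, critic_update) and the table sizes are illustrative assumptions, not from the paper; in the full option-critic architecture the termination probabilities and intra-option policies are themselves learned by gradient ascent rather than held fixed as they are here.

```python
import numpy as np

# Illustrative sizes for a small tabular problem
# (e.g., the 104 free cells of the four-rooms domain).
N_STATES, N_OPTIONS, N_ACTIONS = 104, 4, 4
GAMMA = 0.99
ALPHA_CRITIC = 0.01  # critic learning rate quoted in the table above

Q_U = np.zeros((N_STATES, N_OPTIONS, N_ACTIONS))  # Q_U(s, o, a)
Q_Omega = np.zeros((N_STATES, N_OPTIONS))         # Q_Omega(s, o)
beta = np.full((N_STATES, N_OPTIONS), 0.5)        # termination probs (learned in the paper)

def critic_update(s, o, a, r, s_next, done):
    """One tabular intra-option Q-learning step (cf. Algorithm 1).

    Upon arrival in s_next, the current option continues with
    probability 1 - beta and terminates (switching to the greedy
    option) with probability beta, which gives the one-step target.
    """
    if done:
        g = r
    else:
        u = (1.0 - beta[s_next, o]) * Q_Omega[s_next, o] \
            + beta[s_next, o] * Q_Omega[s_next].max()
        g = r + GAMMA * u
    Q_U[s, o, a] += ALPHA_CRITIC * (g - Q_U[s, o, a])
    # Sketch simplification: Q_Omega is trained toward the same target;
    # the paper can instead derive it as the expectation of Q_U under
    # the intra-option policy.
    Q_Omega[s, o] += ALPHA_CRITIC * (g - Q_Omega[s, o])
```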
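And a minimal sketch of the ϵ-greedy policy over options referenced in the Dataset Splits and Experiment Setup rows, using the ϵ = 0.05 test-phase value quoted above; choose_option and q_omega are hypothetical names introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
EPSILON_TEST = 0.05  # value the paper quotes for the test phase

def choose_option(state, q_omega, epsilon=EPSILON_TEST):
    """Pick an option: random with prob. epsilon, else greedy w.r.t. Q_Omega."""
    n_options = q_omega.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_options))
    return int(np.argmax(q_omega[state]))
```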