The Option-Critic Architecture

Authors: Pierre-Luc Bacon, Jean Harb, Doina Precup

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | 'Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework.' |
| Researcher Affiliation | Academia | Reasoning and Learning Lab, School of Computer Science, McGill University. {pbacon, jharb, dprecup}@cs.mcgill.ca |
| Pseudocode | Yes | Algorithm 1: Option-critic with tabular intra-option Q-learning (see the sketch after this table). |
| Open Source Code | No | No explicit statement or link providing access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | Four-rooms domain (Sutton, Precup, and Singh 1999), Pinball domain (Konidaris and Barto 2009), Arcade Learning Environment (ALE) (Bellemare et al. 2013). |
| Dataset Splits | No | The paper does not provide percentages or sample counts for training, validation, and test splits; it only mentions using an 'ϵ-greedy policy over options with ϵ = 0.05 during the test phase'. |
| Hardware Specification | No | The paper describes the neural network architecture used but gives no details on the hardware (e.g., CPU or GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper names methods such as RMSProp and Boltzmann policies but does not give version numbers for any libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | 'The learning rates were set to 0.01 for the critic and 0.001 for both the intra and termination gradients. We used an epsilon-greedy policy over options with ϵ = 0.01.' and 'We fixed the learning rate for the intra-option policies and termination gradient to 0.00025 and used RMSProp for the critic.' (See the option-selection sketch below.) |
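For readers reproducing Algorithm 1, here is a minimal sketch of the tabular intra-option Q-learning critic update. All names (Q_U, Q_Omega, beta, critic_update) and the table sizes are illustrative assumptions, not from the paper; in the full option-critic architecture the termination probabilities and intra-option policies are themselves learned by gradient ascent rather than held fixed as they are here.

```python
import numpy as np

# Illustrative sizes for a small tabular problem
# (e.g., the 104 free cells of the four-rooms domain).
N_STATES, N_OPTIONS, N_ACTIONS = 104, 4, 4
GAMMA = 0.99
ALPHA_CRITIC = 0.01  # critic learning rate quoted in the table above

Q_U = np.zeros((N_STATES, N_OPTIONS, N_ACTIONS))  # Q_U(s, o, a)
Q_Omega = np.zeros((N_STATES, N_OPTIONS))         # Q_Omega(s, o)
beta = np.full((N_STATES, N_OPTIONS), 0.5)        # termination probs (learned in the paper)

def critic_update(s, o, a, r, s_next, done):
    """One tabular intra-option Q-learning step (cf. Algorithm 1).

    Upon arrival in s_next, the current option continues with
    probability 1 - beta and terminates (switching to the greedy
    option) with probability beta, which gives the one-step target.
    """
    if done:
        g = r
    else:
        u = (1.0 - beta[s_next, o]) * Q_Omega[s_next, o] \
            + beta[s_next, o] * Q_Omega[s_next].max()
        g = r + GAMMA * u
    Q_U[s, o, a] += ALPHA_CRITIC * (g - Q_U[s, o, a])
    # Sketch simplification: Q_Omega is trained toward the same target;
    # the paper can instead derive it as the expectation of Q_U under
    # the intra-option policy.
    Q_Omega[s, o] += ALPHA_CRITIC * (g - Q_Omega[s, o])
```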
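And a minimal sketch of the ϵ-greedy policy over options referenced in the Dataset Splits and Experiment Setup rows, using the ϵ = 0.05 test-phase value quoted above; choose_option and q_omega are hypothetical names introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
EPSILON_TEST = 0.05  # value the paper quotes for the test phase

def choose_option(state, q_omega, epsilon=EPSILON_TEST):
    """Pick an option: random with prob. epsilon, else greedy w.r.t. Q_Omega."""
    n_options = q_omega.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_options))
    return int(np.argmax(q_omega[state]))
```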