The Option-Critic Architecture
Authors: Pierre-Luc Bacon, Jean Harb, Doina Precup
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework. |
| Researcher Affiliation | Academia | Reasoning and Learning Lab, School of Computer Science, McGill University; {pbacon, jharb, dprecup}@cs.mcgill.ca |
| Pseudocode | Yes | Algorithm 1: Option-critic with tabular intra-option Q-learning (see the sketch after this table). |
| Open Source Code | No | No explicit statement or link providing access to the paper's source code was found. |
| Open Datasets | Yes | four-rooms domain (Sutton, Precup, and Singh 1999), Pinball domain (Konidaris and Barto 2009), Arcade Learning Environment (ALE) (Bellemare et al. 2013) |
| Dataset Splits | No | The paper does not provide percentages or sample counts for training, validation, and test splits. It only mentions using an 'ϵ-greedy policy over options with ϵ = 0.05 during the test phase'. |
| Hardware Specification | No | The paper describes the neural network architecture used but does not provide specific details on the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithmic components such as RMSProp and Boltzmann policies but does not provide version numbers for any libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | Hyperparameters are reported per experiment, e.g.: 'The learning rates were set to 0.01 for the critic and 0.001 for both the intra and termination gradients. We used an epsilon-greedy policy over options with ϵ = 0.01.' and 'We fixed the learning rate for the intra-option policies and termination gradient to 0.00025 and used RMSProp for the critic.' See the configuration sketch after this table. |
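
The critic portion of Algorithm 1 is a tabular intra-option Q-learning update. Below is a minimal sketch of that update, not the authors' code: the array shapes, the discount factor `gamma`, and the usage example are illustrative assumptions, and the default learning rate of 0.01 is taken from the setup quoted above.

```python
import numpy as np

def critic_update(Q_U, Q_Omega, pi, beta, s, w, a, r, s_next, done,
                  gamma=0.99, lr=0.01):
    """One tabular intra-option Q-learning step (the critic of Algorithm 1).

    Assumed array shapes for this sketch:
      Q_U:     (S, W, A)  value of action a in state s under option w
      Q_Omega: (S, W)     value of executing option w from state s
      pi:      (S, W, A)  intra-option policy probabilities
      beta:    (S, W)     option termination probabilities
    """
    target = r
    if not done:
        # Bootstrap: either option w continues (weight 1 - beta) and we
        # follow Q_Omega(s', w), or it terminates (weight beta) and we
        # switch to the greedy option in s'.
        target += gamma * ((1.0 - beta[s_next, w]) * Q_Omega[s_next, w]
                           + beta[s_next, w] * Q_Omega[s_next].max())
    Q_U[s, w, a] += lr * (target - Q_U[s, w, a])
    # The option value is the intra-option policy's expectation over Q_U.
    Q_Omega[s, w] = pi[s, w] @ Q_U[s, w]

# Example with illustrative sizes: 4 states, 2 options, 3 actions.
S, W, A = 4, 2, 3
Q_U = np.zeros((S, W, A))
Q_Omega = np.zeros((S, W))
pi = np.full((S, W, A), 1.0 / A)   # uniform intra-option policies
beta = np.full((S, W), 0.5)        # constant termination probabilities
critic_update(Q_U, Q_Omega, pi, beta, s=0, w=1, a=2, r=1.0,
              s_next=3, done=False)
```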
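
For reference, here is a sketch collecting the quoted hyperparameters as plain Python dictionaries. The split into two settings and all key names are our reading of the Experiment Setup evidence, not a structure given by the authors.

```python
# Hyperparameters as quoted from the paper; the grouping into two
# settings and the key names are assumptions made for this sketch.
SETUP_A = {
    "critic_lr": 0.01,             # "0.01 for the critic"
    "intra_option_lr": 0.001,      # "0.001 for both the intra ..."
    "termination_lr": 0.001,       # "... and termination gradients"
    "epsilon_over_options": 0.01,  # epsilon-greedy policy over options
}
SETUP_B = {
    "intra_option_lr": 0.00025,    # also applied to the termination gradient
    "termination_lr": 0.00025,
    "critic_optimizer": "RMSProp",
}
```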