When Waiting Is Not an Option: Learning Options With a Deliberation Cost

Authors: Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results in the Arcade Learning Environment (ALE) show increased performance and interpretability. We used Amidar, a game of the Atari 2600 suite, to analyze the option policies and terminations qualitatively.
Researcher Affiliation | Academia | Reasoning and Learning Lab, McGill University {jharb,pbacon,mklissa,dprecup}@cs.mcgill.ca
Pseudocode | Yes | Algorithm 1: Asynchronous Advantage Option Critic
Open Source Code | Yes | The source code is available at https://github.com/jeanharb/a2oc_delib
Open Datasets | Yes | Arcade Learning Environment (Bellemare et al. 2013)
Dataset Splits | No | The paper describes data preprocessing and training parameters but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions algorithms such as A3C and DQN and the use of convolutional neural networks, but it does not specify software dependencies with version numbers (e.g., the Python version or deep learning frameworks such as PyTorch or TensorFlow and their versions).
Experiment Setup | Yes | As for the hyperparameters, we use an ϵ-greedy policy over options, with ϵ = 0.1. The preprocessing is the same as in A3C, with RGB pixels scaled to 84×84 grayscale images. The agent repeats actions for 4 consecutive moves and receives stacks of 4 frames as inputs. We used entropy regularization of 0.01, which pushes option policies not to collapse to deterministic policies. A learning rate of 0.0007 was used in all experiments.
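
The Pseudocode row points to Algorithm 1 (Asynchronous Advantage Option Critic). As a reading aid, here is a minimal Python sketch of how a deliberation cost can enter the termination update, assuming the margin formulation in which a cost η is added to the option advantage; the names termination_loss, q_omega, v, and eta are illustrative and are not taken from the authors' code.

```python
# Minimal sketch, assuming the deliberation cost acts as a margin (eta) added
# to the option advantage in the termination objective. This is not the
# authors' implementation; q_omega, v, and eta are illustrative names.


def termination_loss(beta: float, q_omega: float, v: float, eta: float) -> float:
    """Termination objective for a single state-option pair.

    beta    : termination probability of the current option in the next state
    q_omega : option value Q(s', omega)
    v       : value over options V(s')
    eta     : deliberation cost, penalizing frequent option switching
    """
    advantage = q_omega - v
    # Minimizing this pushes beta toward 0 unless the current option is worse
    # than V(s') by more than eta, so options persist longer.
    return beta * (advantage + eta)


if __name__ == "__main__":
    # Toy check: a larger deliberation cost discourages termination more.
    for eta in (0.0, 0.02, 0.1):
        print(f"eta={eta:.2f}  loss={termination_loss(0.5, 1.0, 0.98, eta):+.3f}")
```

With eta = 0 this sketch reduces to a termination term driven by the advantage alone, which is the usual option-critic behavior.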
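
The Experiment Setup row lists concrete preprocessing and hyperparameter values. The sketch below collects them in runnable form; only the numeric values come from the quoted setup, while the helper names (preprocess_frame, FrameStacker) and the use of OpenCV for resizing are assumptions.

```python
# Sketch of the reported Atari preprocessing and hyperparameters. Only the
# numeric values come from the paper; the helpers and the choice of OpenCV
# for resizing are assumptions.
import collections

import cv2
import numpy as np

EPSILON = 0.1          # epsilon-greedy policy over options
ENTROPY_REG = 0.01     # entropy regularization on option policies
LEARNING_RATE = 0.0007
ACTION_REPEAT = 4      # each action is repeated for 4 consecutive frames
FRAME_STACK = 4        # the agent receives stacks of 4 frames as input


def preprocess_frame(rgb_frame: np.ndarray) -> np.ndarray:
    """Convert an RGB Atari frame to an 84x84 grayscale image."""
    gray = cv2.cvtColor(rgb_frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)


class FrameStacker:
    """Keep the last FRAME_STACK preprocessed frames as the network input."""

    def __init__(self, k: int = FRAME_STACK):
        self.frames = collections.deque(maxlen=k)

    def reset(self, frame: np.ndarray) -> np.ndarray:
        processed = preprocess_frame(frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(processed)
        return np.stack(list(self.frames), axis=0)  # shape (4, 84, 84)

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess_frame(frame))
        return np.stack(list(self.frames), axis=0)
```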