When Waiting Is Not an Option: Learning Options With a Deliberation Cost
Authors: Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results in the Arcade Learning Environment (ALE) show increased performance and interpretability. We used Amidar, a game of the Atari 2600 suite, to analyze the option policies and terminations qualitatively. |
| Researcher Affiliation | Academia | Reasoning and Learning Lab, McGill University {jharb,pbacon,mklissa,dprecup}@cs.mcgill.ca |
| Pseudocode | Yes | Algorithm 1: Asynchronous Advantage Option Critic |
| Open Source Code | Yes | The source code is available at https://github.com/jeanharb/a2oc_delib |
| Open Datasets | Yes | Arcade Learning Environment (Bellemare et al. 2013) |
| Dataset Splits | No | The paper describes data preprocessing and training parameters but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., specific GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions algorithms like A3C and DQN and the use of convolutional neural networks, but it does not specify software dependencies with version numbers (e.g., Python version, specific deep learning frameworks like PyTorch or TensorFlow, and their versions). |
| Experiment Setup | Yes | As for the hyperparameters, we use an ϵ-greedy policy over options, with ϵ = 0.1. The preprocessing are the same as the A3C, with RGB pixels scaled to 84 × 84 grayscale images. The agent repeats actions for 4 consecutive moves and receives stacks of 4 frames as inputs. We used entropy regularization of 0.01, which pushes option policies not to collapse to deterministic policies. A learning rate of 0.0007 was used in all experiments. |
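
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The sketch below is illustrative only and is not taken from the authors' repository: the class name `A2OCConfig`, its field names, and the helpers `select_option` and `policy_entropy` are all hypothetical. It shows one plausible way to hold the reported values and to apply an ϵ-greedy choice over options together with an entropy bonus.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class A2OCConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    option_epsilon: float = 0.1     # epsilon for the epsilon-greedy policy over options
    frame_size: tuple = (84, 84)    # grayscale frame resolution after preprocessing
    action_repeat: int = 4          # each action is repeated for 4 consecutive moves
    frame_stack: int = 4            # number of frames stacked as network input
    entropy_coef: float = 0.01      # entropy regularization weight
    learning_rate: float = 0.0007   # learning rate used in all experiments


def select_option(option_values, epsilon, rng):
    """Epsilon-greedy over options: uniform random with prob. epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(option_values))
    return max(range(len(option_values)), key=lambda i: option_values[i])


def policy_entropy(probs):
    """Shannon entropy (in nats) of an option's action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


config = A2OCConfig()
rng = random.Random(0)  # seeded so the sketch is repeatable

# Pick an option from made-up option values (illustrative numbers only).
option = select_option([0.2, 1.5, -0.3], config.option_epsilon, rng)

# Entropy bonus for a uniform 4-action policy, scaled by the regularization weight.
bonus = config.entropy_coef * policy_entropy([0.25, 0.25, 0.25, 0.25])
```

A deterministic option policy has zero entropy and therefore earns no bonus, which is the mechanism by which the regularizer discourages collapse to deterministic policies.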