A Laplacian Framework for Option Discovery in Reinforcement Learning

Authors: Marlos C. Machado, Marc G. Bellemare, Michael Bowling

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quotes from Section 4 (Empirical Evaluation): "We used three MDPs in our empirical study (c.f. Figure 1): an open room, an I-Maze, and the 4-room domain." and "In these experiments, the agent starts at the bottom left corner and its goal is to reach the top right corner. The agent observes a reward of 0 until the goal is reached, when it observes a reward of +1. We used Q-Learning (alpha = 0.1, gamma = 0.9) to learn a policy over primitive actions."
Researcher Affiliation | Collaboration | "University of Alberta" and "Google DeepMind"
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided; the methods are described in prose.
Open Source Code | Yes | "Python code can be found at: https://github.com/mcmachado/options"
Open Datasets | Yes | "We tested our method in the ALE (Bellemare et al., 2013)."
Dataset Splits | No | The paper does not state explicit dataset splits for training, validation, and testing (e.g., percentages or counts).
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | Although the Python code is available, specific software dependencies (library names with version numbers, e.g., PyTorch 1.9 or TensorFlow 2.x) are not listed.
Experiment Setup | Yes | "We used Q-Learning (alpha = 0.1, gamma = 0.9) to learn a policy over primitive actions." and "Episodes were 100 time steps long, and we learned for 250 episodes in the 10x10 grid and in the I-Maze, and for 500 episodes in the 4-room domain." A minimal sketch of this setup appears below the table.
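
The Experiment Setup row pins down the hyperparameters but not the code. The sketch below shows a tabular Q-Learning loop consistent with those numbers; only alpha = 0.1, gamma = 0.9, the 100-step episodes, the sparse 0/+1 reward, and the episode counts come from the paper. The grid-world environment interface (reset()/step()), the epsilon-greedy exploration rate, and the function name q_learning are assumptions for illustration, not taken from the authors' released code.

```python
# Minimal sketch of the tabular Q-Learning setup described in the table.
# Assumed environment interface (not from the authors' repository):
#   env.reset() -> state (int), env.step(action) -> (next_state, reward, done)
# with four primitive actions (up, down, left, right).
import numpy as np

ALPHA = 0.1            # learning rate reported in the paper
GAMMA = 0.9            # discount factor reported in the paper
EPSILON = 0.1          # exploration rate (assumed; not stated in the excerpt)
EPISODE_LENGTH = 100   # episodes were 100 time steps long
NUM_EPISODES = 250     # 250 episodes for the 10x10 grid and the I-Maze (500 for 4-room)


def q_learning(env, num_states, num_actions, num_episodes=NUM_EPISODES, seed=0):
    """Learn a policy over primitive actions with epsilon-greedy Q-Learning."""
    rng = np.random.default_rng(seed)
    q = np.zeros((num_states, num_actions))
    for _ in range(num_episodes):
        state = env.reset()  # agent starts at the bottom-left corner
        for _ in range(EPISODE_LENGTH):
            # Epsilon-greedy action selection over primitive actions.
            if rng.random() < EPSILON:
                action = int(rng.integers(num_actions))
            else:
                action = int(np.argmax(q[state]))
            next_state, reward, done = env.step(action)  # reward is 0 until the goal, then +1
            # One-step Q-Learning update; do not bootstrap past the terminal goal state.
            target = reward + (0.0 if done else GAMMA * np.max(q[next_state]))
            q[state, action] += ALPHA * (target - q[state, action])
            state = next_state
            if done:
                break
    return q
```

For the 4-room domain the paper reports 500 episodes, so the same function would be called with num_episodes=500; everything else in the sketch stays unchanged.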