A Laplacian Framework for Option Discovery in Reinforcement Learning

Authors: Marlos C. Machado, Marc G. Bellemare, Michael Bowling

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quotes from Section 4 (Empirical Evaluation): "We used three MDPs in our empirical study (c.f. Figure 1): an open room, an I-Maze, and the 4-room domain." and "In these experiments, the agent starts at the bottom left corner and its goal is to reach the top right corner. The agent observes a reward of 0 until the goal is reached, when it observes a reward of +1. We used Q-Learning (alpha = 0.1, gamma = 0.9) to learn a policy over primitive actions."
Researcher Affiliation | Collaboration | "University of Alberta" and "Google DeepMind"
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided; the methods are described in prose.
Open Source Code | Yes | "Python code can be found at: https://github.com/mcmachado/options"
Open Datasets | Yes | "We tested our method in the ALE (Bellemare et al., 2013)."
Dataset Splits | No | The paper does not state explicit dataset splits for training, validation, and testing (e.g., percentages or counts).
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | Although the Python code is available, specific software dependencies (library names with version numbers, e.g., PyTorch 1.9 or TensorFlow 2.x) are not listed.
Experiment Setup | Yes | "We used Q-Learning (alpha = 0.1, gamma = 0.9) to learn a policy over primitive actions." and "Episodes were 100 time steps long, and we learned for 250 episodes in the 10x10 grid and in the I-Maze, and for 500 episodes in the 4-room domain." A minimal sketch of this setup appears below the table.
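
The Experiment Setup row pins down the hyperparameters but not the code. The sketch below shows a tabular Q-Learning loop consistent with those numbers; only alpha = 0.1, gamma = 0.9, the 100-step episodes, the sparse 0/+1 reward, and the episode counts come from the paper. The grid-world environment interface (reset()/step()), the epsilon-greedy exploration rate, and the function name q_learning are assumptions for illustration, not taken from the authors' released code.

```python
# Minimal sketch of the tabular Q-Learning setup described in the table.
# Assumed environment interface (not from the authors' repository):
#   env.reset() -> state (int), env.step(action) -> (next_state, reward, done)
# with four primitive actions (up, down, left, right).
import numpy as np

ALPHA = 0.1            # learning rate reported in the paper
GAMMA = 0.9            # discount factor reported in the paper
EPSILON = 0.1          # exploration rate (assumed; not stated in the excerpt)
EPISODE_LENGTH = 100   # episodes were 100 time steps long
NUM_EPISODES = 250     # 250 episodes for the 10x10 grid and the I-Maze (500 for 4-room)


def q_learning(env, num_states, num_actions, num_episodes=NUM_EPISODES, seed=0):
    """Learn a policy over primitive actions with epsilon-greedy Q-Learning."""
    rng = np.random.default_rng(seed)
    q = np.zeros((num_states, num_actions))
    for _ in range(num_episodes):
        state = env.reset()  # agent starts at the bottom-left corner
        for _ in range(EPISODE_LENGTH):
            # Epsilon-greedy action selection over primitive actions.
            if rng.random() < EPSILON:
                action = int(rng.integers(num_actions))
            else:
                action = int(np.argmax(q[state]))
            next_state, reward, done = env.step(action)  # reward is 0 until the goal, then +1
            # One-step Q-Learning update; do not bootstrap past the terminal goal state.
            target = reward + (0.0 if done else GAMMA * np.max(q[next_state]))
            q[state, action] += ALPHA * (target - q[state, action])
            state = next_state
            if done:
                break
    return q
```

For the 4-room domain the paper reports 500 episodes, so the same function would be called with num_episodes=500; everything else in the sketch stays unchanged.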