A Laplacian Framework for Option Discovery in Reinforcement Learning
Authors: Marlos C. Machado, Marc G. Bellemare, Michael Bowling
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Empirical Evaluation): "We used three MDPs in our empirical study (c.f. Figure 1): an open room, an I-Maze, and the 4-room domain." and "In these experiments, the agent starts at the bottom left corner and its goal is to reach the top right corner. The agent observes a reward of 0 until the goal is reached, when it observes a reward of +1. We used Q-Learning (α = 0.1, γ = 0.9) to learn a policy over primitive actions." |
| Researcher Affiliation | Collaboration | ¹University of Alberta, ²Google DeepMind. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided; methods are described in prose. |
| Open Source Code | Yes | "Python code can be found at: https://github.com/mcmachado/options" |
| Open Datasets | Yes | We tested our method in the ALE (Bellemare et al., 2013). |
| Dataset Splits | No | The paper does not explicitly state specific dataset splits for training, validation, and testing, such as percentages or counts. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | While Python code is mentioned as available, specific software dependencies like library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) are not provided. |
| Experiment Setup | Yes | "We used Q-Learning (α = 0.1, γ = 0.9) to learn a policy over primitive actions." and "Episodes were 100 time steps long, and we learned for 250 episodes in the 10x10 grid and in the I-Maze, and for 500 episodes in the 4-room domain." (See the sketch below the table.) |
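
The experiment-setup quotes above fully specify a tabular Q-Learning baseline, so a short sketch may help readers trying to reproduce the gridworld numbers. The learning rate, discount, episode length, episode count, start/goal placement, and reward scheme below follow the quoted setup for the 10x10 open room; the ε-greedy exploration rate, the four-action move set, and the wall-bumping dynamics are assumptions of this sketch, not details taken from the paper.

```python
# Minimal sketch of the tabular Q-Learning baseline quoted in the table:
# 10x10 open room, alpha = 0.1, gamma = 0.9, 100-step episodes, 250 episodes,
# reward 0 everywhere except +1 at the goal (top-right corner), start at the
# bottom-left corner. Exploration and dynamics details are assumptions.
import numpy as np

GRID = 10                       # 10x10 open room
ALPHA, GAMMA = 0.1, 0.9         # learning rate and discount from the paper
EPISODES, MAX_STEPS = 250, 100  # episode budget from the paper
EPSILON = 0.1                   # assumed epsilon-greedy rate (not in the paper)

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed move set: up, down, left, right
START, GOAL = (GRID - 1, 0), (0, GRID - 1)    # bottom-left start, top-right goal

rng = np.random.default_rng(0)
Q = np.zeros((GRID, GRID, len(ACTIONS)))      # tabular action-value function


def step(state, a):
    """Apply action a; bumping into a wall leaves the agent in place (assumed)."""
    nxt = (max(0, min(GRID - 1, state[0] + ACTIONS[a][0])),
           max(0, min(GRID - 1, state[1] + ACTIONS[a][1])))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


for _ in range(EPISODES):
    s = START
    for _ in range(MAX_STEPS):
        # Assumed epsilon-greedy action selection over primitive actions.
        if rng.random() < EPSILON:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(Q[s[0], s[1]]))
        s2, r, done = step(s, a)
        # Standard Q-Learning update with the quoted alpha and gamma.
        target = r + (0.0 if done else GAMMA * np.max(Q[s2[0], s2[1]]))
        Q[s[0], s[1], a] += ALPHA * (target - Q[s[0], s[1], a])
        s = s2
        if done:
            break
```

Raising `EPISODES` to 500 would match the quoted budget for the 4-room domain, but the room layout itself (walls and doorways) and the I-Maze would have to be added; the paper's option-discovery machinery is not part of this sketch.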