Natural Option Critic
Authors: Saket Tiwari, Philip S. Thomas
AAAI 2019, pp. 5175-5182
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results showcase improvement over the vanilla gradient approach. |
| Researcher Affiliation | Academia | Saket Tiwari, College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003 (sakettiwari@umass.edu); Philip S. Thomas, College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003 (pthomas@cs.umass.edu) |
| Pseudocode | Yes | Algorithm 1: Incremental Natural Option-Critic Algorithm (INOC) |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a repository for the methodology described. |
| Open Datasets | Yes | We compare natural option-critic with the option-critic framework on the Arcade Learning Environment (Bellemare et al. 2013). The four rooms domain (Sutton, Precup, and Singh 1999) is a particularly favorable case for demonstrating the use of options. |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments rather than using traditional datasets with specified training, validation, and test splits. It does not provide explicit dataset split information for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as CPU or GPU models, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions the RMSProp optimizer but does not specify version numbers for any software, libraries, or environments, which would be necessary for reproducible dependency management. |
| Experiment Setup | Yes | MDP Setup: ... We set the learning rate for the intra-option policies, $\alpha_\theta$, to be negligible... Four Rooms: The four rooms domain... $\alpha_\theta = \alpha_\vartheta = 0.0025$, $\alpha_\eta = 0.5$, $\alpha_\phi = 0.75$, $\lambda = 0.5$, and critic learning rate 0.5... Arcade Learning Environment: ... with $\alpha_\theta = \alpha_\vartheta = 0.0025$, $\alpha_\eta = \alpha_\phi = 0.75$, and $\lambda = 0.5$ |
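The step sizes quoted in the Experiment Setup row can be collected into a small configuration object so that a reproduction attempt records exactly what was reported and flags what was not. The sketch below is hypothetical: the class name `INOCHyperparams`, its field names, and the helper `missing_settings` are illustrative and do not come from the paper. Only $\alpha_\theta$ is identified in the summary (as the intra-option policy learning rate), so the remaining fields simply mirror the paper's symbols without assigning them a role. The MDP setup is omitted because its values are elided above, and the ALE critic learning rate is left as `None` because none is reported.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class INOCHyperparams:
    """Hypothetical container for the step sizes quoted in the Experiment Setup row.

    Field names mirror the paper's symbols; apart from alpha_theta, their
    roles are not stated in this summary, so none are assumed here.
    """
    alpha_theta: float                 # α_θ: intra-option policy learning rate
    alpha_vartheta: float              # α_ϑ (role not stated in this summary)
    alpha_eta: float                   # α_η (role not stated in this summary)
    alpha_phi: float                   # α_ϕ (role not stated in this summary)
    lam: float                         # λ
    critic_lr: Optional[float] = None  # reported only for Four Rooms


# Values as reported for the Four Rooms domain.
FOUR_ROOMS = INOCHyperparams(
    alpha_theta=0.0025, alpha_vartheta=0.0025,
    alpha_eta=0.5, alpha_phi=0.75, lam=0.5, critic_lr=0.5,
)

# Values as reported for the Arcade Learning Environment; no critic LR given.
ALE = INOCHyperparams(
    alpha_theta=0.0025, alpha_vartheta=0.0025,
    alpha_eta=0.75, alpha_phi=0.75, lam=0.5,
)


def missing_settings(cfg: INOCHyperparams) -> List[str]:
    """Return the names of fields left as None, i.e. values a reproduction must choose."""
    return [f.name for f in fields(cfg) if getattr(cfg, f.name) is None]


if __name__ == "__main__":
    for name, cfg in [("four_rooms", FOUR_ROOMS), ("ale", ALE)]:
        print(name, cfg, "missing:", missing_settings(cfg))
```

Making the dataclass frozen keeps the reported values from being mutated accidentally during a run, and `missing_settings` makes the gaps in the reported setup (here, the ALE critic learning rate) explicit rather than silently filled with a default.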