Natural Option Critic

Authors: Saket Tiwari, Philip S. Thomas
Pages: 5175-5182

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results showcase improvement over the vanilla gradient approach.
Researcher Affiliation | Academia | Saket Tiwari, College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, sakettiwari@umass.edu; Philip S. Thomas, College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, pthomas@cs.umass.edu
Pseudocode | Yes | Algorithm 1 Incremental Natural Option-Critic Algorithm (INOC)
Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a repository for the methodology described.
Open Datasets | Yes | We compare natural option-critic with the option critic framework on the Arcade Learning Environment (Bellemare et al. 2013). The four rooms domain (Sutton, Precup, and Singh 1999) is a particularly favorable case for demonstrating the use of options.
Dataset Splits | No | The paper describes experiments in reinforcement learning environments rather than using traditional datasets with specified training, validation, and test splits. It does not provide explicit dataset split information for reproduction.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as CPU or GPU models, or cloud instance specifications.
Software Dependencies | No | The paper mentions software components like 'RMSProp' but does not specify version numbers for any software, libraries, or environments, which would be necessary for reproducible dependency management.
Experiment Setup | Yes | MDP Setup: ... We set the learning rate for the intra-option policies, αθ, to be negligible... Four Rooms: The four rooms domain... αθ = αϑ = 0.0025, αη = 0.5, αϕ = 0.75, λ = 0.5 and critic LR 0.5... Arcade Learning Environment: ... with αθ = αϑ = 0.0025, αη = αϕ = 0.75, and λ = 0.5
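
For anyone attempting a reproduction, the step sizes quoted in the Experiment Setup entry could be collected in a small configuration object. The following is only a sketch: the class name `StepSizes`, its field names, and the constants `FOUR_ROOMS` and `ALE` are hypothetical labels chosen here, not identifiers from the paper, and only values that appear in the quoted text are filled in. The MDP setup is left out because the quote gives no number for the "negligible" αθ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepSizes:
    """Step sizes as quoted in the Experiment Setup entry (field names are hypothetical)."""
    alpha_theta: float      # αθ, learning rate for the intra-option policies
    alpha_vartheta: float   # αϑ
    alpha_eta: float        # αη
    alpha_phi: float        # αϕ
    lam: float              # λ
    critic_lr: Optional[float] = None  # quoted only for Four Rooms


# Four Rooms: αθ = αϑ = 0.0025, αη = 0.5, αϕ = 0.75, λ = 0.5, critic LR 0.5
FOUR_ROOMS = StepSizes(
    alpha_theta=0.0025,
    alpha_vartheta=0.0025,
    alpha_eta=0.5,
    alpha_phi=0.75,
    lam=0.5,
    critic_lr=0.5,
)

# Arcade Learning Environment: αθ = αϑ = 0.0025, αη = αϕ = 0.75, λ = 0.5
ALE = StepSizes(
    alpha_theta=0.0025,
    alpha_vartheta=0.0025,
    alpha_eta=0.75,
    alpha_phi=0.75,
    lam=0.5,
)
```

Keeping the two environments as separate frozen instances makes the one difference in the quoted settings (αη is 0.5 for Four Rooms and 0.75 for the Arcade Learning Environment) easy to see, and a missing critic learning rate stays explicit as `None` rather than being guessed.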