Two-Timescale Networks for Nonlinear Value Function Approximation

Authors: Wesley Chung, Somjit Nath, Ajin Joseph, Martha White

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We empirically demonstrate the benefits of TTNs, compared to other nonlinear value function approximation algorithms, both for policy evaluation and control. |
| Researcher Affiliation | Academia | Wesley Chung, Somjit Nath, Ajin George Joseph and Martha White, Department of Computing Science, University of Alberta |
| Pseudocode | Yes | Algorithm 1: Training of TTNs; Algorithm 2: TD(λ) algorithm; Algorithm 3: GTD2 algorithm. (A hedged sketch of the two-timescale update appears below.) |
| Open Source Code | No | The paper does not contain any statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | We use the Open AI gym implementation (Brockman et al., 2016); In Puck World (Tasfi, 2016) |
| Dataset Splits | No | The paper describes how value estimates are evaluated (using 500 states for RMSVE), but it does not specify a training/validation/test split for the overall dataset used in the experiments. |
| Hardware Specification | No | The paper discusses computational aspects like O(d²) complexity but does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the "AMSGrad optimizer (Reddi et al., 2018)", the "Pygame Learning Environment (Tasfi, 2016)", and the "Open AI gym implementation (Brockman et al., 2016)", but it does not specify version numbers for any programming languages, libraries, or other software dependencies. |
| Experiment Setup | Yes | To choose hyperparameters, we first did a preliminary sweep on a broad range and then chose a smaller range where the algorithms usually made progress, summarized in Appendix D. Results are reported for hyperparameters in the refined range, chosen based on RMSVE over the latter half of a run, with shaded regions corresponding to one standard error. (See the RMSVE sketch below.) |
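Below is a minimal, hypothetical Python sketch of the two-timescale scheme named in the Pseudocode row: a slowly updated network produces features while a fast linear head is trained with TD(λ), in the spirit of Algorithms 1 and 2. The tanh feature network, the step sizes, and the mean-squared-TD-error surrogate for the slow part are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, feat_dim = 4, 16
gamma, lam = 0.99, 0.9
alpha_slow, alpha_fast = 1e-4, 1e-2   # two timescales: slow features, fast values

W = rng.normal(scale=0.1, size=(feat_dim, state_dim))  # slow feature-network weights
w = np.zeros(feat_dim)                                  # fast linear value weights
z = np.zeros(feat_dim)                                  # eligibility trace for TD(lambda)

def features(s):
    # Hypothetical one-hidden-layer feature network phi(s) = tanh(W s).
    return np.tanh(W @ s)

def ttn_step(s, r, s_next):
    """One transition: a fast linear TD(lambda) step and a slow feature step."""
    global W, w, z
    phi, phi_next = features(s), features(s_next)
    delta = r + gamma * (w @ phi_next) - w @ phi        # TD error
    # Slow-timescale gradient: semi-gradient of 0.5 * delta^2 through phi,
    # with the bootstrap target held fixed (MSTDE is one of the surrogate
    # losses the paper considers); computed with the pre-update w.
    grad_W = delta * np.outer(w * (1.0 - phi ** 2), s)
    # Fast timescale: linear TD(lambda) on the current, effectively fixed features.
    z = gamma * lam * z + phi
    w = w + alpha_fast * delta * z
    W = W + alpha_slow * grad_W

# Illustrative usage on random transitions (no real environment):
for _ in range(100):
    s, s_next = rng.normal(size=state_dim), rng.normal(size=state_dim)
    ttn_step(s, r=float(rng.normal()), s_next=s_next)
```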
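The Dataset Splits and Experiment Setup rows both reference RMSVE computed over 500 sampled states and summarized over the latter half of each run with standard-error shading. Here is a small sketch of those two computations; the uniform state weighting and evenly spaced checkpoints are my assumptions, not details stated in the excerpts.

```python
import numpy as np

def rmsve(v_hat, v_true, d=None):
    """Root mean squared value error over a set of evaluation states.
    v_hat, v_true: predicted / true values at the sampled states;
    d: optional state weighting (uniform if None -- an assumption here)."""
    d = np.full(len(v_true), 1.0 / len(v_true)) if d is None else d / d.sum()
    return float(np.sqrt(np.sum(d * (v_hat - v_true) ** 2)))

def latter_half_summary(curves):
    """Mean RMSVE over the latter half of each run, plus the standard error
    across runs (the summary used for hyperparameter selection and shading)."""
    curves = np.asarray(curves)               # shape: (n_runs, n_checkpoints)
    tail = curves[:, curves.shape[1] // 2:]   # latter half of each run
    per_run = tail.mean(axis=1)
    return per_run.mean(), per_run.std(ddof=1) / np.sqrt(len(per_run))
```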