Two-Timescale Networks for Nonlinear Value Function Approximation
Authors: Wesley Chung, Somjit Nath, Ajin Joseph, Martha White
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the benefits of TTNs, compared to other nonlinear value function approximation algorithms, both for policy evaluation and control. |
| Researcher Affiliation | Academia | Wesley Chung, Somjit Nath, Ajin George Joseph and Martha White, Department of Computing Science, University of Alberta |
| Pseudocode | Yes | Algorithm 1 Training of TTNs; Algorithm 2 TD(λ) algorithm; Algorithm 3 GTD2 algorithm |
| Open Source Code | No | The paper does not contain any statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | "We use the Open AI gym implementation (Brockman et al., 2016)."; "In Puck World (Tasfi, 2016)" |
| Dataset Splits | No | The paper describes how value estimates are evaluated (using 500 states for RMSVE), but it does not specify a training/validation/test split for the overall dataset used in the experiments. |
| Hardware Specification | No | The paper discusses computational aspects like O(d^2) complexity but does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "AMSGrad optimizer (Reddi et al., 2018)", "Pygame Learning Environment (Tasfi, 2016)", and "Open AI gym implementation (Brockman et al., 2016)", but it does not specify version numbers for any programming languages, libraries, or other software dependencies. |
| Experiment Setup | Yes | To choose hyperparameters, we first did a preliminary sweep on a broad range and then chose a smaller range where the algorithms usually made progress, summarized in Appendix D. Results are reported for hyperparameters in the refined range, chosen based on RMSVE over the latter half of a run with shaded regions corresponding to one standard error. |
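
The selection rule quoted in the Experiment Setup row can be illustrated with a minimal sketch. It assumes that RMSVE is an unweighted root mean squared error over the fixed evaluation states, that each run's score is its average RMSVE over the latter half of its checkpoints, and that the standard error is taken across runs. The function names and the shape of `results` are hypothetical and are not taken from the paper.

```python
import numpy as np


def rmsve(v_hat, v_true):
    """Root mean squared value error over a fixed set of evaluation states.

    Assumption: every state is weighted equally.
    """
    v_hat = np.asarray(v_hat, dtype=float)
    v_true = np.asarray(v_true, dtype=float)
    return float(np.sqrt(np.mean((v_hat - v_true) ** 2)))


def latter_half_score(curves):
    """Summarize RMSVE learning curves of shape (n_runs, n_checkpoints).

    Returns the mean and the standard error (across runs) of each run's
    average RMSVE over the latter half of its checkpoints.
    """
    curves = np.asarray(curves, dtype=float)
    half = curves.shape[1] // 2
    per_run = curves[:, half:].mean(axis=1)
    mean = per_run.mean()
    # Sample standard deviation (ddof=1) divided by sqrt(number of runs).
    stderr = per_run.std(ddof=1) / np.sqrt(len(per_run))
    return float(mean), float(stderr)


def select_best(results):
    """Pick the setting with the lowest latter-half mean RMSVE.

    `results` is a hypothetical dict: hyperparameter label -> curves array.
    """
    return min(results, key=lambda k: latter_half_score(results[k])[0])
```

With 500 evaluation states, `v_hat` and `v_true` would each hold 500 values. A per-checkpoint standard error, as used for shaded regions in a plot, could be computed the same way along the run axis.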