Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Two-Timescale Networks for Nonlinear Value Function Approximation
Authors: Wesley Chung, Somjit Nath, Ajin Joseph, Martha White
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the benefits of TTNs, compared to other nonlinear value function approximation algorithms, both for policy evaluation and control. |
| Researcher Affiliation | Academia | Wesley Chung, Somjit Nath, Ajin George Joseph and Martha White Department of Computing Science University of Alberta |
| Pseudocode | Yes | Algorithm 1 Training of TTNs; Algorithm 2 TD(λ) algorithm; Algorithm 3 GTD2 algorithm |
| Open Source Code | No | The paper does not contain any statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | We use the Open AI gym implementation (Brockman et al., 2016).; In Puck World (Tasfi, 2016) |
| Dataset Splits | No | The paper describes how value estimates are evaluated (using 500 states for RMSVE), but it does not specify a training/validation/test split for the overall dataset used in the experiments. |
| Hardware Specification | No | The paper discusses computational aspects like O(d^2) complexity but does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "AMSGrad optimizer (Reddi et al., 2018)", "Pygame Learning Environment (Tasfi, 2016)", and "Open AI gym implementation (Brockman et al., 2016)", but it does not specify version numbers for any programming languages, libraries, or other software dependencies. |
| Experiment Setup | Yes | To choose hyperparameters, we first did a preliminary sweep on a broad range and then chose a smaller range where the algorithms usually made progress, summarized in Appendix D. Results are reported for hyperparameters in the refined range, chosen based on RMSVE over the latter half of a run with shaded regions corresponding to one standard error. |
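The RMSVE criterion mentioned in the Dataset Splits and Experiment Setup rows (root mean squared value error over a fixed set of evaluation states) can be sketched as follows. This is a minimal illustration, not the paper's evaluation code; the function name `rmsve`, the inputs, and the default uniform state weighting are assumptions for the sketch.

```python
import numpy as np

def rmsve(v_hat, v_true, weights=None):
    """Root Mean Squared Value Error over a set of evaluation states.

    v_hat:   estimated values for each evaluation state
    v_true:  reference ("true") values for the same states
    weights: optional state-weighting distribution; defaults to uniform,
             as one might use over e.g. 500 sampled evaluation states
    """
    err = np.asarray(v_hat, dtype=float) - np.asarray(v_true, dtype=float)
    if weights is None:
        weights = np.full(err.shape, 1.0 / err.size)
    return float(np.sqrt(np.sum(weights * err ** 2)))

# Illustrative usage with three evaluation states
score = rmsve([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```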