Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Neural Temporal-Difference Learning Converges to Global Optima
Authors: Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error (MSPBE) for policy evaluation. In detail, we prove that randomly initialized neural TD converges to the global optimum of the MSPBE at the rate of 1/T with population semigradients and at the rate of 1/√T with stochastic semigradients. |
| Researcher Affiliation | Academia | Department of Industrial Engineering and Management Sciences, Northwestern University; Department of Operations Research and Financial Engineering, Princeton University; Department of Electronic Engineering, Princeton University |
| Pseudocode | Yes | Algorithm 1 Neural TD |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on proving convergence, not on empirical evaluation with specific datasets. It discusses theoretical concepts like 'stationary distribution of policy' but does not mention any publicly available datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets. Therefore, there are no training, validation, or test dataset splits mentioned. |
| Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup. Therefore, it does not list software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup, including hyperparameters or system-level training settings. |
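The paper's Algorithm 1 (Neural TD) is stated only as pseudocode, since the work is purely theoretical. As a rough illustration of the object being analyzed, the stochastic semigradient update can be sketched as a toy loop with a two-layer ReLU value network under random initialization. Everything below (network width, step size, the synthetic state/reward stream) is an illustrative assumption, not the paper's algorithm or any experiment from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU value network V(s; W) = (1/sqrt(m)) * sum_r b_r * relu(w_r . s),
# echoing the overparameterized setting the paper analyzes; the output weights b
# are fixed at initialization and only W is trained (an assumption for this sketch).
m, d = 64, 4                       # hypothetical network width and state dimension
W = rng.normal(size=(m, d))        # random initialization
b = rng.choice([-1.0, 1.0], m)     # fixed +/-1 output weights

def value(s, W):
    """Value estimate V(s; W)."""
    return float((b * np.maximum(W @ s, 0.0)).sum() / np.sqrt(m))

def grad(s, W):
    """Semigradient of V(s; W) w.r.t. W (ReLU mask times input)."""
    mask = (W @ s > 0.0).astype(float)
    return (b * mask)[:, None] * s[None, :] / np.sqrt(m)

gamma, eta = 0.9, 0.05             # discount and step size (illustrative)
for _ in range(2000):
    # Synthetic transition (s, r, s') standing in for samples from the
    # policy's stationary distribution; the toy reward is arbitrary.
    s = rng.normal(size=d); s /= np.linalg.norm(s)
    s_next = rng.normal(size=d); s_next /= np.linalg.norm(s_next)
    r = s[0]
    delta = r + gamma * value(s_next, W) - value(s, W)   # TD error
    W += eta * delta * grad(s, W)                        # stochastic semigradient step
```

The defining feature of TD as a *semi*gradient method is visible in the last line: the update differentiates only through `value(s, W)`, treating the bootstrap target `r + gamma * value(s_next, W)` as a constant.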