Neural Temporal-Difference Learning Converges to Global Optima
Authors: Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error (MSPBE) for policy evaluation. In detail, we prove that randomly initialized neural TD converges to the global optimum of the MSPBE at the rate of 1/T with population semigradients and at the rate of 1/√T with stochastic semigradients. (The MSPBE objective is written out after this table.) |
| Researcher Affiliation | Academia | Department of Industrial Engineering and Management Sciences, Northwestern University; Department of Operations Research and Financial Engineering, Princeton University; Department of Electrical Engineering, Princeton University |
| Pseudocode | Yes | Algorithm 1 Neural TD (a sketch of this update is given after the table) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on proving convergence, not on empirical evaluation with specific datasets. It discusses theoretical concepts such as the stationary distribution of a policy but does not mention any publicly available datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets. Therefore, there are no training, validation, or test dataset splits mentioned. |
| Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup. Therefore, it does not list software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup; no hyperparameters or system-level training settings are reported. |
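
For reference, the objective named in the Research Type row, the mean-squared projected Bellman error, takes the following standard form for policy evaluation. The notation here is a paraphrase and may differ in minor details from the paper's setup: μ is the stationary distribution of the evaluated policy π, T^π is the Bellman operator, and Π_F is the projection onto the (neural) function class.

```latex
% MSPBE for evaluating a policy \pi with a neural Q-function Q_\theta.
\mathrm{MSPBE}(\theta)
  = \mathbb{E}_{(s,a)\sim\mu}\!\left[\bigl(Q_\theta(s,a)
      - \Pi_{\mathcal F}\, T^{\pi} Q_\theta(s,a)\bigr)^{2}\right],
\qquad
(T^{\pi} Q)(s,a)
  = \mathbb{E}\!\left[\, r(s,a) + \gamma\, Q(s',a')
      \;\middle|\; s' \sim P(\cdot \mid s,a),\; a' \sim \pi(\cdot \mid s') \,\right].
```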
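The Pseudocode row refers to Algorithm 1 (Neural TD). Below is a minimal runnable sketch of that style of update, assuming a two-layer ReLU network of width m whose ±1 output weights are fixed after random initialization and whose input weights are projected back onto a ball of radius B around their initialization after each stochastic semigradient step. The sampling interface `env_sample`, the feature map, and all hyperparameter names are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of neural TD with stochastic semigradients (in the spirit of Algorithm 1).
# Assumptions: two-layer ReLU network of width m, fixed random +/-1 output weights,
# trained input weights projected onto a Frobenius ball of radius B around their
# initialization. env_sample, eta, B, m are illustrative, not the paper's interface.
import numpy as np

def init_network(d, m, rng):
    """Random init: W ~ N(0, I/d) per neuron, b ~ Uniform{-1, +1} (kept fixed)."""
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(m, d))
    b = rng.choice([-1.0, 1.0], size=m)
    return W, b

def q_value(W, b, x):
    """Q_theta(s, a) = (1/sqrt(m)) * sum_r b_r * relu(w_r . x), with x a feature of (s, a)."""
    m = W.shape[0]
    return (b * np.maximum(W @ x, 0.0)).sum() / np.sqrt(m)

def grad_q(W, b, x):
    """Gradient of Q_theta with respect to W (the only trained parameters)."""
    m = W.shape[0]
    active = (W @ x > 0.0).astype(float)            # ReLU derivative
    return (b * active)[:, None] * x[None, :] / np.sqrt(m)

def neural_td(env_sample, d, m=1024, T=10_000, eta=0.01, gamma=0.99, B=10.0, seed=0):
    """Projected stochastic semigradient TD; returns the averaged iterate."""
    rng = np.random.default_rng(seed)
    W, b = init_network(d, m, rng)
    W0 = W.copy()                                   # projection center
    W_avg = np.zeros_like(W)
    for _ in range(T):
        x, r, x_next = env_sample(rng)              # features of (s,a), reward, features of (s',a')
        delta = q_value(W, b, x) - (r + gamma * q_value(W, b, x_next))  # TD error
        W -= eta * delta * grad_q(W, b, x)          # semigradient step
        diff = W - W0                               # project back onto the ball around W0
        norm = np.linalg.norm(diff)
        if norm > B:
            W = W0 + diff * (B / norm)
        W_avg += W / T
    return W_avg, b
```

The averaged iterate is returned because guarantees of this type are typically stated for the average of the iterates rather than for the final one.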