Neural Temporal-Difference Learning Converges to Global Optima

Authors: Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | The paper proves for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error (MSPBE) for policy evaluation. In detail, randomly initialized neural TD converges to the global optimum of the MSPBE at a rate of 1/T with population semigradients and at a rate of 1/√T with stochastic semigradients (see the restated objective after the table).
Researcher Affiliation | Academia | Department of Industrial Engineering and Management Sciences, Northwestern University; Department of Operations Research and Financial Engineering, Princeton University; Department of Electrical Engineering, Princeton University
Pseudocode | Yes | Algorithm 1 (Neural TD) is given in the paper; an illustrative sketch follows the table.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper is theoretical and focuses on proving convergence rather than on empirical evaluation with specific datasets. It discusses theoretical concepts such as the stationary distribution of a policy but does not mention any publicly available datasets for training.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, so no training, validation, or test splits are mentioned.
Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe any experimental setup, so it does not list software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup, including hyperparameters or system-level training settings.
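To make the Research Type row concrete, the LaTeX below restates the objective and rates it refers to. This is a standard-notation sketch, not a quotation from the paper: V_theta denotes the neural value-function approximator, T^pi the Bellman operator of the evaluated policy, Pi_F the projection onto the approximating function class, and mu the stationary distribution; these symbol choices are assumptions about the paper's exact notation.

% Mean-squared projected Bellman error (MSPBE) for policy evaluation,
% written in standard notation; the paper's exact symbols may differ.
\mathrm{MSPBE}(\theta)
  = \mathbb{E}_{s \sim \mu}\left[ \bigl( V_\theta(s) - \Pi_{\mathcal{F}} \mathcal{T}^{\pi} V_\theta(s) \bigr)^2 \right]

% Convergence rates reported in the Research Type row:
%   population semigradients:  O(1/T)
%   stochastic semigradients:  O(1/\sqrt{T})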
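Since the paper provides Algorithm 1 (Neural TD) only as pseudocode, the following is a minimal, hypothetical Python sketch of neural TD(0) with stochastic semigradients on a two-layer ReLU network with a fixed output layer and random initialization. It only illustrates the semigradient step, in which the bootstrapped target is treated as a constant; the function and variable names, the synthetic transitions, and omitted details such as any projection step are assumptions, not the authors' implementation.

import numpy as np

# Minimal sketch (not the paper's exact Algorithm 1): TD(0) with a two-layer
# ReLU network V(s) = (1/sqrt(m)) * sum_r b_r * relu(w_r . s), where only the
# input-layer weights W are trained and the output weights b are fixed.

rng = np.random.default_rng(0)

d, m = 4, 64                      # state dimension, network width
W = rng.normal(size=(m, d))       # randomly initialized first-layer weights
b = rng.choice([-1.0, 1.0], m)    # fixed output weights
gamma, lr = 0.9, 0.1              # discount factor and step size


def value(s, W):
    """Two-layer ReLU network value estimate V_W(s)."""
    pre = W @ s
    return (b * np.maximum(pre, 0.0)).sum() / np.sqrt(m)


def td_update(s, r, s_next, W):
    """One stochastic semigradient TD(0) step on a transition (s, r, s_next)."""
    # Bootstrapped target; treated as a constant (semigradient), so no
    # gradient flows through value(s_next, W).
    target = r + gamma * value(s_next, W)
    delta = value(s, W) - target                  # TD error
    active = (W @ s > 0.0).astype(float)          # ReLU activation pattern
    # Gradient of value(s, W) w.r.t. W: (1/sqrt(m)) * b_r * 1{w_r.s > 0} * s
    grad_V = (b * active)[:, None] * s[None, :] / np.sqrt(m)
    return W - lr * delta * grad_V


# Usage on synthetic transitions standing in for samples from the stationary
# distribution of the evaluated policy (purely illustrative data).
for _ in range(1000):
    s = rng.normal(size=d); s /= np.linalg.norm(s)
    s_next = rng.normal(size=d); s_next /= np.linalg.norm(s_next)
    r = float(s[0])                               # hypothetical reward signal
    W = td_update(s, r, s_next, W)

print("example value estimate:", value(np.ones(d) / np.sqrt(d), W))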