reproducibilityindex.ai

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

Authors: Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We prove that, utilizing an overparameterized two-layer neural network, temporal-difference and Q-learning globally minimize the mean-squared projected Bellman error at a sublinear rate. Moreover, the associated feature representation converges to the optimal one, generalizing the previous analysis of [21] in the neural tangent kernel regime, where the associated feature representation stabilizes at the initial one. The key to our analysis is a mean-field perspective
Researcher Affiliation	Academia	Yufeng Zhang, Northwestern University, Evanston, IL 60208 (yufengzhang2023@u.northwestern.edu); Qi Cai, Northwestern University, Evanston, IL 60208 (qicai2022@u.northwestern.edu); Zhuoran Yang, Princeton University, Princeton, NJ 08544 (zy6@princeton.edu); Yongxin Chen, Georgia Institute of Technology, Atlanta, GA 30332 (yongchen@gatech.edu); Zhaoran Wang, Northwestern University, Evanston, IL 60208 (zhaoranwang@gmail.com)
Pseudocode	Yes	For an initial distribution $\nu_0 \in \mathcal{P}(\mathbb{R}^D)$, we initialize $\{\theta_i\}_{i \in [m]} \overset{\text{i.i.d.}}{\sim} \nu_0$. See Algorithm 1 in Appendix A for a detailed description.
Open Source Code	No	The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets	No	The paper is theoretical and does not conduct experiments or use datasets, thus no information about public dataset availability is provided.
Dataset Splits	No	The paper is theoretical and does not conduct experiments with datasets, thus no information about training/validation/test splits is provided.
Hardware Specification	No	The paper is theoretical and does not describe any experiments that would require hardware specifications.
Software Dependencies	No	The paper is theoretical and does not describe an experimental setup that would involve software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations.
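Because the paper releases no code, the Python below is only a minimal, hypothetical sketch of the setup described in the table: temporal-difference learning with an overparameterized two-layer network whose m hidden units (particles) are initialized i.i.d. from an initial distribution. Everything concrete here is an assumption for illustration, not the authors' Algorithm 1: the toy 3-state cycle, one-hot features, the choice of nu_0 as a standard Gaussian, the tanh unit b * tanh(w . x), the (alpha/m) output scaling, and absorbing the 1/m factor into the step size. The sketch also tracks the plain (unprojected) mean-squared Bellman error, which is easier to compute on this toy chain than the projected objective the paper analyzes.

```python
import math
import random

# Hypothetical toy problem (not from the paper): a deterministic cycle
# 0 -> 1 -> 2 -> 0, reward 1 in state 0, discount factor 0.5.
N_STATES = 3
GAMMA = 0.5
REWARDS = [1.0, 0.0, 0.0]


def features(s):
    """One-hot feature vector x for state s (illustrative choice)."""
    return [1.0 if k == s else 0.0 for k in range(N_STATES)]


def init_particles(m, d, rng):
    """Draw theta_i = (b_i, w_i) i.i.d. from nu_0, assumed here to be N(0, I)."""
    return [[rng.gauss(0.0, 1.0), [rng.gauss(0.0, 1.0) for _ in range(d)]]
            for _ in range(m)]


def q_value(particles, x, alpha):
    """Two-layer net with assumed mean-field scaling: (alpha/m) * sum_i b_i * tanh(w_i . x)."""
    total = 0.0
    for b, w in particles:
        total += b * math.tanh(sum(wk * xk for wk, xk in zip(w, x)))
    return alpha * total / len(particles)


def msbe(particles, alpha):
    """Unprojected mean-squared Bellman error, averaged uniformly over the states."""
    err = 0.0
    for s in range(N_STATES):
        s_next = (s + 1) % N_STATES
        delta = (q_value(particles, features(s), alpha) - REWARDS[s]
                 - GAMMA * q_value(particles, features(s_next), alpha))
        err += delta ** 2
    return err / N_STATES


def td_train(m=100, alpha=1.0, eta=0.1, steps=1500, seed=0):
    """Semi-gradient TD(0) along the cycle; returns (initial MSBE, final MSBE, particles)."""
    rng = random.Random(seed)
    particles = init_particles(m, N_STATES, rng)
    initial = msbe(particles, alpha)
    s = 0
    for _ in range(steps):
        s_next = (s + 1) % N_STATES
        x, x_next = features(s), features(s_next)
        # TD error; the target r + gamma * Q(x') is held fixed (semi-gradient).
        delta = (q_value(particles, x, alpha) - REWARDS[s]
                 - GAMMA * q_value(particles, x_next, alpha))
        for p in particles:
            b, w = p
            act = math.tanh(sum(wk * xk for wk, xk in zip(w, x)))
            # Gradient of b * tanh(w . x) w.r.t. (b, w); the 1/m factor of the
            # output is absorbed into the step size (assumed time scaling).
            grad_b = act
            grad_w = [b * (1.0 - act ** 2) * xk for xk in x]
            p[0] = b - eta * alpha * delta * grad_b
            p[1] = [wk - eta * alpha * delta * gk for wk, gk in zip(w, grad_w)]
        s = s_next
    return initial, msbe(particles, alpha), particles
```

Calling `initial, final, _ = td_train()` returns the Bellman error before and after training; on this small chain the error should shrink substantially. This illustrates the mechanics of particle-based TD only and says nothing about the convergence rates proved in the paper.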