Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Authors: Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that, utilizing an overparameterized two-layer neural network, temporal-difference and Q-learning globally minimize the mean-squared projected Bellman error at a sublinear rate. Moreover, the associated feature representation converges to the optimal one, generalizing the previous analysis of [21] in the neural tangent kernel regime, where the associated feature representation stabilizes at the initial one. The key to our analysis is a mean-field perspective |
| Researcher Affiliation | Academia | Yufeng Zhang, Northwestern University, Evanston, IL 60208, yufengzhang2023@u.northwestern.edu; Qi Cai, Northwestern University, Evanston, IL 60208, qicai2022@u.northwestern.edu; Zhuoran Yang, Princeton University, Princeton, NJ 08544, zy6@princeton.edu; Yongxin Chen, Georgia Institute of Technology, Atlanta, GA 30332, yongchen@gatech.edu; Zhaoran Wang, Northwestern University, Evanston, IL 60208, zhaoranwang@gmail.com |
| Pseudocode | Yes | For an initial distribution $\rho_0 \in \mathscr{P}(\mathbb{R}^D)$, we initialize $\{\theta_i\}_{i=1}^m$ as $\theta_i \overset{\text{i.i.d.}}{\sim} \rho_0$ ($i \in [m]$). See Algorithm 1 in §A for a detailed description. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments or use datasets, thus no information about public dataset availability is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets, thus no information about training/validation/test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe an experimental setup that would involve software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations. |