TD Convergence: An Optimization Perspective
Authors: Kavosh Asadi, Shoham Sabach, Yao Liu, Omer Gottesman, Rasool Fakoor
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counterexample, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning. (A minimal sketch of this optimization view appears after the table.) |
| Researcher Affiliation | Collaboration | Kavosh Asadi (Amazon); Shoham Sabach (Amazon & Technion); Yao Liu (Amazon); Omer Gottesman (Amazon); Rasool Fakoor (Amazon) |
| Pseudocode | Yes | A pseudo-code of both algorithms is presented in Algorithms 1 and 2. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on a specific dataset. It analyzes a classical counterexample conceptually rather than through empirical training on data. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, thus no training, validation, or test splits are provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or the hardware used for computations. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers required for reproducibility. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical analysis and convergence proofs, not empirical experiments. Therefore, no experimental setup details such as hyperparameters or training configurations are provided. |
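
Since the paper releases no code, the following is only an illustrative sketch of the optimization view described in the abstract: expected linear TD(0) written as an iterative procedure in which each step descends a quadratic loss whose bootstrap target is recomputed from the current weights, so the objective being minimized changes per iteration. The random MDP, feature matrix, discount, and step size are hypothetical and not taken from the paper.

```python
# Illustrative sketch only (not the authors' code): expected linear TD(0)
# viewed as optimization of a per-iteration quadratic loss. The MDP,
# features, discount, and step size below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_features = 5, 3
gamma, alpha = 0.9, 0.05                              # discount, step size

Phi = rng.normal(size=(n_states, n_features))         # state features
P = rng.dirichlet(np.ones(n_states), size=n_states)   # transition matrix (rows sum to 1)
r = rng.normal(size=n_states)                         # expected one-step rewards

theta = np.zeros(n_features)
for k in range(5000):
    # Frozen bootstrap target: it depends on the current theta but is treated
    # as a constant when differentiating the per-iteration loss
    #   L_k(w) = 0.5 * || Phi @ w - target_k ||^2 .
    target = r + gamma * (P @ (Phi @ theta))
    grad = Phi.T @ (Phi @ theta - target)             # gradient of L_k at w = theta
    theta -= alpha * grad                             # one descent step on L_k

print("learned weights:", theta)
print("approximate state values:", Phi @ theta)
```

Note that nothing in this sketch guarantees convergence in general; as the abstract states, whether such an iteration converges or diverges depends on the interplay of forces the paper analyzes.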