Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Temporal Difference Learning as Gradient Splitting
Authors: Rui Liu, Alex Olshevsky
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main contribution is to provide an interpretation of temporal difference learning: we show how to view it as a splitting (a term we will define later) of an appropriately chosen quadratic form. As a consequence of this interpretation, it is possible to apply convergence proofs for gradient descent almost verbatim to temporal difference learning. The convergence time bounds we obtain this way improve on existing results. |
| Researcher Affiliation | Academia | ¹Division of Systems Engineering, Boston University, Boston, MA, USA; ²Department of ECE and Division of Systems Engineering, Boston University, Boston, MA, USA. |
| Pseudocode | Yes | Algorithm 1 Mean-adjusted TD(0) |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not involve the use of specific datasets for training or experimentation. It discusses theoretical properties of Markov chains. |
| Dataset Splits | No | The paper is theoretical and does not describe training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup, hyperparameters, or training configurations. |
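For context on the setting the paper analyzes, the sketch below shows standard TD(0) with linear function approximation. The two-state Markov chain, rewards, features, and step size are hypothetical values chosen only to make the update rule concrete; this is not the paper's Algorithm 1 (Mean-adjusted TD(0)) or its analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state Markov reward process under a fixed policy.
P = np.array([[0.9, 0.1],    # transition probabilities
              [0.2, 0.8]])
r = np.array([1.0, 0.0])     # expected one-step rewards
phi = np.array([[1.0, 0.0],  # feature vector for each state
                [0.0, 1.0]])
gamma = 0.9                  # discount factor
alpha = 0.1                  # constant step size

theta = np.zeros(2)          # linear value-function parameters
s = 0
for t in range(5000):
    s_next = rng.choice(2, p=P[s])
    # TD(0) update: theta += alpha * (TD error) * phi(s)
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta = theta + alpha * delta * phi[s]
    s = s_next

print(theta)  # hovers near the true values (I - gamma*P)^{-1} r
```

With identity features the iterates fluctuate around the true value function; the paper's gradient-splitting view explains why the convergence analysis of such updates mirrors that of gradient descent on a quadratic.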