Finite Sample Analyses for TD(0) With Function Approximation
Authors: Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our work is the first to provide such results. Works that managed to obtain convergence rates for online Temporal Difference (TD) methods analyzed somewhat modified versions of them that include projections and stepsize dependent on unknown problem parameters. Our analysis obviates these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high-probability. Both are based on relatively unknown, recently developed stochastic approximation techniques. Main Results: Our first main result is a bound on the expected decay rate of the TD(0) iterates. It requires the following assumption. |
| Researcher Affiliation | Academia | gald@campus.technion.ac.il Balázs Szörényi szorenyi.balazs@gmail.com Gugan Thoppe gugan.thoppe@gmail.com Shie Mannor shie@ee.technion.ac.il |
| Pseudocode | No | The TD(0) update rule is presented as an equation, $\theta_{n+1} = \theta_n + \alpha_n [r_n + \gamma \phi_n'^{\top} \theta_n - \phi_n^{\top} \theta_n] \phi_n$ (1), but there is no pseudocode or algorithm block. |
| Open Source Code | No | The paper is theoretical and does not mention making any source code for its methodology publicly available. |
| Open Datasets | No | The paper describes a theoretical setup with 'iid samples' (Let $\{(\phi_n, \phi'_n, r_n)\}_n$ be iid samples of $(\phi, \phi', r)$), which is a theoretical assumption, not a reference to a public dataset. |
| Dataset Splits | No | The paper is purely theoretical and does not describe any experimental dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper mentions 'Rudimentary simulations' but provides no specific hardware details used for these or any other computations. |
| Software Dependencies | No | No specific software names with version numbers are mentioned that would be required to replicate any part of the work. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations. |
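
Since the paper gives the TD(0) update only as Equation (1), the following is a minimal Python sketch of that update with linear function approximation, under stated assumptions. The function names `td0_update` and `run_td0`, the `stepsize` callable, and the zero initialization are hypothetical choices made for illustration; they do not come from the paper, which provides no code.

```python
def td0_update(theta, phi, phi_next, reward, alpha, gamma):
    """One TD(0) step in the form of Equation (1).

    theta    -- current parameter vector (list of floats)
    phi      -- feature vector of the current state
    phi_next -- feature vector of the next state (phi' in the paper)
    """
    # Linear value estimates: phi^T theta and phi'^T theta.
    v = sum(p * t for p, t in zip(phi, theta))
    v_next = sum(p * t for p, t in zip(phi_next, theta))
    # Temporal-difference error: r + gamma * phi'^T theta - phi^T theta.
    td_error = reward + gamma * v_next - v
    # Move theta along phi, scaled by the step size and the TD error.
    return [t + alpha * td_error * p for t, p in zip(theta, phi)]


def run_td0(samples, dim, gamma, stepsize):
    """Apply td0_update over a sequence of (phi, phi_next, reward) samples.

    stepsize is a hypothetical callable n -> alpha_n; starting from the
    zero vector is an assumption of this sketch.
    """
    theta = [0.0] * dim
    for n, (phi, phi_next, reward) in enumerate(samples):
        theta = td0_update(theta, phi, phi_next, reward, stepsize(n), gamma)
    return theta
```

For example, `run_td0([([1.0, 0.0], [0.0, 1.0], 1.0)] * 2, 2, 0.9, lambda n: 0.5)` applies the update twice to the same sample. The constant step size there is only for illustration and is not the schedule analyzed in the paper.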