Finite Sample Analyses for TD(0) With Function Approximation

Authors: Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our work is the first to provide such results. Works that managed to obtain convergence rates for online Temporal Difference (TD) methods analyzed somewhat modified versions of them that include projections and stepsize dependent on unknown problem parameters. Our analysis obviates these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high-probability. Both are based on relatively unknown, recently developed stochastic approximation techniques. Main Results: Our first main result is a bound on the expected decay rate of the TD(0) iterates. It requires the following assumption.
Researcher Affiliation | Academia | Gal Dalal gald@campus.technion.ac.il; Balázs Szörényi szorenyi.balazs@gmail.com; Gugan Thoppe gugan.thoppe@gmail.com; Shie Mannor shie@ee.technion.ac.il
Pseudocode | No | The TD(0) update rule is presented as an equation, $\theta_{n+1} = \theta_n + \alpha_n [r_n + \gamma {\phi'_n}^{\top} \theta_n - \phi_n^{\top} \theta_n] \phi_n$ (1), but there is no pseudocode or algorithm block.
Open Source Code | No | The paper is theoretical and does not mention making any source code for its methodology publicly available.
Open Datasets | No | The paper describes a theoretical setup with 'iid samples' (Let $\{(\phi_n, \phi'_n, r_n)\}_n$ be iid samples of $(\phi, \phi', r)$), which is a theoretical assumption, not a reference to a public dataset.
Dataset Splits | No | The paper is purely theoretical and does not describe any experimental dataset splits for training, validation, or testing.
Hardware Specification | No | The paper mentions 'Rudimentary simulations' but provides no specific hardware details for these or any other computations.
Software Dependencies | No | The paper names no software or version numbers that would be required to replicate any part of the work.
Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations.
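
The paper itself provides no pseudocode, so the following is only an illustrative sketch of the update rule (1) quoted in the Pseudocode row above, not the authors' implementation. It assumes linear function approximation with feature vectors given as plain Python lists. The names `td0_step` and `run_td0`, and the default step-size schedule 1/(n+1), are hypothetical choices made for this example.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def td0_step(theta, phi, phi_next, reward, gamma, alpha):
    """Apply one TD(0) update, following equation (1).

    theta    : current parameter vector theta_n
    phi      : features of the current state, phi_n
    phi_next : features of the next state, phi'_n
    reward   : observed reward r_n
    gamma    : discount factor
    alpha    : step size alpha_n
    """
    # Temporal-difference error: r_n + gamma * phi'_n . theta_n - phi_n . theta_n
    td_error = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    # Move theta along phi_n, scaled by the step size and the TD error.
    return [t + alpha * td_error * f for t, f in zip(theta, phi)]


def run_td0(samples, gamma, dim, stepsize=lambda n: 1.0 / (n + 1)):
    """Run TD(0) over an iterable of (phi, phi_next, reward) samples.

    The default step-size schedule is an assumption made for this sketch,
    not a schedule taken from the paper.
    """
    theta = [0.0] * dim
    for n, (phi, phi_next, reward) in enumerate(samples):
        theta = td0_step(theta, phi, phi_next, reward, gamma, stepsize(n))
    return theta
```

As a toy check, a single-feature problem with constant reward 1 and discount 0.5 has fixed point 1 / (1 - 0.5) = 2, and `run_td0([([1.0], [1.0], 1.0)] * 10000, gamma=0.5, dim=1)` should return a value close to that.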