reproducibilityindex.ai

Finite Sample Analyses for TD(0) With Function Approximation

Authors: Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor

AAAI 2018

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our work is the first to provide such results. Works that managed to obtain convergence rates for online Temporal Difference (TD) methods analyzed somewhat modified versions of them that include projections and stepsize dependent on unknown problem parameters. Our analysis obviates these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high-probability. Both are based on relatively unknown, recently developed stochastic approximation techniques. Main Results: Our first main result is a bound on the expected decay rate of the TD(0) iterates. It requires the following assumption.
Researcher Affiliation	Academia	gald@campus.technion.ac.il Balázs Szörényi szorenyi.balazs@gmail.com Gugan Thoppe gugan.thoppe@gmail.com Shie Mannor shie@ee.technion.ac.il
Pseudocode	No	The TD(0) update rule is presented as an equation, $\theta_{n+1} = \theta_n + \alpha_n [r_n + \gamma \phi_n'^\top \theta_n - \phi_n^\top \theta_n] \phi_n$ (1), but there is no pseudocode or algorithm block.
Open Source Code	No	The paper is theoretical and does not mention making any source code for its methodology publicly available.
Open Datasets	No	The paper describes a theoretical setup with 'iid samples' (Let $\{(\phi_n, \phi'_n, r_n)\}_n$ be iid samples of $(\phi, \phi', r)$), which is a theoretical assumption, not a reference to a public dataset.
Dataset Splits	No	The paper is purely theoretical and does not describe any experimental dataset splits for training, validation, or testing.
Hardware Specification	No	The paper mentions 'Rudimentary simulations' but provides no specific hardware details used for these or any other computations.
Software Dependencies	No	No specific software names with version numbers are mentioned that would be required to replicate any part of the work.
Experiment Setup	No	The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations.
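
Since the paper gives the TD(0) update only as equation (1), the following is a minimal Python sketch of that update with linear function approximation. It is not code from the paper: the function name `td0_linear`, the step-size schedule `(n + 1) ** -0.5`, and the one-dimensional toy problem are all assumptions chosen for illustration.

```python
import numpy as np


def td0_linear(samples, step_size, gamma, dim):
    """Run TD(0) with linear function approximation.

    samples   : iterable of (phi, phi_next, r) tuples, i.e. the features of
                the current state, the features of the next state, and the reward
    step_size : function n -> alpha_n (hypothetical helper, not from the paper)
    gamma     : discount factor in [0, 1)
    dim       : dimension of the parameter vector theta
    """
    theta = np.zeros(dim)
    for n, (phi, phi_next, r) in enumerate(samples):
        alpha = step_size(n)
        # Temporal-difference error: r_n + gamma * phi'_n^T theta_n - phi_n^T theta_n
        td_error = r + gamma * (phi_next @ theta) - (phi @ theta)
        # Update (1): theta_{n+1} = theta_n + alpha_n * td_error * phi_n
        theta = theta + alpha * td_error * phi
    return theta


# Toy problem (an assumption for illustration): a single state with feature 1,
# reward 1, and gamma = 0.5, so the TD(0) fixed point is r / (1 - gamma) = 2.
toy_samples = [(np.array([1.0]), np.array([1.0]), 1.0)] * 1000
theta_hat = td0_linear(
    toy_samples,
    step_size=lambda n: (n + 1) ** -0.5,  # assumed polynomially decaying schedule
    gamma=0.5,
    dim=1,
)
```

In this deterministic toy case `theta_hat[0]` approaches the fixed point 2. The schedule involves no projection step and no problem-dependent constants, in the spirit of the unmodified TD(0) the quoted passage describes.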