On the Rate of Convergence and Error Bounds for LSTD(λ)

Authors: Manel Tagorti, Bruno Scherrer

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm, that extends (and slightly improves) that derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is to our knowledge the first to provide insight on the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
Researcher Affiliation | Academia | Manel Tagorti (MANEL.TAGORTI@INRIA.FR), Bruno Scherrer (BRUNO.SCHERRER@INRIA.FR); Inria, Villers-lès-Nancy, F-54600, France; Université de Lorraine, LORIA, UMR 7503, Vandœuvre-lès-Nancy, F-54506, France
Pseudocode | No | The paper describes the LSTD(λ) algorithm in text, stating 'The LSTD(λ) algorithm that is the focus of this article is now precisely described.' However, it does not provide any structured pseudocode or algorithm block.
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | This is a theoretical paper focused on deriving bounds and proofs. It does not involve empirical studies with datasets, so no publicly available training data is mentioned.
Dataset Splits | No | This is a theoretical paper focused on deriving bounds and proofs. It does not involve empirical studies with datasets, so no validation splits are mentioned.
Hardware Specification | No | This is a theoretical paper focused on deriving bounds and proofs. It does not involve running experiments and therefore does not specify any hardware.
Software Dependencies | No | This is a theoretical paper focused on deriving bounds and proofs. It does not involve running experiments and therefore does not list any software dependencies with version numbers.
Experiment Setup | No | This is a theoretical paper focused on deriving bounds and proofs. It does not involve running experiments and therefore does not describe any experimental setup details such as hyperparameters or training settings.
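
Since the paper describes LSTD(λ) only in prose and releases no code, a reader may find a concrete reference point useful. The following is a minimal pure-Python sketch of LSTD(λ) in the form commonly stated in the literature, assuming linear features given as plain lists and a single trajectory x_1, ..., x_{n+1} with rewards r_1, ..., r_n. It accumulates the eligibility trace z_t = λγ z_{t-1} + φ(x_t), the matrix A = (1/n) Σ z_t (φ(x_t) - γ φ(x_{t+1}))^T and the vector b = (1/n) Σ z_t r_t, then returns θ solving A θ = b. The function names `lstd_lambda` and `solve_linear` are hypothetical, and this is an illustration, not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def solve_linear(A: List[Vector], b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def lstd_lambda(features: Sequence[Vector], rewards: Sequence[float],
                gamma: float, lam: float) -> Vector:
    """LSTD(lambda) estimate from one trajectory (hypothetical helper).

    features: phi(x_1), ..., phi(x_{n+1}), one more entry than rewards
    rewards:  r_1, ..., r_n, where r_t is received on x_t -> x_{t+1}
    """
    n = len(rewards)
    if len(features) != n + 1:
        raise ValueError("need exactly one more feature vector than rewards")
    d = len(features[0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    z = [0.0] * d  # eligibility trace
    for t in range(n):
        phi, phi_next = features[t], features[t + 1]
        # z_t = lambda * gamma * z_{t-1} + phi(x_t)
        z = [lam * gamma * zi + pi for zi, pi in zip(z, phi)]
        # phi(x_t) - gamma * phi(x_{t+1})
        diff = [p - gamma * q for p, q in zip(phi, phi_next)]
        for i in range(d):
            for j in range(d):
                A[i][j] += z[i] * diff[j] / n
            b[i] += z[i] * rewards[t] / n
    # theta solves A theta = b
    return solve_linear(A, b)


if __name__ == "__main__":
    # Toy deterministic two-state chain 0 -> 1 -> 0 -> ... with one-hot features;
    # reward 1 when leaving state 0, reward 0 when leaving state 1.
    e0, e1 = [1.0, 0.0], [0.0, 1.0]
    print(lstd_lambda([e0, e1, e0, e1, e0], [1.0, 0.0, 1.0, 0.0], gamma=0.5, lam=0.7))
```

On this toy chain with γ = 0.5 the true values are V(0) = 4/3 and V(1) = 2/3; because the features are tabular and the dynamics deterministic, the estimate recovers them (up to floating-point error) for any λ. The trade-off in λ that the paper studies only shows up once the feature space cannot represent the value function exactly and samples are noisy.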