A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning
Authors: Eloïse Berthier, Ziad Kobeissi, Francis Bach
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate this convergence numerically on a simple continuous-state Markov reward process. [...] We run TD on the functions r_abs and r_cos, with kernels K1 and K2. We use parameters λ and η, and exponential averaging, as prescribed in Thm. 1 (b). Each experiment is repeated 10 times and we record the mean ± one standard deviation. The implementation is based on a finite-dimensional representation of the iterates (V_k)_{k≤n} in R^n (see further details in App. B.2). This implies computing the kernel matrix in O(n^2) operations. To accelerate this computation when the eigenvalues decrease fast, we approximate it with the incomplete Cholesky decomposition [3]. In Tab. 1, we set ε = 0.8, γ = 0.5 and report the observed convergence rates vs. the ones expected by Thm. 2, which are fairly consistent. In Fig. 1, we show the respective effects of varying ε and γ. Larger values of ε or γ make the problem more difficult and slow down convergence, presumably in the constants without affecting the rates, as predicted by Thm. 2. Additional experiments are provided in App. B.3. [A hypothetical numpy sketch of this kernel TD iteration is given after the table.] |
| Researcher Affiliation | Academia | Eloïse Berthier, Inria & Ecole Normale Supérieure, PSL Research University, Paris, France (eloise.berthier@inria.fr); Ziad Kobeissi, Institut Louis Bachelier, Inria Paris (ziad.kobeissi@inria.fr); Francis Bach, Inria & Ecole Normale Supérieure, PSL Research University, Paris, France (francis.bach@inria.fr) |
| Pseudocode | No | The paper provides mathematical formulations and update rules for the algorithm (e.g., equations 3, 4, 21, 24) but does not include a distinct pseudocode block or algorithm listing. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code and the instructions to run a minimal experiment are provided in the supplementary material. |
| Open Datasets | No | The paper uses a 'toy model' and 'artificial data' (e.g., the functions r_abs and r_cos on X = [0,1]) for its experiments, which are custom-generated. No public dataset with access information (link, DOI, or citation for the data source) is used or provided for external use. |
| Dataset Splits | No | The paper does not provide explicit training, validation, or test dataset splits. It describes a 'toy model' and runs simulations for convergence analysis, rather than training models on pre-defined data splits. |
| Hardware Specification | Yes | The numerical experiments were run on a personal computer (Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 16GB RAM). |
| Software Dependencies | No | The implementation relies heavily on the numpy and scipy libraries. However, specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | In Tab. 1, we set ε = 0.8, γ = 0.5 and report the observed convergence rates vs. the ones expected by Thm. 2. [...] For the i.i.d. case (Thm. 1), we use the kernel K2, ε = 0.8, γ = 0.5. We choose λ and η from the optimal values shown in the theorem: λ = λ0 n^(-1/(3+ε)) and η = log n/(2λn), where we used λ0 = 0.1, ε = 0.1. [...] For the Markovian case (Thm. 2), we use the kernel K2, ε = 0.8, γ = 0.5. For scenario (i), we set λ = λ0 n^(-1/(2+ε)), η = log n/(2λn) and λ0 = 0.1, ε = 0.1. For scenario (ii), we set λ = λ0 n^(-1/(4+ε)), η = log n/(2λn) and λ0 = 0.1, ε = 0.1. The number of iterations is n = 10^5. [These schedules are written out in a short snippet after the table.] |
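
The description quoted in the Research Type row (finite-dimensional representation of the iterates, O(n^2) kernel matrix, incomplete Cholesky acceleration) can be illustrated with a short numpy sketch. This is an assumption-laden stand-in, not the authors' code from the supplementary material: the MRP (`step`), the reward (`reward`), the kernel and its bandwidth, the reading of ε = 0.8 as a transition-noise width, and the regularized TD(0) update V ← (1 − ηλ)V + η δ K(x_k, ·) are all illustrative guesses, the full kernel matrix is built directly (the paper instead accelerates this step with an incomplete Cholesky decomposition), and plain iterate averaging stands in for the exponential averaging prescribed in Thm. 1 (b).

```python
# Hypothetical sketch of regularized non-parametric (kernel) TD(0) with a
# finite-dimensional representation of the iterates, in the spirit of the
# quoted App. B.2 description.  MRP, reward, kernel and update rule are
# stand-ins, not the paper's exact definitions.
import numpy as np

rng = np.random.default_rng(0)

gamma, eps = 0.5, 0.8            # discount and (assumed) transition-width parameters
n = 2_000                        # TD iterations; the paper uses n = 1e5 with incomplete Cholesky
lam = 0.1 * n ** (-1.0 / 3.1)    # lambda = lambda_0 * n^{-1/(3+0.1)} with lambda_0 = 0.1
eta = np.log(n) / (2.0 * lam * n)  # step size from the quoted schedule

def reward(x):                   # placeholder standing in for r_abs
    return np.abs(x - 0.5)

def step(x):                     # placeholder MRP on [0,1]: uniform move of width eps, wrapped around
    return (x + eps * (rng.random() - 0.5)) % 1.0

def kernel(a, b):                # placeholder translation-invariant kernel standing in for K2
    return np.exp(-np.abs(a[:, None] - b[None, :]) / 0.1)

# Sample one trajectory and precompute the (n+1) x (n+1) kernel matrix.
# This is the O(n^2) step that the paper approximates with incomplete Cholesky.
xs = np.empty(n + 1)
xs[0] = rng.random()
for k in range(n):
    xs[k + 1] = step(xs[k])
K = kernel(xs, xs)

# V_k = sum_i alpha_i K(x_i, .), so only the coefficient vector is stored.
alpha = np.zeros(n + 1)
alpha_avg = np.zeros(n + 1)
for k in range(n):
    v_k = K[k] @ alpha               # V_k(x_k)
    v_next = K[k + 1] @ alpha        # V_k(x_{k+1})
    delta = reward(xs[k]) + gamma * v_next - v_k   # TD error
    alpha *= 1.0 - eta * lam         # regularization shrinkage
    alpha[k] += eta * delta          # TD step in the direction K(x_k, .)
    alpha_avg += (alpha - alpha_avg) / (k + 1)     # plain averaging as a stand-in for
                                                   # the exponential averaging of Thm. 1 (b)
```

Storing only the coefficient vector α of V_k = Σ_i α_i K(x_i, ·) is what makes the finite-dimensional representation of the iterates in R^n possible; the O(n^2) cost comes entirely from forming (or approximating) the kernel matrix.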
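The schedules quoted in the Experiment Setup row reduce to a two-line computation. The snippet below simply evaluates them for the three regimes, with exponent offset 3 for the i.i.d. case (Thm. 1) and 2 or 4 for the two Markovian scenarios (Thm. 2), using the quoted values λ0 = 0.1, ε = 0.1 and n = 10^5; the function name `td_schedule` is illustrative.

```python
# Evaluate the quoted step-size and regularization schedules:
# lambda = lambda_0 * n^{-1/(offset + eps)},  eta = log(n) / (2 * lambda * n).
import numpy as np

def td_schedule(n, exponent_offset, lambda_0=0.1, eps=0.1):
    lam = lambda_0 * n ** (-1.0 / (exponent_offset + eps))
    eta = np.log(n) / (2.0 * lam * n)
    return lam, eta

n = 10**5
for name, offset in [("i.i.d. (Thm. 1)", 3),
                     ("Markov, scenario (i)", 2),
                     ("Markov, scenario (ii)", 4)]:
    lam, eta = td_schedule(n, offset)
    print(f"{name}: lambda = {lam:.2e}, eta = {eta:.2e}")
```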