A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning
Authors: Eloïse Berthier, Ziad Kobeissi, Francis Bach
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate this convergence numerically on a simple continuous-state Markov reward process. [...] We run TD on the functions r_abs and r_cos, with kernels K1 and K2. We use parameters λ and η, and exponential averaging, as prescribed in Thm. 1 (b). Each experiment is repeated 10 times and we record the mean ± one standard deviation. The implementation is based on a finite-dimensional representation of the iterates (V_k)_{k≤n} in R^n (see further details in App. B.2). This implies computing the kernel matrix in O(n^2) operations. To accelerate this computation when the eigenvalues decrease fast, we approximate it with the incomplete Cholesky decomposition [3]. In Tab. 1, we set ε = 0.8, γ = 0.5 and report the observed convergence rates vs. the ones expected by Thm. 2, which are fairly consistent. In Fig. 1, we show the respective effects of varying ε and γ. Larger values of ε or γ make the problem more difficult and slow down convergence, presumably in the constants without affecting the rates, as predicted by Thm. 2. Additional experiments are provided in App. B.3. [A hypothetical numpy sketch of this kernel TD iteration is given after the table.] |
| Researcher Affiliation | Academia | Eloïse Berthier, Inria & Ecole Normale Supérieure, PSL Research University, Paris, France (eloise.berthier@inria.fr); Ziad Kobeissi, Institut Louis Bachelier, Inria Paris (ziad.kobeissi@inria.fr); Francis Bach, Inria & Ecole Normale Supérieure, PSL Research University, Paris, France (francis.bach@inria.fr) |
| Pseudocode | No | The paper provides mathematical formulations and update rules for the algorithm (e.g., equations 3, 4, 21, 24) but does not include a distinct pseudocode block or algorithm listing. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code and the instructions to run a minimal experiment are provided in the supplementary material. |
| Open Datasets | No | The paper uses a 'toy model' and 'artificial data' (e.g., the functions r_abs and r_cos on X = [0,1]) for its experiments, which are custom-generated. No public dataset with access information (link, DOI, or citation for the data source) is used or provided for external use. |
| Dataset Splits | No | The paper does not provide explicit training, validation, or test dataset splits. It describes a 'toy model' and runs simulations for convergence analysis, rather than training models on pre-defined data splits. |
| Hardware Specification | Yes | The numerical experiments were run on a personal computer (Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 16GB RAM). |
| Software Dependencies | No | The implementation relies heavily on the numpy and scipy libraries. However, specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | In Tab. 1, we set ε = 0.8, γ = 0.5 and report the observed convergence rates vs. the ones expected by Thm. 2. [...] For the i.i.d. case (Thm. 1), we use the kernel K2, ε = 0.8, γ = 0.5. We choose λ and η from the optimal values shown in the theorem: λ = λ0 n^(-1/(3+ε)) and η = log n/(2λn), where we used λ0 = 0.1, ε = 0.1. [...] For the Markovian case (Thm. 2), we use the kernel K2, ε = 0.8, γ = 0.5. For scenario (i), we set λ = λ0 n^(-1/(2+ε)), η = log n/(2λn) and λ0 = 0.1, ε = 0.1. For scenario (ii), we set λ = λ0 n^(-1/(4+ε)), η = log n/(2λn) and λ0 = 0.1, ε = 0.1. The number of iterations is n = 10^5. [These schedules are written out in a short snippet after the table.] |
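
The description quoted in the Research Type row (finite-dimensional representation of the iterates, O(n^2) kernel matrix, incomplete Cholesky acceleration) can be illustrated with a short numpy sketch. This is an assumption-laden stand-in, not the authors' code from the supplementary material: the MRP (`step`), the reward (`reward`), the kernel and its bandwidth, the reading of ε = 0.8 as a transition-noise width, and the regularized TD(0) update V ← (1 − ηλ)V + η δ K(x_k, ·) are all illustrative guesses, the full kernel matrix is built directly (the paper instead accelerates this step with an incomplete Cholesky decomposition), and plain iterate averaging stands in for the exponential averaging prescribed in Thm. 1 (b).

```python
# Hypothetical sketch of regularized non-parametric (kernel) TD(0) with a
# finite-dimensional representation of the iterates, in the spirit of the
# quoted App. B.2 description.  MRP, reward, kernel and update rule are
# stand-ins, not the paper's exact definitions.
import numpy as np

rng = np.random.default_rng(0)

gamma, eps = 0.5, 0.8            # discount and (assumed) transition-width parameters
n = 2_000                        # TD iterations; the paper uses n = 1e5 with incomplete Cholesky
lam = 0.1 * n ** (-1.0 / 3.1)    # lambda = lambda_0 * n^{-1/(3+0.1)} with lambda_0 = 0.1
eta = np.log(n) / (2.0 * lam * n)  # step size from the quoted schedule

def reward(x):                   # placeholder standing in for r_abs
    return np.abs(x - 0.5)

def step(x):                     # placeholder MRP on [0,1]: uniform move of width eps, wrapped around
    return (x + eps * (rng.random() - 0.5)) % 1.0

def kernel(a, b):                # placeholder translation-invariant kernel standing in for K2
    return np.exp(-np.abs(a[:, None] - b[None, :]) / 0.1)

# Sample one trajectory and precompute the (n+1) x (n+1) kernel matrix.
# This is the O(n^2) step that the paper approximates with incomplete Cholesky.
xs = np.empty(n + 1)
xs[0] = rng.random()
for k in range(n):
    xs[k + 1] = step(xs[k])
K = kernel(xs, xs)

# V_k = sum_i alpha_i K(x_i, .), so only the coefficient vector is stored.
alpha = np.zeros(n + 1)
alpha_avg = np.zeros(n + 1)
for k in range(n):
    v_k = K[k] @ alpha               # V_k(x_k)
    v_next = K[k + 1] @ alpha        # V_k(x_{k+1})
    delta = reward(xs[k]) + gamma * v_next - v_k   # TD error
    alpha *= 1.0 - eta * lam         # regularization shrinkage
    alpha[k] += eta * delta          # TD step in the direction K(x_k, .)
    alpha_avg += (alpha - alpha_avg) / (k + 1)     # plain averaging as a stand-in for
                                                   # the exponential averaging of Thm. 1 (b)
```

Storing only the coefficient vector α of V_k = Σ_i α_i K(x_i, ·) is what makes the finite-dimensional representation of the iterates in R^n possible; the O(n^2) cost comes entirely from forming (or approximating) the kernel matrix.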
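The schedules quoted in the Experiment Setup row reduce to a two-line computation. The snippet below simply evaluates them for the three regimes, with exponent offset 3 for the i.i.d. case (Thm. 1) and 2 or 4 for the two Markovian scenarios (Thm. 2), using the quoted values λ0 = 0.1, ε = 0.1 and n = 10^5; the function name `td_schedule` is illustrative.

```python
# Evaluate the quoted step-size and regularization schedules:
# lambda = lambda_0 * n^{-1/(offset + eps)},  eta = log(n) / (2 * lambda * n).
import numpy as np

def td_schedule(n, exponent_offset, lambda_0=0.1, eps=0.1):
    lam = lambda_0 * n ** (-1.0 / (exponent_offset + eps))
    eta = np.log(n) / (2.0 * lam * n)
    return lam, eta

n = 10**5
for name, offset in [("i.i.d. (Thm. 1)", 3),
                     ("Markov, scenario (i)", 2),
                     ("Markov, scenario (ii)", 4)]:
    lam, eta = td_schedule(n, offset)
    print(f"{name}: lambda = {lam:.2e}, eta = {eta:.2e}")
```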