Statistical Efficiency of Distributional Temporal Difference Learning
Authors: Yang Peng, Liangyu Zhang, Zhihua Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we go a step further and analyze the finite-sample performance of distributional TD. To facilitate theoretical analysis, we propose non-parametric distributional temporal difference learning (NTD). For a $\gamma$-discounted infinite-horizon tabular Markov decision process, we show that for NTD we need $\widetilde{O}\left(\frac{1}{\varepsilon^{2p}(1-\gamma)^{2p+1}}\right)$ iterations to achieve an $\varepsilon$-optimal estimator with high probability, when the estimation error is measured by the $p$-Wasserstein distance. This sample complexity bound is minimax optimal up to logarithmic factors in the case of the 1-Wasserstein distance. To achieve this, we establish a novel Freedman's inequality in Hilbert spaces, which would be of independent interest. In addition, we revisit CTD, showing that the same non-asymptotic convergence bounds hold for CTD in the case of the $p$-Wasserstein distance for $p \ge 1$. (The bound and the metric are written out below the table.) |
| Researcher Affiliation | Academia | Yang Peng (School of Mathematical Sciences, Peking University; pengyang@pku.edu.cn), Liangyu Zhang (School of Statistics and Management, Shanghai University of Finance and Economics; zhangliangyu@sufe.edu.cn), Zhihua Zhang (School of Mathematical Sciences, Peking University; zhzhang@math.pku.edu.cn). |
| Pseudocode | No | The paper describes the updating schemes for NTD and CTD using mathematical formulas and text, but does not present them in a clearly labeled 'Pseudocode' or 'Algorithm' block. (A hedged sketch of the standard CTD update appears below the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code. The NeurIPS checklist, section 5, explicitly states 'NA' for this question, indicating no code release. |
| Open Datasets | No | This paper is theoretical and does not conduct experiments with datasets; accordingly, no training data are mentioned. |
| Dataset Splits | No | This paper is theoretical and does not conduct experiments with datasets; accordingly, no validation data are mentioned. |
| Hardware Specification | No | This paper is theoretical and does not describe any experiments that would require specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | This paper is theoretical and does not involve experimental implementation that would require specific software dependencies with version numbers. |
| Experiment Setup | No | This paper is theoretical and does not include experimental setups, hyperparameters, or training configurations. Its focus is on mathematical analysis of algorithms. |
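
For reference, the complexity claim quoted in the Research Type row can be written out. The inverse-CDF form of the $p$-Wasserstein distance below is the standard one for distributions on the real line, and the uniform-over-states reading of "$\varepsilon$-optimal estimator" is our assumption, not a quote from the paper:

```latex
% p-Wasserstein distance between two return distributions \mu, \nu on
% \mathbb{R}, in the standard inverse-CDF (quantile) form:
W_p(\mu, \nu) = \left( \int_0^1 \bigl| F_\mu^{-1}(u) - F_\nu^{-1}(u) \bigr|^p \, du \right)^{1/p}

% NTD sample complexity as stated in the abstract: reaching
%   \max_s W_p(\hat{\eta}(s), \eta^{\pi}(s)) \le \varepsilon
% with high probability takes
T = \widetilde{O}\!\left( \frac{1}{\varepsilon^{2p} \, (1 - \gamma)^{2p+1}} \right)
% iterations, minimax optimal up to logarithmic factors when p = 1.
```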
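
The paper presents the NTD and CTD updates only as equations. As a hedged sketch, the code below implements the standard tabular categorical TD (CTD) update for policy evaluation, using the usual two-nearest-atoms projection from the categorical distributional RL literature; the uniform support grid, the function name `ctd_update`, and the constant step size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ctd_update(p_s, p_next, r, gamma, z, alpha):
    """One tabular categorical TD (CTD) update at a sampled transition.

    p_s    : categorical estimate at the current state, shape (K,)
    p_next : categorical estimate at the sampled next state, shape (K,)
    r      : observed reward
    gamma  : discount factor in (0, 1)
    z      : fixed, uniformly spaced support atoms z[0] < ... < z[K-1]
    alpha  : step size
    """
    K = len(z)
    dz = z[1] - z[0]
    target = np.zeros(K)
    for j in range(K):
        # Bellman backup of atom j, clipped to the support's range.
        g = np.clip(r + gamma * z[j], z[0], z[-1])
        # Split atom j's mass between the two nearest support atoms.
        b = (g - z[0]) / dz
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:
            target[lo] += p_next[j]
        else:
            target[lo] += p_next[j] * (hi - b)
            target[hi] += p_next[j] * (b - lo)
    # Stochastic-approximation step toward the projected target.
    return (1.0 - alpha) * p_s + alpha * target
```

For a $\gamma$-discounted MDP with rewards in $[0, 1]$, a natural support is `z = np.linspace(0.0, 1.0 / (1.0 - gamma), 65)`; the update keeps `p_s` on the probability simplex because the projected target is itself a probability vector.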