Finite-Sample Analysis for SARSA with Linear Function Approximation
Authors: Shaofeng Zou, Tengyu Xu, Yingbin Liang
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | However, its non-asymptotic analysis is challenging and remains unsolved due to the non-i.i.d. samples and the fact that the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedures with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this type of stochastic approximation algorithms, which may be of independent interest. Using our bias characterization technique and a gradient descent type of analysis, we provide the finite-sample analysis on the mean square error of the SARSA algorithm. We then further study a fitted SARSA algorithm, which includes the original SARSA algorithm and its variant in [28] as special cases. This fitted SARSA algorithm provides a more general framework for iterative on-policy fitted policy iteration, which is more memory and computationally efficient. For this fitted SARSA algorithm, we also provide its finite-sample analysis. |
| Researcher Affiliation | Academia | Shaofeng Zou Department of Electrical Engineering University at Buffalo, The State University of New York Buffalo, NY 14228 szou3@buffalo.edu Tengyu Xu Department of ECE The Ohio State University Columbus, OH 43210 xu.3260@osu.edu Yingbin Liang Department of ECE The Ohio State University Columbus, OH 43210 liang.889@osu.edu |
| Pseudocode | Yes | Algorithm 1 SARSA |
| Open Source Code | No | The paper does not provide any explicit statements or links to open-source code. |
| Open Datasets | No | The paper focuses on theoretical analysis and does not involve empirical training on specific datasets. It discusses 'single sample trajectory' but not a publicly available dataset for training. |
| Dataset Splits | No | The paper focuses on theoretical analysis and does not involve empirical validation with dataset splits. |
| Hardware Specification | No | This paper is theoretical and does not describe any experiments that would require specific hardware specifications. |
| Software Dependencies | No | This paper is theoretical and does not describe any experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | This paper is theoretical and does not describe any experimental setups, hyperparameters, or training configurations. |