Finite-Sample Analysis for SARSA with Linear Function Approximation

Authors: Shaofeng Zou, Tengyu Xu, Yingbin Liang

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical However, its non-asymptotic analysis is challenging and remains unsolved due to the non-i.i.d. samples and the fact that the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedures with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this type of stochastic approximation algorithms, which may be of independent interest. Using our bias characterization technique and a gradient descent type of analysis, we provide the finite-sample analysis on the mean square error of the SARSA algorithm. We then further study a fitted SARSA algorithm, which includes the original SARSA algorithm and its variant in [28] as special cases. This fitted SARSA algorithm provides a more general framework for iterative on-policy fitted policy iteration, which is more memory and computationally efficient. For this fitted SARSA algorithm, we also provide its finite-sample analysis.
Researcher Affiliation Academia Shaofeng Zou Department of Electrical Engineering University at Buffalo, The State University of New York Buffalo, NY 14228 szou3@buffalo.edu Tengyu Xu Department of ECE The Ohio State University Columbus, OH 43210 xu.3260@osu.edu Yingbin Liang Department of ECE The Ohio State University Columbus, OH 43210 liang.889@osu.edu
Pseudocode Yes Algorithm 1 SARSA
Open Source Code No The paper does not provide any explicit statements or links to open-source code.
Open Datasets No The paper focuses on theoretical analysis and does not involve empirical training on specific datasets. It discusses 'single sample trajectory' but not a publicly available dataset for training.
Dataset Splits No The paper focuses on theoretical analysis and does not involve empirical validation with dataset splits.
Hardware Specification No This paper is theoretical and does not describe any experiments that would require specific hardware specifications.
Software Dependencies No This paper is theoretical and does not describe any experiments that would require specific software dependencies with version numbers.
Experiment Setup No This paper is theoretical and does not describe any experimental setups, hyperparameters, or training configurations.