reproducibilityindex.ai

Finite-Sample Analysis for SARSA with Linear Function Approximation

Authors: Shaofeng Zou, Tengyu Xu, Yingbin Liang

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	However, its non-asymptotic analysis is challenging and remains unsolved due to the non-i.i.d. samples and the fact that the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedures with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this type of stochastic approximation algorithms, which may be of independent interest. Using our bias characterization technique and a gradient descent type of analysis, we provide the finite-sample analysis on the mean square error of the SARSA algorithm. We then further study a fitted SARSA algorithm, which includes the original SARSA algorithm and its variant in [28] as special cases. This fitted SARSA algorithm provides a more general framework for iterative on-policy fitted policy iteration, which is more memory and computationally efficient. For this fitted SARSA algorithm, we also provide its finite-sample analysis.
Researcher Affiliation	Academia	Shaofeng Zou Department of Electrical Engineering University at Buffalo, The State University of New York Buffalo, NY 14228 szou3@buffalo.edu Tengyu Xu Department of ECE The Ohio State University Columbus, OH 43210 xu.3260@osu.edu Yingbin Liang Department of ECE The Ohio State University Columbus, OH 43210 liang.889@osu.edu
Pseudocode	Yes	Algorithm 1 SARSA
Open Source Code	No	The paper does not provide any explicit statements or links to open-source code.
Open Datasets	No	The paper focuses on theoretical analysis and does not involve empirical training on specific datasets. It discusses 'single sample trajectory' but not a publicly available dataset for training.
Dataset Splits	No	The paper focuses on theoretical analysis and does not involve empirical validation with dataset splits.
Hardware Specification	No	This paper is theoretical and does not describe any experiments that would require specific hardware specifications.
Software Dependencies	No	This paper is theoretical and does not describe any experiments that would require specific software dependencies with version numbers.
Experiment Setup	No	This paper is theoretical and does not describe any experimental setups, hyperparameters, or training configurations.