Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes

Authors: Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper considers a stochastic approximation (SA) scheme involving a contraction mapping with respect to an arbitrary norm and establishes finite-sample error bounds under different stepsize choices. The idea is to construct a smooth Lyapunov function using the generalized Moreau envelope and to show that the SA iterates have negative drift with respect to that Lyapunov function. The result is applicable to Reinforcement Learning (RL); in particular, it yields the first-known convergence rate of the V-trace algorithm for off-policy TD-learning [18].
Researcher Affiliation | Collaboration | Zaiwei Chen, Georgia Institute of Technology, zchen458@gatech.edu; Siva Theja Maguluri, Georgia Institute of Technology, siva.theja@gatech.edu; Sanjay Shakkottai, The University of Texas at Austin, sanjay.shakkottai@utexas.edu; Karthikeyan Shanmugam, IBM Research NY, Karthikeyan.Shanmugam2@ibm.com
Pseudocode | No | The paper describes its algorithms through mathematical formulations (e.g., Equation 2 for SA, Equation 5 for V-trace) but provides no structured pseudocode blocks.
Open Source Code | No | The paper contains no statements or links indicating that source code for the described methodology is released or publicly available.
Open Datasets | No | The paper is theoretical and involves no empirical training on a specific, publicly available dataset; it discusses theoretical models such as MDPs.
Dataset Splits | No | The paper is theoretical and involves no empirical validation, so no dataset splits are reported.
Hardware Specification | No | The paper is theoretical and describes no hardware used for experiments.
Software Dependencies | No | The paper is theoretical and lists no software dependencies or version numbers.
Experiment Setup | No | The paper is theoretical and describes no experimental setup, hyperparameters, or training configurations.
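The contractive SA recursion summarized above (Equation 2 of the paper) can be simulated directly. The sketch below is illustrative only: the operator `F`, the stepsize schedule, and the Gaussian noise model are assumptions chosen for the demo, not the paper's actual setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical contraction for illustration (not the paper's operator):
# F(x) = gamma * A @ x + b with A row-stochastic, so that
# ||F(x) - F(y)||_inf <= gamma * ||x - y||_inf, i.e. F is a
# gamma-contraction with respect to the sup-norm.
gamma = 0.5
A = np.array([[0.5, 0.5], [0.2, 0.8]])
b = np.array([1.0, -1.0])

def F(x):
    return gamma * A @ x + b

# The fixed point x* = F(x*) solves the linear system (I - gamma * A) x = b.
x_star = np.linalg.solve(np.eye(2) - gamma * A, b)

# SA recursion: x_{k+1} = x_k + eps_k * (F(x_k) - x_k + w_k),
# with diminishing stepsizes eps_k = 1 / (k + 1) and zero-mean noise w_k.
x = np.zeros(2)
for k in range(20000):
    eps = 1.0 / (k + 1)
    w = rng.normal(scale=0.1, size=2)
    x = x + eps * (F(x) - x + w)

# The iterate approaches the fixed point in the sup-norm.
print(np.max(np.abs(x - x_star)))
```

Row-stochasticity of `A` is what makes `F` a sup-norm contraction here; the paper's analysis covers contraction with respect to an arbitrary norm, where the sup-norm case is the one relevant to V-trace.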