Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis
Authors: Shaocong Ma, Yi Zhou, Shaofeng Zou
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller asymptotic convergence error than both the conventional TDC and the variance-reduced TD. |
| Researcher Affiliation | Academia | Shaocong Ma, Department of ECE, University of Utah, Salt Lake City, UT 84112, s.ma@utah.edu; Yi Zhou, Department of ECE, University of Utah, Salt Lake City, UT 84112, yi.zhou@utah.edu; Shaofeng Zou, Department of EE, University at Buffalo, Buffalo, NY 14260, szou3@buffalo.edu |
| Pseudocode | Yes | Algorithm 1: Variance-Reduced TDC for I.I.D. Samples; Algorithm 2: TDC with Variance Reduction for Markovian Samples (a hedged sketch of the update rule is given after this table). |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of its source code. |
| Open Datasets | Yes | We first consider the Garnet problem [1, 29]... Our second experiment considers the frozen lake game in the OpenAI Gym [5]. (An environment-setup sketch is given after this table.) |
| Dataset Splits | No | The paper describes using multiple trajectories for experiments and measuring convergence error, but it does not specify explicit train/validation/test dataset splits (e.g., percentages or counts) or mention cross-validation for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers required to replicate the experiments. |
| Experiment Setup | Yes | We set the learning rate α = 0.1 for all four algorithms, and set the other learning rate β = 0.02 for both VRTDC and TDC. For VRTDC and VRTD, we set the batch size M = 3000. |
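
For orientation, the following is a minimal NumPy sketch of an SVRG-style variance-reduced TDC update with linear function approximation, using the hyperparameters reported in the experiment setup (α = 0.1, β = 0.02, batch size M = 3000). The feature dimension, discount factor, toy i.i.d. transition sampler, and all variable names are illustrative assumptions; this is not the authors' code, and it omits details of the paper's algorithm such as projection onto a bounded parameter set and the Markovian-sample variant.

```python
# Hedged sketch: SVRG-style variance-reduced TDC with linear features.
# The sampler and dimensions below are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(0)

d = 4          # feature dimension (assumed)
gamma = 0.95   # discount factor (assumed)

def sample_transition():
    """Toy i.i.d. transition: (phi(s), phi(s'), reward, importance ratio)."""
    phi_s = rng.normal(size=d)
    phi_next = rng.normal(size=d)
    reward = rng.normal()
    rho = rng.uniform(0.5, 1.5)  # importance sampling ratio pi/mu (assumed)
    return phi_s, phi_next, reward, rho

def tdc_directions(theta, w, trans):
    """Per-sample off-policy TDC update directions for theta and w."""
    phi_s, phi_next, reward, rho = trans
    delta = reward + gamma * phi_next @ theta - phi_s @ theta  # TD error
    g_theta = rho * (delta * phi_s - gamma * (phi_s @ w) * phi_next)
    g_w = rho * (delta - phi_s @ w) * phi_s
    return g_theta, g_w

# Hyperparameters reported in the paper's experiment setup
alpha, beta, M = 0.1, 0.02, 3000

theta = np.zeros(d)
w = np.zeros(d)

for epoch in range(5):
    # Reference point and batch-averaged directions (SVRG-style anchor)
    theta_ref, w_ref = theta.copy(), w.copy()
    batch = [sample_transition() for _ in range(M)]
    g_theta_bar = np.mean(
        [tdc_directions(theta_ref, w_ref, t)[0] for t in batch], axis=0)
    g_w_bar = np.mean(
        [tdc_directions(theta_ref, w_ref, t)[1] for t in batch], axis=0)

    # Inner loop: variance-reduced stochastic updates
    for trans in batch:
        g_theta, g_w = tdc_directions(theta, w, trans)
        g_theta_ref, g_w_ref = tdc_directions(theta_ref, w_ref, trans)
        theta = theta + alpha * (g_theta - g_theta_ref + g_theta_bar)
        w = w + beta * (g_w - g_w_ref + g_w_bar)

print(theta, w)
```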
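Similarly, a small sketch of how the two test problems named in the Open Datasets row could be instantiated: a standard Garnet MDP generator (randomly drawn transition kernels and rewards) and the frozen lake game from OpenAI Gym. The Garnet parameters and the `FrozenLake-v0` environment id are assumptions, not values taken from the paper; Gym releases newer than circa 2020 rename the environment and change the `reset`/`step` signatures.

```python
# Hedged sketch: instantiating a Garnet MDP and the OpenAI Gym frozen lake game.
# Parameter choices (n_states, n_actions, branching) are illustrative.
import numpy as np
import gym  # assumed available (pip install gym)

rng = np.random.default_rng(0)

def make_garnet(n_states=20, n_actions=4, branching=3):
    """Standard Garnet construction: each (s, a) pair transitions to
    `branching` uniformly chosen next states with random probabilities;
    rewards are drawn from a standard normal distribution."""
    P = np.zeros((n_states, n_actions, n_states))
    R = rng.normal(size=(n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            next_states = rng.choice(n_states, size=branching, replace=False)
            P[s, a, next_states] = rng.dirichlet(np.ones(branching))
    return P, R

P, R = make_garnet()

env = gym.make("FrozenLake-v0")  # env id used by gym releases circa 2020
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```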