Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation
Authors: Yue Wang, Shaofeng Zou, Yi Zhou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we develop novel techniques to address the above challenges and explicitly characterize the non-asymptotic error bound for the general off-policy setting with i.i.d. or Markovian samples, and show that it converges as fast as O(1/√T) (up to a factor of O(log T)). Our approach can be applied to a wide range of value-based reinforcement learning algorithms with general smooth function approximation. |
| Researcher Affiliation | Academia | Yue Wang, Department of Electrical Engineering, University at Buffalo, Buffalo, NY, USA (ywang294@buffalo.edu); Shaofeng Zou, Department of Electrical Engineering, University at Buffalo, Buffalo, NY, USA (szou3@buffalo.edu); Yi Zhou, Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, Utah, USA (yi.zhou@utah.edu) |
| Pseudocode | Yes | Algorithm 1: Non-Linear Off-Policy TDC under the Markovian Setting |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and analyzes algorithms with samples generated from a Markov Decision Process, but it does not specify or provide access information for any publicly available or open datasets used for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on specific datasets; therefore, it does not provide training/validation/test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any ancillary software or library versions used for experiments. |
| Experiment Setup | No | The paper is theoretical and analyzes an algorithm but does not provide details on hyperparameter values or system-level training settings, as it does not conduct empirical experiments. |