Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation
Authors: Yue Wang, Shaofeng Zou, Yi Zhou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we develop novel techniques to address the above challenges and explicitly characterize the non-asymptotic error bound for the general off-policy setting with i.i.d. or Markovian samples, and show that it converges as fast as O(1/√T) (up to a factor of O(log T)). Our approach can be applied to a wide range of value-based reinforcement learning algorithms with general smooth function approximation. |
| Researcher Affiliation | Academia | Yue Wang, Department of Electrical Engineering, University at Buffalo, Buffalo, NY, USA; Shaofeng Zou, Department of Electrical Engineering, University at Buffalo, Buffalo, NY, USA; Yi Zhou, Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, Utah, USA |
| Pseudocode | Yes | Algorithm 1 Non-Linear Off-Policy TDC under the Markovian Setting |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical: it analyzes algorithms on samples generated from a Markov decision process and does not use, or provide access information for, any publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and does not run experiments on specific datasets; therefore, it does not provide training/validation/test splits. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any ancillary software or library versions used for experiments. |
| Experiment Setup | No | The paper is theoretical and analyzes an algorithm but does not provide details on hyperparameter values or system-level training settings, as it does not conduct empirical experiments. |
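Because no code accompanies the paper, the following is offered only as rough orientation and is not a reproduction of Algorithm 1. It is a minimal sketch of a single two-time-scale off-policy TDC update, under the assumption of *linear* features V(s) = θᵀφ(s) and the commonly used linear TDC form. The main parameter θ moves on the slow time scale with step size α. The auxiliary vector w moves on the fast time scale with step size β, where β is typically larger than α. The general smooth (nonlinear) case analyzed in the paper involves additional terms that are not shown here. The helper names `dot`, `tdc_step`, and `run_self_loop`, along with the one-state toy chain, are hypothetical illustrations and do not come from the paper.

```python
def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One linear off-policy TDC update (hypothetical helper, not the paper's code).

    theta: main value parameters (slow time scale, step size alpha)
    w:     auxiliary parameters (fast time scale, step size beta)
    rho:   importance ratio pi(a|s) / mu(a|s) for the sampled action
    """
    # TD error of the current linear value estimate V(s) = theta^T phi(s).
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)
    # Slow update: TD step plus the gradient-correction term -gamma * phi' * (phi^T w).
    new_theta = [t + alpha * rho * (delta * f - gamma * fn * phi_w)
                 for t, f, fn in zip(theta, phi, phi_next)]
    # Fast update: for fixed theta, w tracks E[phi phi^T]^{-1} E[rho * delta * phi].
    new_w = [wi + beta * (rho * delta - phi_w) * f for wi, f in zip(w, phi)]
    return new_theta, new_w


def run_self_loop(reward, gamma, steps, alpha=0.05, beta=0.2):
    """Toy check (hypothetical): one state that always returns to itself.

    With a single constant feature the true value is reward / (1 - gamma);
    the chain is on-policy, so rho = 1.
    """
    theta, w = [0.0], [0.0]
    for _ in range(steps):
        theta, w = tdc_step(theta, w, [1.0], [1.0], reward, 1.0, gamma, alpha, beta)
    return theta, w


if __name__ == "__main__":
    theta, _ = run_self_loop(reward=1.0, gamma=0.5, steps=2000)
    print(theta[0])  # close to 1 / (1 - 0.5) = 2.0
```

The toy chain is chosen because its fixed point is known in closed form. It only checks that the two coupled updates settle, and it says nothing about the O(1/√T) rate established in the paper.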