Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Convergence of Continuous Single-timescale Actor-critic
Authors: Xuyang Chen, Lin Zhao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish finite-time convergence by introducing a novel Lyapunov analysis framework, which provides a unified convergence characterization of both the actor and the critic. Our approach is less conservative than previous methods and offers new insights into the coupled dynamics of actor-critic updates. ... In this paper, we provide a finite-time convergence analysis for the single-sample, single-timescale actor-critic algorithm in continuous state-action spaces. We propose a novel Lyapunov analysis framework, which allows a less conservative analysis under the same set of assumptions adopted in existing studies. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, National University of Singapore, Singapore. Correspondence to: Lin Zhao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Continuous Single-sample Single-timescale Actor-Critic with Markovian Sampling |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical, focusing on convergence analysis of actor-critic methods in continuous state-action spaces. It does not mention or use any specific datasets for empirical evaluation, so no information about open datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve experiments on specific datasets; therefore, it does not mention any dataset splits (e.g., training/validation/test splits). |
| Hardware Specification | No | The paper is a theoretical work on convergence analysis and does not describe any experimental setup or the hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies or versions used for implementation. |
| Experiment Setup | No | The paper focuses on theoretical convergence analysis of an algorithm and does not provide details of an experimental setup, such as hyperparameters or training configurations. |
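The table records that the paper provides pseudocode (Algorithm 1) but no released code. Purely as an illustration of what a single-sample, single-timescale actor-critic with Markovian sampling looks like in a continuous state-action space, here is a minimal Python sketch. It is not the authors' Algorithm 1. The 1-D linear dynamics, quadratic reward, Gaussian policy, quadratic critic features, discount factor, step sizes, and projection boxes are all hypothetical choices made so the toy loop stays bounded.

```python
import random

# Hypothetical 1-D toy problem (an assumption, not taken from the paper):
#   s' = 0.9*s + a + N(0, 0.1^2),  reward r = -(s^2 + 0.1*a^2)
GAMMA = 0.9               # assumed discount factor
SIGMA_PI = 0.5            # fixed std of the Gaussian policy (assumed)
THETA_BOX = (-1.8, 0.0)   # assumed projection set: keeps |0.9 + theta| <= 0.9
W_BOX = (-50.0, 50.0)     # assumed projection set for the critic weights


def features(s):
    """Critic features phi(s) = [s^2, 1] for V_w(s) = w . phi(s)."""
    return (s * s, 1.0)


def value(w, s):
    """Linear critic estimate V_w(s)."""
    phi = features(s)
    return w[0] * phi[0] + w[1] * phi[1]


def td_error(r, v_s, v_next, gamma=GAMMA):
    """One-step TD error: delta = r + gamma * V(s') - V(s)."""
    return r + gamma * v_next - v_s


def clip(x, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def actor_critic(num_steps=20_000, alpha=1e-3, beta=5e-3, seed=0):
    """Single-sample, single-timescale actor-critic on one Markovian trajectory.

    Each iteration uses exactly one transition (s, a, r, s'); actor and critic
    are both updated every step with a constant step-size ratio beta/alpha.
    Returns the final (theta, w).
    """
    rng = random.Random(seed)
    theta = 0.0        # policy mean is theta * s
    w = [0.0, 0.0]     # critic weights
    s = rng.gauss(0.0, 1.0)
    for _ in range(num_steps):
        # Sample an action from the Gaussian policy pi_theta(. | s).
        a = rng.gauss(theta * s, SIGMA_PI)
        r = -(s * s + 0.1 * a * a)
        # The next state continues the same trajectory (Markovian sampling).
        s_next = 0.9 * s + a + rng.gauss(0.0, 0.1)

        # One TD error drives both updates.
        delta = td_error(r, value(w, s), value(w, s_next))

        # Critic: projected TD(0) step.
        phi = features(s)
        w = [clip(w[i] + beta * delta * phi[i], *W_BOX) for i in range(2)]

        # Actor: projected policy-gradient step using delta as the advantage.
        # grad_theta log N(a; theta*s, sigma^2) = (a - theta*s) * s / sigma^2
        score = (a - theta * s) * s / SIGMA_PI ** 2
        theta = clip(theta + alpha * delta * score, *THETA_BOX)

        s = s_next
    return theta, w


if __name__ == "__main__":
    theta, w = actor_critic()
    print("theta:", theta, "critic weights:", w)
```

Both parameters are updated from the same TD error at every step, with step sizes in a fixed ratio (here `beta = 5 * alpha`). That fixed ratio is what separates a single-timescale scheme from two-timescale variants, in which the critic runs on a much faster schedule than the actor. The projections onto `THETA_BOX` and `W_BOX` are an assumption made only to keep this toy example bounded.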