Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
Authors: Hang Wang, Sen Lin, Junshan Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider experiments over the Gridworld benchmark task. In particular, we consider the following sizes of the grid to represent different problem complexity, i.e., 10 × 10, 15 × 15 and 20 × 20. |
| Researcher Affiliation | Academia | (1) Department of ECE, University of California, Davis, CA, USA; (2) Department of ECE, The Ohio State University, Columbus, OH, USA. |
| Pseudocode | No | No explicit pseudocode or algorithm block was found. The methods are described using mathematical equations and prose. |
| Open Source Code | No | No statement regarding the release of open-source code or a link to a code repository was found. |
| Open Datasets | No | We consider experiments over the Gridworld benchmark task. In particular, we consider the following sizes of the grid to represent different problem complexity, i.e., 10 × 10, 15 × 15 and 20 × 20. While the paper mentions a benchmark task, it does not provide specific access information (link, DOI, formal citation with author/year, or specific file names) for the dataset used for training. |
| Dataset Splits | No | No explicit information on training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) was found. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned. |
| Experiment Setup | Yes | The discounting factor is set as γ = 0.9. We consider the grid with 10 rows and 10 columns such that the state space has 100 states. ... we let m be large enough, e.g., m = 1000, in the Critic update Eqn. (28). ... we study the Critic update with finite time Bellman evaluation, e.g., m = 500, 50, 20, 5. ... we add the uniform noise e(t) in the value function with different bias, e.g., E[e(t)] = 0, 0.5, 1, 1. ... with probability p, the agent will choose the action follow the current policy while with probability 1 − p, the agent will choose a random action. By setting different p, we show in Fig. 7 that the approximation error in the Actor update may significantly degrade the learning performance. |
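
The Experiment Setup row can be made concrete with a minimal sketch, offered here only as an illustration and not as the authors' code (the page reports that no code was released). It uses the stated values (γ = 0.9, a 10 × 10 grid with 100 states, m Bellman evaluation steps, and following the policy with probability p). Everything else is an assumption made for this example: the four-action move set, the deterministic wall-bumping dynamics, the goal in the bottom-right corner with reward 1, and the function names `step`, `noisy_action`, and `evaluate_policy`. It does not reproduce the paper's Eqn. (28).

```python
import numpy as np

GAMMA = 0.9            # discount factor stated in the setup
N_ROWS = N_COLS = 10   # 10 x 10 grid, i.e. 100 states
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right (assumed)
GOAL = N_ROWS * N_COLS - 1  # assumed: goal sits in the bottom-right corner


def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place (assumed)."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    next_state = row * N_COLS + col
    reward = 1.0 if next_state == GOAL else 0.0  # assumed reward structure
    return next_state, reward


def noisy_action(policy, state, p, rng):
    """With probability p follow the policy; otherwise pick a uniformly random action."""
    if rng.random() < p:
        return int(policy[state])
    return int(rng.integers(len(ACTIONS)))


def evaluate_policy(policy, m):
    """Apply the Bellman evaluation operator m times, starting from V = 0."""
    n_states = N_ROWS * N_COLS
    values = np.zeros(n_states)
    for _ in range(m):
        new_values = np.empty(n_states)
        for s in range(n_states):
            next_state, reward = step(s, int(policy[s]))
            new_values[s] = reward + GAMMA * values[next_state]
        values = new_values
    return values


# Hypothetical policy for illustration: always move right.
policy = np.full(N_ROWS * N_COLS, 3)
v_long = evaluate_policy(policy, m=1000)  # near-exact evaluation
v_short = evaluate_policy(policy, m=5)    # finite-time evaluation
print(np.max(np.abs(v_long - v_short)))   # size of the Critic approximation error
```

Under these assumptions, a small m leaves a visible gap between the truncated and the near-exact value estimates, which is the kind of Critic approximation error the quoted setup varies through m; lowering p in `noisy_action` plays the analogous role for the Actor.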