Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

Authors: Hang Wang, Sen Lin, Junshan Zhang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We consider experiments over the Gridworld benchmark task. In particular, we consider the following sizes of the grid to represent different problem complexity, i.e., 10×10, 15×15 and 20×20."
Researcher Affiliation | Academia | "¹Department of ECE, University of California, Davis, CA, USA; ²Department of ECE, The Ohio State University, Columbus, OH, USA."
Pseudocode | No | No explicit pseudocode or algorithm block was found. The methods are described using mathematical equations and prose.
Open Source Code | No | No statement regarding the release of open-source code or a link to a code repository was found.
Open Datasets | No | "We consider experiments over the Gridworld benchmark task. In particular, we consider the following sizes of the grid to represent different problem complexity, i.e., 10×10, 15×15 and 20×20." While the paper mentions a benchmark task, it does not provide specific access information (link, DOI, formal citation with author/year, or specific file names) for the dataset used for training.
Dataset Splits | No | No explicit information on training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) was found.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running experiments were provided.
Software Dependencies | No | No specific software dependencies with version numbers were mentioned.
Experiment Setup | Yes | "The discounting factor is set as γ = 0.9. We consider the grid with 10 rows and 10 columns such that the state space has 100 states. ... we let m be large enough, e.g., m = 1000, in the Critic update Eqn. (28). ... we study the Critic update with finite-time Bellman evaluation, e.g., m = 500, 50, 20, 5. ... we add the uniform noise e(t) in the value function with different bias, e.g., E[e(t)] = 0, 0.5, 1, 1. ... with probability p, the agent will choose the action following the current policy, while with probability 1 − p, the agent will choose a random action. By setting different p, we show in Fig. 7 that the approximation error in the Actor update may significantly degrade the learning performance." (A runnable sketch of this setup follows the table.)
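
The Experiment Setup row is specific enough to outline in code. Below is a minimal Python sketch of that configuration, intended as an assumption-laden illustration rather than the authors' implementation (no code was released): a 10×10 Gridworld with an assumed goal-state reward, a Critic that performs m sweeps of Bellman policy evaluation with optional biased uniform noise e(t) added to the value estimate, and an Actor step that follows the greedy action with probability p and a random action otherwise. The transition dynamics, reward, noise range, and the names step, critic_update, and actor_update are all assumptions; the paper's Actor update is described by equations, which the greedy step here only stands in for.

import numpy as np

GAMMA = 0.9                       # discounting factor from the quoted setup
GRID = 10                         # 10 rows x 10 columns -> 100 states
N_STATES = GRID * GRID
N_ACTIONS = 4                     # up, down, left, right (assumed)
rng = np.random.default_rng(0)

def step(state, action):
    # Deterministic grid transitions; goal in the bottom-right corner with
    # reward 1 (assumed -- the paper does not specify the reward structure).
    r, c = divmod(state, GRID)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
    r = min(max(r + dr, 0), GRID - 1)
    c = min(max(c + dc, 0), GRID - 1)
    next_state = r * GRID + c
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def critic_update(policy, m, noise_bias=0.0):
    # m sweeps of Bellman policy evaluation (the finite-time Critic update).
    # Uniform noise on [-0.5, 0.5], shifted so that E[e(t)] = noise_bias,
    # is added to the value estimate after each sweep (assumed noise range).
    V = np.zeros(N_STATES)
    for _ in range(m):
        V_new = np.empty(N_STATES)
        for s in range(N_STATES):
            s_next, r = step(s, policy[s])
            V_new[s] = r + GAMMA * V[s_next]
        V = V_new + rng.uniform(-0.5, 0.5, N_STATES) + noise_bias
    return V

def actor_update(V, p):
    # Greedy improvement followed with probability p; with probability 1 - p
    # a random action is chosen, modelling the Actor approximation error.
    policy = np.empty(N_STATES, dtype=int)
    for s in range(N_STATES):
        q = np.empty(N_ACTIONS)
        for a in range(N_ACTIONS):
            s_next, r = step(s, a)
            q[a] = r + GAMMA * V[s_next]
        policy[s] = int(np.argmax(q)) if rng.random() < p else rng.integers(N_ACTIONS)
    return policy

# One configuration mirroring the quoted setup: near-exact evaluation
# (m = 1000) with zero noise bias; p = 0.9 is an assumed value chosen
# only to illustrate the role of the Actor error probability.
policy = rng.integers(N_ACTIONS, size=N_STATES)
for _ in range(20):
    V = critic_update(policy, m=1000, noise_bias=0.0)
    policy = actor_update(V, p=0.9)

To reproduce the ablations described in the quote, rerun the loop with smaller m (500, 50, 20, 5), nonzero noise_bias, or smaller p, and compare the resulting value of the learned policy.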