Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
Authors: Sajad Khodadadian, Zaiwei Chen, Siva Theja Maguluri
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1. Convergence of Algorithm 2.2 on a 5 state, 3 action MDP. Each dashed line is for one sample path of the algorithm, and the solid line is the average of the 4 sample paths. See Appendix D for more details. |
| Researcher Affiliation | Academia | ¹School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA; ²PhD Program in Machine Learning, Georgia Institute of Technology, Atlanta, GA, 30332, USA. |
| Pseudocode | Yes | Algorithm 2.1 Q-Trace; Algorithm 2.2 Off-Policy Natural Actor-Critic |
| Open Source Code | No | The paper does not provide any specific statements or links regarding the availability of its source code. |
| Open Datasets | No | The paper mentions 'Convergence of Algorithm 2.2 on a 5 state, 3 action MDP' and 'single trajectory of samples' but does not specify a publicly available dataset or provide access information for the environment/data used in the empirical demonstration. |
| Dataset Splits | No | The paper discusses the theoretical aspects of sampling and convergence but does not provide specific details on training, validation, or test dataset splits for any empirical evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to conduct its experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | While Algorithm 2.2 lists input parameters (T, K, α, β, Q0, π0, ρ, c), specific hyperparameter values or a detailed experimental setup for the empirical results shown in Figure 1 are not provided in the main text. Appendix D is referenced but not available in the provided text. |