Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

Authors: Sajad Khodadadian, Zaiwei Chen, Siva Theja Maguluri

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1: "Convergence of Algorithm 2.2 on a 5-state, 3-action MDP. Each dashed line is one sample path of the algorithm, and the solid line is the average of the 4 sample paths. See Appendix D for more details."
Researcher Affiliation | Academia | (1) School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA; (2) Ph.D. Program in Machine Learning, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
Pseudocode | Yes | Algorithm 2.1 (Q-Trace) and Algorithm 2.2 (Off-Policy Natural Actor-Critic); an illustrative sketch of the main loop follows this table.
Open Source Code | No | The paper provides no statements or links regarding the availability of its source code.
Open Datasets | No | The paper mentions "Convergence of Algorithm 2.2 on a 5-state, 3-action MDP" and a "single trajectory of samples" but does not name a publicly available dataset or give access information for the environment/data used in the empirical demonstration.
Dataset Splits | No | The paper discusses the theoretical aspects of sampling and convergence but gives no details on training, validation, or test dataset splits for any empirical evaluation.
Hardware Specification | No | The paper gives no details about the hardware used to run its experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | No | Although Algorithm 2.2 lists its input parameters (T, K, α, β, Q0, π0, ρ, c), specific hyperparameter values and a detailed experimental setup for the results in Figure 1 are not provided in the main text; Appendix D is referenced but is not available in the provided text.
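Since the paper's experimental setup is only partially specified in the main text, the following is a minimal illustrative sketch of what a tabular off-policy natural actor-critic loop with a clipped-importance-ratio critic could look like, reusing the input names the table cites from Algorithm 2.2 (T, K, α, β, Q0, π0, ρ). This is an assumption-laden reconstruction, not the paper's exact Algorithm 2.1/2.2: the critic here is a one-step clipped-TD update in the spirit of Q-Trace (the multi-step trace coefficients clipped at c are omitted), and the actor step is the standard softmax natural-policy-gradient (multiplicative-weights) update. All function names, the MDP interface, and default values are hypothetical.

```python
import numpy as np

def softmax_policy(logits):
    """Row-wise softmax over actions; logits has shape (n_states, n_actions)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def off_policy_nac_sketch(P, R, behavior, gamma=0.95, T=100, K=500,
                          alpha=0.1, beta=0.1, rho_bar=1.0, seed=0):
    """Hypothetical off-policy NAC loop on a tabular MDP (single trajectory).

    P        : transition tensor, shape (n_states, n_actions, n_states)
    R        : reward table, shape (n_states, n_actions)
    behavior : fixed behavior policy, shape (n_states, n_actions)
    T outer (actor) steps, K critic updates per outer step; alpha and beta
    are the critic and actor step sizes; rho_bar clips the importance-
    sampling ratio, echoing the rho input of Algorithm 2.2.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    logits = np.zeros((n_states, n_actions))  # pi_0: uniform softmax policy
    s = rng.integers(n_states)                # single trajectory of samples
    for _ in range(T):
        pi = softmax_policy(logits)           # current target policy
        Q = np.zeros((n_states, n_actions))   # Q_0: critic reset each outer step
        for _ in range(K):
            a = rng.choice(n_actions, p=behavior[s])
            s_next = rng.choice(n_states, p=P[s, a])
            # Clipped importance ratio (V-trace/Q-trace-style off-policy correction).
            rho = min(rho_bar, pi[s, a] / behavior[s, a])
            # One-step TD target evaluated under the target policy pi.
            target = R[s, a] + gamma * pi[s_next] @ Q[s_next]
            Q[s, a] += alpha * rho * (target - Q[s, a])
            s = s_next
        # Natural-policy-gradient actor update: with a softmax parameterization
        # this reduces to multiplicative weights, pi_{t+1} ∝ pi_t * exp(beta * Q).
        logits += beta * Q
    return softmax_policy(logits)
```

For concreteness, the sketch can be run on a randomly generated MDP of the size mentioned in the Figure 1 caption (5 states, 3 actions) with a uniform behavior policy; the MDP itself is made up here, since the paper's environment is not specified:

```python
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(5), size=(5, 3))  # random transition kernel
R = rng.random((5, 3))                      # random rewards
behavior = np.full((5, 3), 1 / 3)           # uniform behavior policy
pi = off_policy_nac_sketch(P, R, behavior)
```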