Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

Authors: Sajad Khodadadian, Zaiwei Chen, Siva Theja Maguluri

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1: "Convergence of Algorithm 2.2 on a 5-state, 3-action MDP. Each dashed line is one sample path of the algorithm, and the solid line is the average of the 4 sample paths. See Appendix D for more details."
Researcher Affiliation | Academia | (1) School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA; (2) Ph.D. Program in Machine Learning, Georgia Institute of Technology, Atlanta, GA, 30332, USA.
Pseudocode | Yes | Algorithm 2.1 (Q-Trace) and Algorithm 2.2 (Off-Policy Natural Actor-Critic); an illustrative sketch of the main loop follows this table.
Open Source Code | No | The paper provides no statements or links regarding the availability of its source code.
Open Datasets | No | The paper mentions "Convergence of Algorithm 2.2 on a 5-state, 3-action MDP" and a "single trajectory of samples" but does not name a publicly available dataset or give access information for the environment/data used in the empirical demonstration.
Dataset Splits | No | The paper discusses the theoretical aspects of sampling and convergence but gives no details on training, validation, or test dataset splits for any empirical evaluation.
Hardware Specification | No | The paper gives no details about the hardware used to run its experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | No | Although Algorithm 2.2 lists its input parameters (T, K, α, β, Q0, π0, ρ, c), specific hyperparameter values and a detailed experimental setup for the results in Figure 1 are not provided in the main text; Appendix D is referenced but is not available in the provided text.
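Since the paper's experimental setup is only partially specified in the main text, the following is a minimal illustrative sketch of what a tabular off-policy natural actor-critic loop with a clipped-importance-ratio critic could look like, reusing the input names the table cites from Algorithm 2.2 (T, K, α, β, Q0, π0, ρ). This is an assumption-laden reconstruction, not the paper's exact Algorithm 2.1/2.2: the critic here is a one-step clipped-TD update in the spirit of Q-Trace (the multi-step trace coefficients clipped at c are omitted), and the actor step is the standard softmax natural-policy-gradient (multiplicative-weights) update. All function names, the MDP interface, and default values are hypothetical.

```python
import numpy as np

def softmax_policy(logits):
    """Row-wise softmax over actions; logits has shape (n_states, n_actions)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def off_policy_nac_sketch(P, R, behavior, gamma=0.95, T=100, K=500,
                          alpha=0.1, beta=0.1, rho_bar=1.0, seed=0):
    """Hypothetical off-policy NAC loop on a tabular MDP (single trajectory).

    P        : transition tensor, shape (n_states, n_actions, n_states)
    R        : reward table, shape (n_states, n_actions)
    behavior : fixed behavior policy, shape (n_states, n_actions)
    T outer (actor) steps, K critic updates per outer step; alpha and beta
    are the critic and actor step sizes; rho_bar clips the importance-
    sampling ratio, echoing the rho input of Algorithm 2.2.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    logits = np.zeros((n_states, n_actions))  # pi_0: uniform softmax policy
    s = rng.integers(n_states)                # single trajectory of samples
    for _ in range(T):
        pi = softmax_policy(logits)           # current target policy
        Q = np.zeros((n_states, n_actions))   # Q_0: critic reset each outer step
        for _ in range(K):
            a = rng.choice(n_actions, p=behavior[s])
            s_next = rng.choice(n_states, p=P[s, a])
            # Clipped importance ratio (V-trace/Q-trace-style off-policy correction).
            rho = min(rho_bar, pi[s, a] / behavior[s, a])
            # One-step TD target evaluated under the target policy pi.
            target = R[s, a] + gamma * pi[s_next] @ Q[s_next]
            Q[s, a] += alpha * rho * (target - Q[s, a])
            s = s_next
        # Natural-policy-gradient actor update: with a softmax parameterization
        # this reduces to multiplicative weights, pi_{t+1} ∝ pi_t * exp(beta * Q).
        logits += beta * Q
    return softmax_policy(logits)
```

For concreteness, the sketch can be run on a randomly generated MDP of the size mentioned in the Figure 1 caption (5 states, 3 actions) with a uniform behavior policy; the MDP itself is made up here, since the paper's environment is not specified:

```python
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(5), size=(5, 3))  # random transition kernel
R = rng.random((5, 3))                      # random rewards
behavior = np.full((5, 3), 1 / 3)           # uniform behavior policy
pi = off_policy_nac_sketch(P, R, behavior)
```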