Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
Authors: Sajad Khodadadian, Zaiwei Chen, Siva Theja Maguluri
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1. Convergence of Algorithm 2.2 on a 5 state, 3 action MDP. Each dashed line is for one sample path of the algorithm, and the solid line is the average of the 4 sample paths. See Appendix D for more details. |
| Researcher Affiliation | Academia | School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA; Ph.D. Program in Machine Learning, Georgia Institute of Technology, Atlanta, GA, USA. |
| Pseudocode | Yes | Algorithm 2.1 Q-Trace; Algorithm 2.2 Off-Policy Natural Actor-Critic (illustrative sketches of both updates follow the table). |
| Open Source Code | No | The paper provides no statement or link regarding the availability of its source code. |
| Open Datasets | No | The empirical demonstration uses a synthetic 5-state, 3-action MDP and a 'single trajectory of samples'; no publicly available dataset or access information for the environment/data is provided. |
| Dataset Splits | No | The paper treats sampling and convergence theoretically and reports no training, validation, or test splits for its empirical evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to conduct its experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | Algorithm 2.2 lists its input parameters (T, K, α, β, Q0, π0, ρ, c), but the main text gives no concrete hyperparameter values or detailed experimental setup for the results in Figure 1; Appendix D is referenced but not included in the provided text. A hedged sketch using these parameter names follows below. |
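
For context on the pseudocode named above, here is a minimal, hedged sketch of the critic step. The paper's Algorithm 2.1 (Q-Trace) is an n-step off-policy TD update with two importance-ratio truncation levels (the ρ and c inputs listed in the table); the one-step simplification below keeps only the ρ-truncation, and all names (`q_trace_step`, `mu` for the behavior policy) are ours, not the paper's.

```python
import numpy as np

def q_trace_step(Q, s, a, r, s_next, pi, mu, alpha, gamma=0.95, rho_bar=1.0):
    """One-step truncated-importance-sampling TD update: a simplification of
    the paper's n-step Q-Trace (the second truncation level c only enters
    the multi-step traces and is omitted here).

    Q      : (n_states, n_actions) action-value table
    pi, mu : (n_states, n_actions) target and behavior policies
    """
    rho = min(rho_bar, pi[s, a] / mu[s, a])       # truncated importance ratio
    target = r + gamma * pi[s_next] @ Q[s_next]   # expected backup under the target policy
    Q[s, a] += alpha * rho * (target - Q[s, a])   # TD correction scaled by rho
    return Q
```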
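Similarly, a sketch of the actor-critic outer loop in the spirit of Algorithm 2.2, relying on the standard fact that a natural policy gradient step with a tabular softmax policy reduces to the multiplicative-weights update π ← π · exp(βQ), renormalized per state. The sampler `env_step(s, a) -> (r, s_next)` and all default values are illustrative assumptions, not the authors' settings.

```python
def off_policy_nac(env_step, mu, n_states, n_actions,
                   K=100, T=1000, alpha=0.1, beta=0.1, gamma=0.95):
    """Sketch of an off-policy NAC loop: K actor updates, each preceded by
    T critic steps driven by a single behavior-policy trajectory."""
    Q = np.zeros((n_states, n_actions))                    # Q0
    pi = np.full((n_states, n_actions), 1.0 / n_actions)   # pi0: uniform
    s = 0
    for _ in range(K):
        for _ in range(T):                                 # critic phase
            a = np.random.choice(n_actions, p=mu[s])
            r, s_next = env_step(s, a)
            Q = q_trace_step(Q, s, a, r, s_next, pi, mu, alpha, gamma)
            s = s_next
        # actor phase: softmax NPG == multiplicative weights on Q
        logits = np.log(pi) + beta * Q
        logits -= logits.max(axis=1, keepdims=True)        # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
    return pi, Q
```

The multiplicative-weights form is why no explicit Fisher-matrix inversion appears: under a softmax parameterization, the natural gradient direction coincides with the Q-values up to per-state constants.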