Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Optimal Sample Complexity for Average Reward Markov Decision Processes
Authors: Shengbo Wang, Jose Blanchet, Peter Glynn
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, we conduct numerical experiments to validate our theoretical findings. |
| Researcher Affiliation | Academia | Shengbo Wang, Jose Blanchet & Peter Glynn Department of Management Science and Engineering Stanford University Stanford, CA 94305, USA EMAIL |
| Pseudocode | Yes | Algorithm 1: Perturbed Model-based Planning (Li et al., 2020): PMBP(γ, ζ, n) ... Algorithm 2: Reduction and Perturbed Model-based Planning |
| Open Source Code | No | The paper does not provide an unambiguous statement of releasing its source code, nor does it include a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | The family of reward functions and transition kernels used for both experiments belongs to the family of hard instances constructed in Wang et al. (2023). |
| Dataset Splits | No | The paper mentions running numerical experiments with a family of hard instances from Wang et al. (2023) and conducting 300 replications, but does not specify explicit train, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not specify version numbers for any key software components, libraries, or solvers used in the research. |
| Experiment Setup | Yes | Input: Discount factor $\gamma \in (0, 1)$. Perturbation amplitude $\zeta > 0$. Sample size $n \geq 1$. (Algorithm 1) ... Input: Error tolerance $\epsilon \in (0, 1]$. Assign $\gamma = 1 - \frac{\epsilon}{19\, t_{\mathrm{minorize}}}$, $\zeta = \frac{1}{4}(1-\gamma)\, t_{\mathrm{minorize}}$, and $n = \frac{c\, \beta_\delta(\eta\ \delta)}{(1-\gamma)^2\, t_{\mathrm{minorize}}}$ where $c = 4 \cdot 4862$. (Algorithm 2) ... The experiments in Figure 1b use C = 4500 for the purple line and C = 18000 for the blue line. |
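To make the Pseudocode and Experiment Setup rows concrete, here is a minimal Python sketch of how the two algorithms might fit together. It reads the Algorithm 2 inputs as γ = 1 − ϵ/(19 t_minorize), ζ = (1 − γ) t_minorize / 4, and n = c·β/((1 − γ)² t_minorize) with c = 4 · 4862; both the ζ reading and the constant are reconstructions from garbled text, not confirmed values. The log factor β_δ(·) is not recoverable from the table, so it is passed in as a plain number `beta`. The PMBP part is our own illustration, not the authors' code: it assumes a generative-model sampler (the hypothetical `sample_next_state`), i.i.d. Uniform(0, ζ) reward perturbations, and value iteration on the perturbed empirical discounted MDP.

```python
import numpy as np

# Assumption: the garbled "c = 4 4862" is read as 4 * 4862.
C_CONST = 4 * 4862


def reduction_parameters(eps, t_minorize, beta):
    """Hypothetical Algorithm 2 parameter assignment.

    `beta` stands in for the log factor beta_delta(...), whose exact form is
    not recoverable here; it is passed in directly.
    """
    if not 0 < eps <= 1:
        raise ValueError("eps must lie in (0, 1]")
    gamma = 1.0 - eps / (19.0 * t_minorize)
    zeta = 0.25 * (1.0 - gamma) * t_minorize  # reconstructed reading of zeta
    n = int(np.ceil(C_CONST * beta / ((1.0 - gamma) ** 2 * t_minorize)))
    return gamma, zeta, n


def pmbp(sample_next_state, reward, gamma, zeta, n, rng, tol=1e-8, max_iter=100_000):
    """Sketch of perturbed model-based planning with a generative model.

    Assumptions (not taken from the table): n calls per (s, a) to the
    hypothetical sampler build an empirical kernel, rewards receive
    i.i.d. Uniform(0, zeta) noise, and the perturbed empirical discounted
    MDP is solved by value iteration.
    """
    num_states, num_actions = reward.shape
    # Empirical transition kernel p_hat[s, a, s'] from n samples per pair.
    p_hat = np.zeros((num_states, num_actions, num_states))
    for s in range(num_states):
        for a in range(num_actions):
            for _ in range(n):
                p_hat[s, a, sample_next_state(s, a)] += 1.0
    p_hat /= n
    # Perturbed rewards: small random offsets that break ties between actions.
    r_pert = reward + rng.uniform(0.0, zeta, size=reward.shape)
    v = np.zeros(num_states)
    for _ in range(max_iter):
        q = r_pert + gamma * (p_hat @ v)  # Q-values, shape (S, A)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    q = r_pert + gamma * (p_hat @ v)
    return q.argmax(axis=1)  # greedy deterministic policy
```

In a full reduction one would call `reduction_parameters(eps, t_minorize, beta)` and pass the resulting γ, ζ, and n to `pmbp`. For the small tests of the sketch above, a smaller γ and n keep value iteration and sampling cheap.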