reproducibilityindex.ai

Optimal Sample Complexity for Average Reward Markov Decision Processes

Authors: Shengbo Wang, Jose Blanchet, Peter Glynn

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Additionally, we conduct numerical experiments to validate our theoretical findings.
Researcher Affiliation	Academia	Shengbo Wang, Jose Blanchet & Peter Glynn, Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA. {shengbo.wang,jose.blanchet,glynn}@stanford.edu
Pseudocode	Yes	Algorithm 1 Perturbed Model-based Planning (Li et al., 2020): PMBP(γ, ζ, n)... Algorithm 2 Reduction and Perturbed Model-based Planning
Open Source Code	No	The paper does not provide an unambiguous statement of releasing its source code, nor does it include a direct link to a code repository for the methodology described.
Open Datasets	Yes	The family of reward functions and transition kernels used for both experiments belongs to the family of hard instances constructed in Wang et al. (2023).
Dataset Splits	No	The paper mentions running numerical experiments with a family of hard instances from Wang et al. (2023) and conducting 300 replications, but does not specify explicit train, validation, or test dataset splits.
Hardware Specification	No	The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies	No	The paper does not specify version numbers for any key software components, libraries, or solvers used in the research.
Experiment Setup	Yes	Input: Discount factor γ ∈ (0, 1). Perturbation amplitude ζ > 0. Sample size n ≥ 1. (Algorithm 1) ... Input: Error tolerance ϵ ∈ (0, 1]. Assign γ = 1 − ϵ/(19 t_minorize), ζ = (1/4)(1 − γ) t_minorize, and n = c β_δ(η_δ) / ((1 − γ)² t_minorize), where c = 4 · 486². (Algorithm 2) ... The experiments in Figure 1b use C = 4500 for the purple line and C = 18000 for the blue line.
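
The parameter assignment quoted for Algorithm 2 in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration, not the authors' code (the page reports that none was released). It assumes the extracted formulas read as γ = 1 − ϵ/(19 t_minorize), ζ = (1/4)(1 − γ) t_minorize, and n = c β / ((1 − γ)² t_minorize) with c = 4 · 486². The function name `reduction_parameters` and the argument `beta` are hypothetical: `beta` stands in for the β_δ term, whose definition is not part of this excerpt, so the caller must supply its value.

```python
from math import ceil

# c = 4 * 486^2, as read from the Algorithm 2 excerpt (assumed reconstruction).
C_CONST = 4 * 486 ** 2


def reduction_parameters(eps, t_minorize, beta):
    """Return (gamma, zeta, n) for a given error tolerance.

    eps        -- error tolerance, must lie in (0, 1]
    t_minorize -- minorization time of the MDP (assumed positive)
    beta       -- value of the beta_delta term; its formula is not given
                  in this excerpt, so it is passed in (hypothetical argument)
    """
    if not 0 < eps <= 1:
        raise ValueError("eps must lie in (0, 1]")
    if t_minorize <= 0:
        raise ValueError("t_minorize must be positive")

    # Discount factor of the auxiliary discounted problem.
    gamma = 1 - eps / (19 * t_minorize)
    if not 0 < gamma < 1:
        raise ValueError("resulting gamma is outside (0, 1)")

    # Perturbation amplitude; algebraically this equals eps / 76.
    zeta = 0.25 * (1 - gamma) * t_minorize

    # Sample size, rounded up to an integer.
    n = ceil(C_CONST * beta / ((1 - gamma) ** 2 * t_minorize))
    return gamma, zeta, n


if __name__ == "__main__":
    gamma, zeta, n = reduction_parameters(eps=0.5, t_minorize=2.0, beta=1.0)
    print(f"gamma={gamma:.6f}, zeta={zeta:.6f}, n={n}")
```

Under this reading, ζ does not depend on t_minorize once γ is substituted, and n grows like t_minorize/ϵ², which is why the constant matters so much in practice. That is consistent with the row's note that the Figure 1b experiments use the much smaller values C = 4500 and C = 18000.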