Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Statistical field theory for Markov decision processes under uncertainty
Authors: George Stamatescu
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper presents two such methods, corresponding to two distinct asymptotic limits. First, the classical approximation is applied, corresponding to the asymptotic data limit. This approximation recovers so-called plug-in estimators for the mean of the value functions. Second, a dynamic mean field theory is derived, showing that under certain assumptions the state-action values are statistically independent across state-action pairs in the asymptotic state space limit. ... We present numerical simulations in support of the results of the dynamic mean field theory, in particular that the mean field equations provide the mean of the value functions or state-action value functions, and that the variance of the state-action value functions reduces to that of the mean-reward in the large state space limit. The simulations in Figure 1 show the accuracy of the DMFP equation for a relatively small MDP, as described. The simulations in Figure 2 show that the convergence to the DMFP result depends generally on the effective horizon 1/(1 − β) in the discounted setting, and correspondingly on the horizon T in the finite-horizon MDP. |
| Researcher Affiliation | Academia | George Stamatescu EMAIL University of Adelaide Adelaide, SA, Australia |
| Pseudocode | No | The paper describes mathematical formalisms and equations but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the availability of source code, nor does it provide any links to a code repository or mention code in supplementary materials. |
| Open Datasets | No | The paper describes theoretical formalisms and numerical simulations based on model parameters such as "flat Dirichlet prior and Gaussian mean-reward posterior, with uniformly random mean and a variance of 1 for each state-action". It does not refer to any publicly available datasets. |
| Dataset Splits | No | The paper does not use external datasets, but rather parameters for numerical simulations. Therefore, there is no mention of dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper describes theoretical work and numerical simulations but does not specify the hardware (e.g., CPU, GPU models, or cloud resources) used for these simulations. |
| Software Dependencies | No | The paper describes theoretical concepts and numerical simulations but does not list any specific software or library dependencies with version numbers. |
| Experiment Setup | Yes | Dynamic mean field programming theory versus simulations of the Bayesian mean and variance of the iterates of a particular Q-value function for an infinite horizon MDP with N = 50 states, with |A| = 2, discount factor β = 0.9, with the empirical estimates formed from 500 realisations of the system. The model has a flat Dirichlet prior and Gaussian mean-reward posterior, with uniformly random mean and a variance of 1 for each state-action. ... The empirical variance for the same system as presented in Figure 1, but with variable discount factor β ∈ {0.8, 0.85, 0.9, 0.95}. |
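The experiment setup quoted above (N = 50 states, |A| = 2, β = 0.9, 500 posterior realisations, flat Dirichlet transition prior, variance-1 Gaussian mean-reward posterior with uniformly random means) can be reproduced in outline. The sketch below is an illustration assembled from those stated parameters, not the authors' code; the iteration count, random seed, and the choice to resample rewards once per realisation are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N, A = 50, 2          # states and actions, as in the paper's Figure 1 setup
beta = 0.9            # discount factor
n_realisations = 500  # number of posterior samples of the MDP

# Gaussian mean-reward posterior: uniformly random mean, variance 1 per state-action.
reward_mean = rng.uniform(size=(N, A))

def sample_q(rng):
    """Sample one MDP from the posterior and compute its Q-values."""
    # Transition kernel drawn from a flat Dirichlet prior: P[s, a] is a
    # distribution over next states.
    P = rng.dirichlet(np.ones(N), size=(N, A))
    # Rewards drawn from the variance-1 Gaussian posterior (one draw per realisation;
    # an assumption of this sketch).
    r = reward_mean + rng.standard_normal((N, A))
    # Q-value iteration; beta^100 ~ 2.7e-5, so 100 sweeps is near convergence.
    Q = np.zeros((N, A))
    for _ in range(100):
        V = Q.max(axis=1)          # greedy state values
        Q = r + beta * (P @ V)     # Bellman optimality update
    return Q

qs = np.stack([sample_q(rng) for _ in range(n_realisations)])
q_mean = qs.mean(axis=0)   # empirical Bayesian mean of the Q-values
q_var = qs.var(axis=0)     # empirical Bayesian variance of the Q-values
```

Comparing `q_mean` and `q_var` against the solutions of the DMFP equations is the check reported in the paper's Figure 1; the mean field prediction is that, as N grows, `q_var` collapses toward the variance of the mean-reward alone.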