Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Statistical field theory for Markov decision processes under uncertainty
Authors: George Stamatescu
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper presents two such methods, corresponding to two distinct asymptotic limits. First, the classical approximation is applied, corresponding to the asymptotic data limit. This approximation recovers so-called plug-in estimators for the mean of the value functions. Second, a dynamic mean field theory is derived, showing that under certain assumptions the state-action values are statistically independent across state-action pairs in the asymptotic state space limit. ... We present numerical simulations in support of the results of the dynamic mean field theory, in particular that the mean field equations provide the mean of the value functions or state-action value functions, and that the variance of the state-action value functions reduces to that of the mean-reward in the large state space limit. The simulations in Figure 1 show the accuracy of the DMFP equation for a relatively small MDP, as described. The simulations in Figure 2 show that the convergence to the DMFP result depends generally on the effective horizon 1/(1 − β) in the discounted setting, and correspondingly on the horizon T in the finite-horizon MDP. |
| Researcher Affiliation | Academia | George Stamatescu EMAIL University of Adelaide Adelaide, SA, Australia |
| Pseudocode | No | The paper describes mathematical formalisms and equations but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the availability of source code, nor does it provide any links to a code repository or mention code in supplementary materials. |
| Open Datasets | No | The paper describes theoretical formalisms and numerical simulations based on model parameters such as "flat Dirichlet prior and Gaussian mean-reward posterior, with uniformly random mean and a variance of 1 for each state-action". It does not refer to any publicly available datasets. |
| Dataset Splits | No | The paper does not use external datasets, but rather parameters for numerical simulations. Therefore, there is no mention of dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper describes theoretical work and numerical simulations but does not specify the hardware (e.g., CPU, GPU models, or cloud resources) used for these simulations. |
| Software Dependencies | No | The paper describes theoretical concepts and numerical simulations but does not list any specific software or library dependencies with version numbers. |
| Experiment Setup | Yes | Dynamic mean field programming theory versus simulations of the Bayesian mean and variance of the iterates of a particular Q-value function for an infinite horizon MDP with N = 50 states, with |A| = 2, discount factor β = 0.9, with the empirical estimates formed from 500 realisations of the system. The model has a flat Dirichlet prior and Gaussian mean-reward posterior, with uniformly random mean and a variance of 1 for each state-action. ... The empirical variance for the same system as presented in Figure 1, but with variable discount factor β ∈ {0.8, 0.85, 0.9, 0.95}. |
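The experiment setup quoted above (N = 50 states, |A| = 2, β = 0.9, 500 posterior realisations, flat Dirichlet transition prior, variance-1 Gaussian mean-reward posterior with uniformly random means) can be reproduced in outline. The sketch below is an illustration assembled from those stated parameters, not the authors' code; the iteration count, random seed, and the choice to resample rewards once per realisation are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N, A = 50, 2          # states and actions, as in the paper's Figure 1 setup
beta = 0.9            # discount factor
n_realisations = 500  # number of posterior samples of the MDP

# Gaussian mean-reward posterior: uniformly random mean, variance 1 per state-action.
reward_mean = rng.uniform(size=(N, A))

def sample_q(rng):
    """Sample one MDP from the posterior and compute its Q-values."""
    # Transition kernel drawn from a flat Dirichlet prior: P[s, a] is a
    # distribution over next states.
    P = rng.dirichlet(np.ones(N), size=(N, A))
    # Rewards drawn from the variance-1 Gaussian posterior (one draw per realisation;
    # an assumption of this sketch).
    r = reward_mean + rng.standard_normal((N, A))
    # Q-value iteration; beta^100 ~ 2.7e-5, so 100 sweeps is near convergence.
    Q = np.zeros((N, A))
    for _ in range(100):
        V = Q.max(axis=1)          # greedy state values
        Q = r + beta * (P @ V)     # Bellman optimality update
    return Q

qs = np.stack([sample_q(rng) for _ in range(n_realisations)])
q_mean = qs.mean(axis=0)   # empirical Bayesian mean of the Q-values
q_var = qs.var(axis=0)     # empirical Bayesian variance of the Q-values
```

Comparing `q_mean` and `q_var` against the solutions of the DMFP equations is the check reported in the paper's Figure 1; the mean field prediction is that, as N grows, `q_var` collapses toward the variance of the mean-reward alone.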