Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Statistical inference for Linear Stochastic Approximation with Markovian Noise

Authors: Sergey Samsonov, Marina Sheshukova, Eric Moulines, Alexey Naumov

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Numerical experiments. We consider the simple instance of the Garnet problem [4, 32]. This problem is characterized by the number of states $N_s$, number of actions $a$, and branching factor $b$. Here $b$ corresponds to the number of states $s'$ that can be reached when starting from a given state-action pair $(s, a)$. The reward $r(s, a) \in [0, 1]$ is a deterministic function. We set the hyperparameter values $N_s = 6$, $a = 2$, $b = 3$, feature dimension $d = 2$, and discount factor $\lambda = 0.8$. We aim to evaluate the value function of policy $\pi(\cdot|s)$, which is given, for any $a \in A = \{1, 2\}$, by the expression $\pi(a|s) = U_a^{(s)} / \sum_{i=1}^{|A|} U_i^{(s)}$, where the $U_i^{(s)}$ are i.i.d. observations from $U[0, 1]$. In this case we can suppose that R(s) = Ea π(a|s)r(s, a) is a random variable which depends on state $s$ and some independent random variable. We consider the problem of policy evaluation in this MDP using the TD learning algorithm with randomly generated feature mapping, that is, we generate the matrix $\Phi$ with i.i.d. $N(0, 1)$ entries, and then take $\phi(s)$, $s \in \{1, \ldots, |S|\}$, to be its $s$-th row, normalized by its Euclidean norm: $\phi(s) = \Phi_s / \|\Phi_s\|$. We run the procedure (1) with the learning rates $\alpha_k = c_0 / (k_0 + k)^{\gamma}$ with $\gamma = 2/3$ and appropriately chosen $c_0$ and $k_0$. We generate a random vector $u$ from the unit sphere, and compute coverage probabilities for $u^\top \theta$ for confidence levels $\{0.8, 0.9, 0.95\}$. The detailed setting of the experiments follows [29]. Results are given in Table 1 and illustrate the consistency of the multiplier subsample bootstrap procedure applied to the Garnet problem.
Researcher Affiliation Academia Sergey Samsonov (HSE University); Marina Sheshukova (HSE University); Eric Moulines (Ecole Polytechnique, MBZUAI); Alexey Naumov (HSE University, Steklov Mathematical Institute of Russian Academy of Sciences)
Pseudocode No The paper describes its methodology through mathematical equations and prose. It does not include any explicitly labeled 'Pseudocode' or 'Algorithm' sections, figures, or code-like formatted blocks within the main text or appendices.
Open Source Code Yes Code to reproduce experiments is given in https://github.com/svsamsonov/markov_lsa_normal_approximation. Our experiments were conducted on a single Intel Xeon Gold 6248R CPU (48 cores, 3.0–4.0 GHz), 768 GB RAM, and 240 GB SSD storage, without GPU accelerators.
Open Datasets No We consider the simple instance of the Garnet problem [4, 32]. This problem is characterized by the number of states $N_s$, number of actions $a$, and branching factor $b$. We generate a random vector $u$ from the unit sphere, and compute coverage probabilities for $u^\top \theta$ for confidence levels $\{0.8, 0.9, 0.95\}$. The detailed setting of the experiments follows [29].
Dataset Splits No The paper describes generating synthetic data for the Garnet problem based on specified hyperparameters. It does not mention using a pre-existing dataset or providing any training/test/validation splits for data partitioning.
Hardware Specification Yes Our experiments were conducted on a single Intel Xeon Gold 6248R CPU (48 cores, 3.0–4.0 GHz), 768 GB RAM, and 240 GB SSD storage, without GPU accelerators.
Software Dependencies No The paper provides a GitHub link for the code but does not explicitly list specific software dependencies (e.g., Python, PyTorch, libraries) with version numbers within the text.
Experiment Setup Yes We set the hyperparameter values $N_s = 6$, $a = 2$, $b = 3$, feature dimension $d = 2$, and discount factor $\lambda = 0.8$. We aim to evaluate the value function of policy $\pi(\cdot|s)$, which is given, for any $a \in A = \{1, 2\}$, by the expression $\pi(a|s) = U_a^{(s)} / \sum_{i=1}^{|A|} U_i^{(s)}$, where the $U_i^{(s)}$ are i.i.d. observations from $U[0, 1]$. In this case we can suppose that R(s) = Ea π(a|s)r(s, a) is a random variable which depends on state $s$ and some independent random variable. We consider the problem of policy evaluation in this MDP using the TD learning algorithm with randomly generated feature mapping, that is, we generate the matrix $\Phi$ with i.i.d. $N(0, 1)$ entries, and then take $\phi(s)$, $s \in \{1, \ldots, |S|\}$, to be its $s$-th row, normalized by its Euclidean norm: $\phi(s) = \Phi_s / \|\Phi_s\|$. We run the procedure (1) with the learning rates $\alpha_k = c_0 / (k_0 + k)^{\gamma}$ with $\gamma = 2/3$ and appropriately chosen $c_0$ and $k_0$. We generate a random vector $u$ from the unit sphere, and compute coverage probabilities for $u^\top \theta$ for confidence levels $\{0.8, 0.9, 0.95\}$. The detailed setting of the experiments follows [29].
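
The experiment setup quoted above can be illustrated with a minimal Python sketch. It is not taken from the authors' repository. The random policy, the normalized feature map, and the step-size schedule follow the quoted description. The Garnet transition kernel, the way rewards are drawn, the TD(0) update form, and the values of `c0` and `k0` are assumptions made here for illustration, since the report only states that the detailed setting follows [29]. All function names are hypothetical.

```python
import numpy as np


def random_policy(rng, n_states, n_actions):
    """pi(a|s) = U_a^(s) / sum_i U_i^(s), with U_i^(s) i.i.d. Uniform[0, 1]."""
    u = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return u / u.sum(axis=1, keepdims=True)


def normalized_features(rng, n_states, dim):
    """Rows of a matrix with i.i.d. N(0, 1) entries, scaled to unit Euclidean norm."""
    phi = rng.standard_normal((n_states, dim))
    return phi / np.linalg.norm(phi, axis=1, keepdims=True)


def step_size(k, c0, k0, gamma=2.0 / 3.0):
    """alpha_k = c0 / (k0 + k)^gamma."""
    return c0 / (k0 + k) ** gamma


def garnet_transitions(rng, n_states, n_actions, branching):
    """ASSUMED Garnet kernel: each (s, a) reaches `branching` distinct states
    with normalized uniform weights. The paper's exact construction may differ."""
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            nxt = rng.choice(n_states, size=branching, replace=False)
            w = rng.uniform(size=branching)
            P[s, a, nxt] = w / w.sum()
    return P


def td0(rng, P, r, pi, phi, lam, n_iter, c0, k0):
    """TD(0) with linear function approximation along a single Markov trajectory.
    ASSUMED update form; the paper's procedure (1) is not reproduced in the report."""
    n_states, n_actions, _ = P.shape
    theta = np.zeros(phi.shape[1])
    s = rng.integers(n_states)
    for k in range(1, n_iter + 1):
        a = rng.choice(n_actions, p=pi[s])
        s_next = rng.choice(n_states, p=P[s, a])
        td_err = r[s, a] + lam * (phi[s_next] @ theta) - phi[s] @ theta
        theta = theta + step_size(k, c0, k0) * td_err * phi[s]
        s = s_next
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions, branching, dim, lam = 6, 2, 3, 2, 0.8
    pi = random_policy(rng, n_states, n_actions)
    phi = normalized_features(rng, n_states, dim)
    P = garnet_transitions(rng, n_states, n_actions, branching)
    # ASSUMPTION: deterministic rewards r(s, a) in [0, 1], drawn once.
    r = rng.uniform(size=(n_states, n_actions))
    # c0 and k0 are placeholders; the report does not state the values used.
    theta = td0(rng, P, r, pi, phi, lam, n_iter=2000, c0=1.0, k0=10)
    print(theta)
```

The sketch stops at the TD iterate. The multiplier subsample bootstrap and the coverage computation for the projection onto $u$ are not specified in enough detail in this report to reconstruct them here.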