Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Achieving $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in Restless Bandits through Gaussian Approximation

Authors: Chen YAN, Weina Wang, Lei Ying

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluated the performance of our SP-based policy (Algorithm 1) on a machine maintenance problem [9, 12], an RMAB formulation motivated by real-world trade-offs in preventive maintenance and resource allocation. The resulting SP is solved using the EDDP algorithm [20, Algorithm 3]. The details of the experiments are presented in Appendix B, along with an empirical study of the computational complexity of EDDP for solving the SPs arising from RMABs across varying problem sizes. We performed two sets of experiments where the problem instances are degenerate: one set where the fluid LP solution is unique (Figure 3a and 3c) and one set where it is not unique (Figure 3b and 3d). For both sets, computing the optimal policies is intractable. In each set of experiments, we evaluated the performance of our SP-based policy and the LP-based policy on a sequence of problems with increasing numbers of machines N, and compared the total reward difference. Figure 3 demonstrates that the improvement of our SP-based policy over the LP-based policy grows with N in both settings.
Researcher Affiliation Academia 1University of Michigan, Ann Arbor 2Carnegie Mellon University EMAIL EMAIL
Pseudocode Yes Algorithm 1 Stochastic-Programming-based (SP-based) policy
Open Source Code Yes The code is included in the supplemental materials.
Open Datasets No We conducted a numerical study on how likely a randomly generated RMAB instance is degenerate (Definition 3.1) and the corresponding LP (8) satisfies the Uniqueness Assumption 4.1. In this experiment, an RMAB instance is generated as follows: Each parameter is sampled via i.i.d. exp(1) and normalized properly as in the initial condition xini and in the transition kernels Ph.
Dataset Splits No The paper describes generating RMAB instances for experiments ('1,000 RMAB instances' for EDDP evaluation, two sets of experiments based on unique/non-unique fluid LP solutions) but does not define traditional train/test/validation splits of a fixed dataset.
Hardware Specification Yes All numerical experiments in this paper were conducted on a personal laptop equipped with a 13th Gen Intel(R) i9-13980HX CPU.
Software Dependencies No We implement the solutions using the Python package Pu LP and the Gurobi LP solver.
Experiment Setup Yes Appendix B.3 Parameters and implementation details of the RMAB examples used in Section 5. The RMAB examples used in Section 5 model a machine maintenance problem. Each machine has five states, where a higher state represents a more deteriorated condition. ... In these RMAB examples, we set H = 5 and α = 0.4. We consider arms representing machines each described by a 10 states MDP constructed from two distinct machine types, each having 5 states. ... All numerical parameters provided below are recorded with 4 digits of precision. P( | , 0) = ... P( | , 1) = ... xini = ... r( , 0) = ... In the first set of experiments, we use r( , 1) = ... In the second set of experiments, we use r( , 1) = ...