Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Near-Exponential Savings for Population Mean Estimation with Active Learning

Authors: Julian Morimoto, Jacob Goldin, Daniel Ho

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We illustrate our methods through simulation using nationwide electronic health records. Our methods can be implemented using the Parti Bandits package in R. We conduct simulation studies using real-world data from over 6 million electronic health records and find that the gains predicted by our theory for population mean estimation can be achieved even in realistic small-sample regimes (Section 5). Empirical Illustration. The left panel of Figure 1 shows how Parti Bandits performs as the strength of the relationship between X and Y varies. Figure 2: Comparison of estimation error for different label budgets using the AFC data.
Researcher Affiliation	Collaboration	Julian M. Morimoto: Department of Statistics, University of California, Berkeley; Regulation, Evaluation, and Governance Lab, Stanford Law School; World Bank Group. Jacob Goldin: University of Chicago; American Bar Foundation. Daniel E. Ho: Stanford University. The World Bank Group affiliation indicates an industry/intergovernmental entity mixed with academic institutions like University of California, Berkeley, University of Chicago, and Stanford University.
Pseudocode	Yes	Algorithm 1 Warm Start-UCB. Algorithm 2 Parti Bandits. Algorithm 3 Heterogeneity-Aware Active Learning Algorithm.
Open Source Code	Yes	Our methods can be implemented using the Parti Bandits package in R. Justification: We introduce a new R package, Parti Bandits, and provide documentation, usage examples, and licensing information alongside the code.
Open Datasets	No	To illustrate the gains of Parti Bandits in a real-world setting, we leverage access to the American Family Cohort (AFC) dataset... The data for the AFC simulations cannot be provided as it contains highly sensitive personal identifying information and would present significant ethics concerns if released.
Dataset Splits	No	To run this simulation, we draw, for each label budget, 500 random subsets of 10,000 patients each from the full AFC dataset of 6 million patients. Within each subset, we restrict attention to individuals whose geocoding-derived probabilities of being Black, X, fall in the top or bottom 5th percentile, and estimate the mean of Y for this subpopulation. This describes a sampling strategy for simulations rather than explicit training/test/validation splits for a model.
Hardware Specification	No	The paper does not explicitly describe the hardware used to run its experiments. It mentions Monte Carlo simulations and health records data simulations but no specific CPU or GPU models, or other hardware details.
Software Dependencies	No	Our methods can be implemented using the Parti Bandits package in R. The paper mentions the programming language R and a package, but does not provide specific version numbers for R or any other software dependencies.
Experiment Setup	Yes	For each label budget from 80 to 140, we run 500 simulations. For each run, we compute the absolute difference between the mean estimated by the algorithm and the true mean; we then report the 90th percentile of these 500 errors. We set S = A2 for our runs of Parti Bandits. To run this simulation, we draw, for each label budget, 500 random subsets of 10,000 patients each from the full AFC dataset of 6 million patients. Our choice of S is the classical A2 algorithm of Balcan et al. (2006).
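
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows can be sketched as follows. This is a minimal illustration, not the authors' code: the AFC data is not public, so the data is assumed to be a list of hypothetical `(x, y)` tuples; "top or bottom 5th percentile" is read here as the top 5% or bottom 5% of X; the 90th percentile uses a nearest-rank rule, which the paper does not specify; and `random_sample_mean` is a placeholder baseline standing in for Parti Bandits, whose R interface is not described in this report.

```python
import random
import statistics


def select_extreme_tails(records, tail=0.05):
    """Keep (x, y) records whose x lies in the bottom or top `tail` fraction."""
    xs = sorted(x for x, _ in records)
    k = max(1, int(len(xs) * tail))
    low_cut, high_cut = xs[k - 1], xs[-k]
    return [(x, y) for x, y in records if x <= low_cut or x >= high_cut]


def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100 (assumed convention)."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceiling of q*n/100
    return ordered[rank - 1]


def random_sample_mean(population, budget, rng):
    """Placeholder estimator: label `budget` random units and average their y."""
    sample = rng.sample(population, budget)
    return statistics.fmean(y for _, y in sample)


def error_at_budget(full_data, budget, estimator,
                    n_runs=500, subset_size=10_000, q=90, seed=0):
    """Return the q-th percentile of absolute errors over `n_runs` simulations."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_runs):
        # Draw a random subset of patients, then restrict to the tails of x.
        subset = rng.sample(full_data, subset_size)
        subpop = select_extreme_tails(subset)
        true_mean = statistics.fmean(y for _, y in subpop)
        estimate = estimator(subpop, budget, rng)
        errors.append(abs(estimate - true_mean))
    return percentile(errors, q)
```

Calling `error_at_budget(data, budget, random_sample_mean)` once per label budget between 80 and 140 would reproduce the shape of the reported comparison for the baseline. The spacing between budgets is not stated in the quoted text, so any step size is a further assumption.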