Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Active Learning with Safety Constraints

Authors: Romain Camilleri, Andrew Wagenmaker, Jamie H. Morgenstern, Lalit Jain, Kevin G. Jamieson

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In practice, we demonstrate that this approach performs well on synthetic and real world datasets.
Researcher Affiliation	Academia	University of Washington, Seattle, WA
Pseudocode	Yes	Algorithm 1 Best Safe Arm Identification (BESIDE) on page 4; Algorithm 2 Active constrained classification with randomized exploration on page 7.
Open Source Code	Yes	Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Refer to Appendix.
Open Datasets	Yes	We evaluate on the adult income data set [27] (48,842 examples)... [27] M Lichman. Uci machine learning repository 2013. URL http://archive.ics.uci.edu/. We consider the German Credit Dataset originally from the Statlog Project Databases [24]... [24] E. Keogh, C.; Blake, and C. J. Merz. Uci repository of machine learning databases 1998. URL http://archive.ics.uci.edu/ml.
Dataset Splits	No	The paper describes a pool-based active learning setup where labels are acquired dynamically, rather than specifying fixed training, validation, and test splits with percentages or sample counts for the overall dataset.
Hardware Specification	No	The paper states 'See Appendix' for compute resources, but the Appendix does not provide specific hardware details such as GPU/CPU models or memory specifications.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup	Yes	For the Adult dataset, we randomly sample 2000 points from the dataset... batch size is set to 25 and initial number of queried labels is 50. For the German Credit dataset, we use the entire dataset (1000 points)... batch size is set to 25 and initial number of queried labels is 50. In the active classification experiments we set the number of rounds L = 100, the number of classifiers per round k = 10 and the perturbation variance σ = 0.05.
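
The pool sizes, initial label count, and batch size quoted in the Experiment Setup row imply a simple labeling schedule, which can be sketched as below. This is only an illustration, not the authors' code: the names `PoolSetup` and `label_budget_schedule` are hypothetical, and the sketch assumes that querying continues in fixed batches until the pool is exhausted, which the quoted text does not state.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class PoolSetup:
    """Hypothetical container for the settings quoted in the Experiment Setup row."""
    pool_size: int            # 2000 sampled points (Adult) or 1000 points (German Credit)
    initial_labels: int = 50  # initial number of queried labels
    batch_size: int = 25      # labels queried per batch


def label_budget_schedule(setup: PoolSetup) -> Iterator[int]:
    """Yield the cumulative number of queried labels after each batch.

    Assumption (not from the paper): querying runs until the pool is exhausted.
    """
    queried = min(setup.initial_labels, setup.pool_size)
    yield queried
    while queried < setup.pool_size:
        queried = min(queried + setup.batch_size, setup.pool_size)
        yield queried


adult_schedule = list(label_budget_schedule(PoolSetup(pool_size=2000)))
german_schedule = list(label_budget_schedule(PoolSetup(pool_size=1000)))
```

Under that assumption, each schedule starts at 50 labels and grows by 25 per batch until it reaches the pool size. The round count L, the k classifiers per round, and σ from the same row are deliberately left out, since the quoted text does not say how they interact with the batch schedule.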