Active, anytime-valid risk controlling prediction sets
Authors: Ziyu Xu, Nikos Karampatziakis, Paul Mineiro
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we present practical ways of formulating label policies and empirically show that our label policies use fewer labels to reach higher utility than naive baseline labeling strategies on both simulations and real data. ... In Section 4 we also show that machine learning model based estimators of the optimal policy and predictors are label efficient in practice through experiments. ... We run all our experiments on a 48-core CPU on the Azure platform, after using a GPU to precompute the predictions made by neural network models. We set θ = 0.1, α = 0.05, and B = 0.3 for all our experiments. |
| Researcher Affiliation | Collaboration | Ziyu Xu Department of Statistics and Data Science Carnegie Mellon University xzy@cmu.edu Nikos Karampatziakis Microsoft nikosk@microsoft.com Paul Mineiro Microsoft pmineiro@microsoft.com |
| Pseudocode | No | The paper describes algorithms and formulations in text and mathematical equations but does not include a clearly labeled pseudocode block or algorithm figure. |
| Open Source Code | Yes | Code at github.com/neilzxu/active-rcps |
| Open Datasets | Yes | We also evaluate our methods on the Imagenet dataset [9], and we used the pretrained neural network classifiers from Bates et al. [4] to provide estimates of the class probabilities. ... J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, 2009. |
| Dataset Splits | No | The paper discusses training and testing but does not explicitly give the percentages or counts for the training, validation, and test splits of the datasets used. It mentions that it 'reshuffle[s] our dataset for each trial' but gives no specific split ratios. |
| Hardware Specification | Yes | We run all our experiments on a 48-core CPU on the Azure platform, after using a GPU to precompute the predictions made by neural network models. |
| Software Dependencies | No | The paper mentions 'We use PyTorch to model our (qt) and (brt)' but does not specify a version number for PyTorch or for any other software dependency. |
| Experiment Setup | Yes | We set θ = 0.1, α = 0.05, and B = 0.3 for all our experiments. |
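
For reference, the sketch below shows one way the reported hyperparameters could be pinned when re-running the released code. The values come from the paper's quoted setup; the argument names, defaults for the seed, and the entry point are illustrative assumptions, not the actual interface of github.com/neilzxu/active-rcps.

```python
# Hypothetical reproduction stub. Parameter values are taken from the paper
# ("We set theta = 0.1, alpha = 0.05, and B = 0.3 for all our experiments");
# the flag names and seed handling are assumptions for illustration only.
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Active RCPS reproduction sketch")
    parser.add_argument("--theta", type=float, default=0.1,
                        help="theta value reported in the paper for all experiments")
    parser.add_argument("--alpha", type=float, default=0.05,
                        help="alpha value reported in the paper for all experiments")
    parser.add_argument("--budget", type=float, default=0.3,
                        help="B value reported in the paper for all experiments")
    parser.add_argument("--seed", type=int, default=0,
                        help="assumed: the dataset is reshuffled per trial, so vary the seed across trials")
    return parser


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    # e.g. {'theta': 0.1, 'alpha': 0.05, 'budget': 0.3, 'seed': 0}
    print(vars(args))
```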