Active, anytime-valid risk controlling prediction sets

Authors: Ziyu Xu, Nikos Karampatziakis, Paul Mineiro

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we present practical ways of formulating label policies and empirically show that our label policies use fewer labels to reach higher utility than naive baseline labeling strategies on both simulations and real data. ... In Section 4 we also show that machine learning model based estimators of the optimal policy and predictors are label efficient in practice through experiments. ... We run all our experiments on a 48-core CPU on the Azure platform, after using a GPU to precompute the predictions made by neural network models. We set θ = 0.1, α = 0.05, and B = 0.3 for all our experiments.
Researcher Affiliation | Collaboration | Ziyu Xu, Department of Statistics and Data Science, Carnegie Mellon University, xzy@cmu.edu; Nikos Karampatziakis, Microsoft, nikosk@microsoft.com; Paul Mineiro, Microsoft, pmineiro@microsoft.com
Pseudocode | No | The paper describes algorithms and formulations in text and mathematical equations but does not include a clearly labeled pseudocode block or algorithm figure.
Open Source Code | Yes | Code at github.com/neilzxu/active-rcps
Open Datasets | Yes | We also evaluate our methods on the Imagenet dataset [9], and we used the pretrained neural network classifiers from Bates et al. [4] to provide estimates of the class probabilities. ... J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.
Dataset Splits | No | The paper discusses training and testing, but does not explicitly detail the exact percentages or counts for training, validation, and test splits for the datasets used. It mentions 'reshuffle our dataset for each trial' but no specific split ratios.
Hardware Specification | Yes | We run all our experiments on a 48-core CPU on the Azure platform, after using a GPU to precompute the predictions made by neural network models.
Software Dependencies | No | The paper mentions 'We use PyTorch to model our (q_t) and (r̂_t)' but does not specify the version number for PyTorch or any other software dependencies with version numbers.
Experiment Setup | Yes | We set θ = 0.1, α = 0.05, and B = 0.3 for all our experiments.