Active, anytime-valid risk controlling prediction sets
Authors: Ziyu Xu, Nikos Karampatziakis, Paul Mineiro
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we present practical ways of formulating label policies and empirically show that our label policies use fewer labels to reach higher utility than naive baseline labeling strategies on both simulations and real data. ... In Section 4 we also show that machine learning model based estimators of the optimal policy and predictors are label efficient in practice through experiments. ... We run all our experiments on a 48-core CPU on the Azure platform, after using a GPU to precompute the predictions made by neural network models. We set θ = 0.1, α = 0.05, and B = 0.3 for all our experiments. |
| Researcher Affiliation | Collaboration | Ziyu Xu Department of Statistics and Data Science Carnegie Mellon University xzy@cmu.edu Nikos Karampatziakis Microsoft nikosk@microsoft.com Paul Mineiro Microsoft pmineiro@microsoft.com |
| Pseudocode | No | The paper describes algorithms and formulations in text and mathematical equations but does not include a clearly labeled pseudocode block or algorithm figure. |
| Open Source Code | Yes | Code at github.com/neilzxu/active-rcps |
| Open Datasets | Yes | We also evaluate our methods on the Imagenet dataset [9], and we used the pretrained neural network classifiers from Bates et al. [4] to provide estimates of the class probabilities. ... J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, 2009. |
| Dataset Splits | No | The paper discusses training and testing but does not explicitly give the percentages or counts for the training, validation, and test splits of the datasets used. It mentions that it 'reshuffle[s] our dataset for each trial' but gives no specific split ratios. |
| Hardware Specification | Yes | We run all our experiments on a 48-core CPU on the Azure platform, after using a GPU to precompute the predictions made by neural network models. |
| Software Dependencies | No | The paper mentions 'We use PyTorch to model our (qt) and (brt)' but does not specify a version number for PyTorch or for any other software dependency. |
| Experiment Setup | Yes | We set θ = 0.1, α = 0.05, and B = 0.3 for all our experiments. |
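
For reference, the sketch below shows one way the reported hyperparameters could be pinned when re-running the released code. The values come from the paper's quoted setup; the argument names, defaults for the seed, and the entry point are illustrative assumptions, not the actual interface of github.com/neilzxu/active-rcps.

```python
# Hypothetical reproduction stub. Parameter values are taken from the paper
# ("We set theta = 0.1, alpha = 0.05, and B = 0.3 for all our experiments");
# the flag names and seed handling are assumptions for illustration only.
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Active RCPS reproduction sketch")
    parser.add_argument("--theta", type=float, default=0.1,
                        help="theta value reported in the paper for all experiments")
    parser.add_argument("--alpha", type=float, default=0.05,
                        help="alpha value reported in the paper for all experiments")
    parser.add_argument("--budget", type=float, default=0.3,
                        help="B value reported in the paper for all experiments")
    parser.add_argument("--seed", type=int, default=0,
                        help="assumed: the dataset is reshuffled per trial, so vary the seed across trials")
    return parser


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    # e.g. {'theta': 0.1, 'alpha': 0.05, 'budget': 0.3, 'seed': 0}
    print(vars(args))
```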