Counterfactually Comparing Abstaining Classifiers

Authors: Yo Joong Choe, Aditya Gangrade, Aaditya Ramdas

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our approach is examined in both simulated and real data experiments. ... We present our results in Table 2. ... To illustrate a real data use case, we compare abstaining classifiers on the CIFAR-100 image classification dataset (Krizhevsky, 2009)."
Researcher Affiliation | Academia | "Yo Joong Choe, Data Science Institute, University of Chicago, yjchoe@uchicago.edu; Aditya Gangrade, Department of EECS, University of Michigan, aditg@umich.edu; Aaditya Ramdas, Department of Statistics and Data Science & Machine Learning Department, Carnegie Mellon University, aramdas@cmu.edu"
Pseudocode | No | The paper describes the methods textually and mathematically but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "All code for the experiments is publicly available online at https://github.com/yjchoe/ComparingAbstainingClassifiers."
Open Datasets | Yes | "To illustrate a real data use case, we compare abstaining classifiers on the CIFAR-100 image classification dataset (Krizhevsky, 2009)."
Dataset Splits | No | The paper mentions using a "validation set" for CIFAR-100 but does not provide specific split percentages or sample counts for training, validation, or test sets in the main text.
Hardware Specification | No | The paper mentions using XSEDE and the Bridges-2 system at the Pittsburgh Supercomputing Center (PSC), but it does not specify any particular hardware components, such as CPU or GPU models, or their specifications.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "For the nuisance functions, we try linear predictors (L2-regularized linear/logistic regression for ˆµ0/ˆπ), random forests, and super learners with k-NN, kernel SVM, and random forests. ... use the same softmax output layer but use a different threshold for abstentions. Specifically, both classifiers use the softmax response (SR) thresholding (Geifman and El-Yaniv, 2017), i.e., abstain if max_{c ∈ Y} f(X)_c < τ for a threshold τ > 0, but A uses a more conservative threshold (τ = 0.8) than B (τ = 0.5)."
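The SR rule quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name and the toy probability matrix are hypothetical, and only the rule itself (predict the argmax class unless its softmax probability falls below τ, in which case abstain) comes from the quoted setup.

```python
import numpy as np

def sr_abstaining_predict(softmax_probs, tau):
    """Softmax-response (SR) abstention: return the argmax class when its
    softmax probability reaches the threshold tau, otherwise abstain (None).

    softmax_probs: (n_samples, n_classes) array of softmax outputs.
    """
    preds = []
    for p in softmax_probs:
        if p.max() < tau:            # confidence below tau -> abstain
            preds.append(None)
        else:
            preds.append(int(p.argmax()))
    return preds

# Two classifiers sharing one softmax layer but using different thresholds,
# as in the quoted setup: A is more conservative (tau = 0.8) than B (tau = 0.5).
probs = np.array([
    [0.85, 0.10, 0.05],  # high confidence: both predict class 0
    [0.60, 0.30, 0.10],  # B predicts class 0; A abstains
    [0.40, 0.35, 0.25],  # low confidence: both abstain
])
preds_A = sr_abstaining_predict(probs, tau=0.8)  # [0, None, None]
preds_B = sr_abstaining_predict(probs, tau=0.5)  # [0, 0, None]
```

With a shared softmax layer, the conservative classifier A abstains on a superset of the inputs B abstains on, which is what makes the two classifiers' abstention patterns differ while their underlying predictor stays the same.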