Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Counterfactually Comparing Abstaining Classifiers
Authors: Yo Joong Choe, Aditya Gangrade, Aaditya Ramdas
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach is examined in both simulated and real data experiments. ... We present our results in Table 2. ... To illustrate a real data use case, we compare abstaining classifiers on the CIFAR-100 image classification dataset (Krizhevsky, 2009). |
| Researcher Affiliation | Academia | Yo Joong Choe Data Science Institute University of Chicago EMAIL Aditya Gangrade Department of EECS University of Michigan EMAIL Aaditya Ramdas Dept. of Statistics and Data Science Machine Learning Department Carnegie Mellon University EMAIL |
| Pseudocode | No | The paper describes the methods textually and mathematically but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All code for the experiments is publicly available online at https://github.com/yjchoe/ComparingAbstainingClassifiers. |
| Open Datasets | Yes | To illustrate a real data use case, we compare abstaining classifiers on the CIFAR-100 image classification dataset (Krizhevsky, 2009). |
| Dataset Splits | No | The paper mentions using a 'validation set' for CIFAR-100 but does not provide specific split percentages or sample counts for training, validation, or test sets in the main text. |
| Hardware Specification | No | The paper mentions using 'XSEDE' and the 'Bridges-2 system' at the 'Pittsburgh Supercomputing Center (PSC)', but it does not specify any particular hardware components like CPU or GPU models, or their specifications. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For the nuisance functions, we try linear predictors (L2-regularized linear/logistic regression for µ̂₀/π̂), random forests, and super learners with k-NN, kernel SVM, and random forests. ... use the same softmax output layer but use a different threshold for abstentions. Specifically, both classifiers use the softmax response (SR) thresholding (Geifman and El-Yaniv, 2017), i.e., abstain if max_{c∈Y} f(X)_c < τ for a threshold τ > 0, but A uses a more conservative threshold (τ = 0.8) than B (τ = 0.5). |
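The softmax-response (SR) rule quoted above can be sketched in a few lines: the classifier predicts the argmax class when the top softmax score reaches the threshold τ and abstains otherwise. This is a minimal illustration, not the paper's code; the function names and the sentinel value `-1` for abstention are our own conventions, and the example logits are made up.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sr_predict(logits, tau):
    """Softmax-response (SR) abstention: predict the argmax class if the
    top softmax score is at least tau; otherwise abstain (return -1)."""
    probs = softmax(np.asarray(logits, dtype=float))
    top = probs.max(axis=-1)
    preds = probs.argmax(axis=-1)
    return np.where(top >= tau, preds, -1)

# Two hypothetical inputs: one confident, one borderline.
logits = np.array([[4.0, 0.5, 0.1],    # top softmax score ~0.95
                   [1.2, 0.3, 0.0]])   # top softmax score ~0.59

print(sr_predict(logits, tau=0.8))  # conservative classifier A abstains on row 2
print(sr_predict(logits, tau=0.5))  # classifier B predicts on both rows
```

With the same underlying network, the only difference between the two abstaining classifiers is τ, which is exactly the setup the paper uses on CIFAR-100 (τ = 0.8 for A vs. τ = 0.5 for B).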