Designing Decision Support Systems using Counterfactual Prediction Sets

Authors: Eleni Straitouri, Manuel Gomez Rodriguez

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct a large-scale human subject study (n = 2,751) to compare our methodology to several competitive baselines."
Researcher Affiliation | Academia | "Max Planck Institute for Software Systems, Kaiserslautern, Germany."
Pseudocode | Yes | "Algorithm 1: Counterfactual Successive Elimination" (a generic successive-elimination sketch follows the table)
Open Source Code | Yes | "An open-source implementation of both the strict and the lenient variants of our system, as well as all the data gathered in our human subject study, which we refer to as ImageNet16H-PS, are available at https://github.com/Networks-Learning/counterfactual-prediction-sets."
Open Datasets | Yes | "To construct our dataset ImageNet16H-PS, we gathered 194,407 label predictions from 2,751 human participants for 1,200 unique images from the ImageNet16H dataset (Steyvers et al., 2022) using Prolific. Our experimental protocol received approval from the Institutional Review Board (IRB) at the University of Saarland. Each participant was rewarded with 9 per hour, pro-rated, following Prolific's payment principles, and consented to participate by filling out a consent form that included a detailed description of the study procedures. The collected data did not include any personally identifiable information."
Dataset Splits | Yes | "We always used the same classifier, namely the pre-trained VGG-19 (Simonyan & Zisserman, 2015) after 10 epochs of fine-tuning, as provided by Steyvers et al. (2022), and a fixed calibration set of 120 images, picked at random."
Hardware Specification | Yes | "All experiments ran on a macOS machine with an M1 processor and 16GB memory."
Software Dependencies | Yes | "We implemented our algorithms in Python 3.10.9 using the following libraries: NumPy 1.24.1 (BSD-3-Clause License), Pandas 1.5.3 (BSD-3-Clause License), Scikit-learn 1.2.2 (BSD License)." (a pinned requirements sketch follows the table)
Experiment Setup | Yes | "For reproducibility, we used a fixed random seed in all random procedures, a different one for each realization of the algorithms. Similarly, we used a fixed random seed to randomly pick the 120 images of the calibration set. In our study, we always used the same classifier, namely the pre-trained VGG-19 (Simonyan & Zisserman, 2015) after 10 epochs of fine-tuning, as provided by Steyvers et al. (2022), and a fixed calibration set of 120 images, picked at random." (a seeded-split sketch follows the table)
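
The Pseudocode row names Algorithm 1, Counterfactual Successive Elimination. The paper's counterfactual reward estimator is not reproduced here; as a point of reference only, below is a minimal sketch of the generic successive-elimination bandit template that such algorithms build on. The names (pull_arm, n_arms, horizon) are hypothetical, and the confidence radius is a standard Hoeffding-style bound, not the paper's.

    import numpy as np

    def successive_elimination(pull_arm, n_arms, horizon, delta=0.05):
        """Generic successive elimination: sample every active arm once per
        round, then drop any arm whose upper confidence bound falls below
        the best lower confidence bound. pull_arm(a) must return a reward
        in [0, 1]."""
        active = list(range(n_arms))
        sums = np.zeros(n_arms)
        counts = np.zeros(n_arms)
        t = 0
        while t < horizon and len(active) > 1:
            for a in active:
                sums[a] += pull_arm(a)
                counts[a] += 1
                t += 1
            means = sums[active] / counts[active]
            # Hoeffding-style radius; the union bound over arms and rounds is folded into delta.
            radius = np.sqrt(np.log(2 * n_arms * horizon / delta) / (2 * counts[active]))
            best_lcb = np.max(means - radius)
            active = [a for a, m, r in zip(active, means, radius) if m + r >= best_lcb]
        return active

    # Example with hypothetical Bernoulli arms; the surviving arm(s) should include index 2.
    rng = np.random.default_rng(0)
    probs = [0.4, 0.55, 0.7]
    print(successive_elimination(lambda a: float(rng.random() < probs[a]), n_arms=3, horizon=30000))

In the paper, the arms correspond to candidate prediction-set predictors and the rewards are estimated counterfactually from observed human predictions; the sketch above only illustrates the elimination loop itself.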
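
The Software Dependencies row pins exact library versions, which translate directly into a requirements file. A sketch, assuming the conventional file name requirements.txt (pip does not pin the Python 3.10.9 interpreter itself):

    numpy==1.24.1
    pandas==1.5.3
    scikit-learn==1.2.2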
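
The Dataset Splits and Experiment Setup rows state that a fixed random seed selects 120 of the 1,200 images as the calibration set. A minimal sketch of such a seeded split; the seed value and variable names are placeholders, not taken from the released code:

    import numpy as np

    N_IMAGES, N_CAL = 1200, 120
    rng = np.random.default_rng(seed=42)  # placeholder seed; the released code fixes its own
    perm = rng.permutation(N_IMAGES)
    cal_idx, test_idx = perm[:N_CAL], perm[N_CAL:]  # 120 calibration images, 1,080 held out for evaluation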