Learning with Labeling Induced Abstentions

Authors: Kareem Amin, Giulia DeSalvo, Afshin Rostamizadeh

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a thorough set of experiments including an ablation study to test different components of our algorithm. We demonstrate the effectiveness of an efficient version of our algorithm over margin sampling on a variety of datasets.
Researcher Affiliation | Industry | Kareem Amin, Google Research, New York, NY, kamin@google.com; Giulia De Salvo, Google Research, New York, NY, giuliad@google.com; Afshin Rostamizadeh, Google Research, New York, NY, rostami@google.com
Pseudocode | Yes | Algorithm 1: DPL-IWAL Algorithm
Open Source Code | No | The paper mentions using publicly available datasets and libraries such as scikit-learn and the LIBSVM tools, but it does not provide a link to, or an explicit statement about, the public availability of the authors' own source code for DPL-IWAL or DPL-Simplified.
Open Datasets | Yes | We test six publicly available datasets [Chang and Lin] and for each, we use linear logistic regression models trained using the Python scikit-learn library. Chih-Chung Chang and Chih-Jen Lin. LIBSVM. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. Accessed: 2021-05-28. (A hedged loading sketch follows the table.)
Dataset Splits | No | The paper mentions using a training split and a test split ("a different random train/test split for each trial"), but it does not explicitly specify a separate validation split or its size/proportion.
Hardware Specification | No | The paper describes the software libraries and datasets used for its experiments but gives no hardware details (e.g., GPU/CPU models or memory).
Software Dependencies | No | The paper mentions the Python scikit-learn library, including its KNeighborsClassifier and LogisticRegression implementations, but it does not specify version numbers for these software components.
Experiment Setup | Yes | We execute a batch variant of DPL-Simplified, where at each iteration we process a batch of 5,000 examples, querying 20% of the examples for their labels and making predictions for the rest. All methods are seeded with 500 randomly sampled initial examples and each experiment is run for 10 trials. (A hedged sketch of this loop follows the table.)
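
The Open Datasets and Software Dependencies rows pin down the data format (LIBSVM) and the model family (linear logistic regression in scikit-learn). The following is a minimal Python sketch of that setup, with loudly marked assumptions: the file name "a9a" is a placeholder for whichever of the six LIBSVM datasets is being reproduced, and the split proportion and seed are invented, since the table does not restate them.

    from sklearn.datasets import load_svmlight_file
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Placeholder: "a9a" stands in for any of the six LIBSVM datasets;
    # the table does not say which six the paper used.
    X, y = load_svmlight_file("a9a")

    # "A different random train/test split for each trial" (Dataset Splits
    # row); the 75/25 proportion and the seed are assumptions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Linear logistic regression trained with scikit-learn, per the paper.
    model = LogisticRegression().fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))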
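
The Experiment Setup row also fixes a batch querying protocol: 500 seed labels, batches of 5,000 streamed examples, 20% of each batch queried for labels, predictions made on the rest, 10 trials. The sketch below implements just that loop, under one explicit assumption: it ranks examples by the model's absolute decision margin as a stand-in for DPL-Simplified's actual query rule, which this table does not reproduce. The function name and defaults are hypothetical, and binary labels are assumed.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def run_batch_protocol(X, y, seed_size=500, batch_size=5000,
                           query_frac=0.2, rng=None):
        """Label a random seed set, then for each batch of the stream query
        a fraction of the labels and predict the rest. The margin ranking is
        an assumed placeholder for DPL-Simplified's query criterion."""
        if rng is None:
            rng = np.random.default_rng(0)
        order = rng.permutation(X.shape[0])
        seed_idx, stream = order[:seed_size], order[seed_size:]

        labeled = list(seed_idx)           # indices with observed labels
        model = LogisticRegression().fit(X[labeled], y[labeled])

        predictions = {}                   # stream index -> predicted label
        for start in range(0, len(stream), batch_size):
            batch = stream[start:start + batch_size]
            # Rank by |decision_function|: small margin = most uncertain
            # under the current model (assumes a binary problem).
            margins = np.abs(model.decision_function(X[batch]))
            ranked = batch[np.argsort(margins)]
            n_query = int(query_frac * len(batch))
            queried, abstained = ranked[:n_query], ranked[n_query:]

            # Predict on the ~80% of the batch whose labels are not queried.
            for idx, pred in zip(abstained, model.predict(X[abstained])):
                predictions[idx] = pred

            # Query labels for the selected ~20% and refit.
            labeled.extend(queried)
            model = LogisticRegression().fit(X[labeled], y[labeled])
        return model, predictions

For the 10 trials reported in the table, one would rerun this with a fresh generator per trial (e.g., np.random.default_rng(trial)) and, per the Dataset Splits row, a fresh train/test split each time.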