Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning with Labeling Induced Abstentions

Authors: Kareem Amin, Giulia DeSalvo, Afshin Rostamizadeh

NeurIPS 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a thorough set of experiments, including an ablation study to test different components of our algorithm. We demonstrate the effectiveness of an efficient version of our algorithm over margin sampling on a variety of datasets.
Researcher Affiliation | Industry | Kareem Amin, Google Research, New York, NY, EMAIL; Giulia De Salvo, Google Research, New York, NY, EMAIL; Afshin Rostamizadeh, Google Research, New York, NY, EMAIL
Pseudocode | Yes | Algorithm 1: DPL-IWAL Algorithm
Open Source Code | No | The paper mentions using publicly available datasets and libraries such as scikit-learn and the LIBSVM tools, but it does not provide a link to, or an explicit statement about, the public availability of the authors' own source code for DPL-IWAL or DPL-Simplified.
Open Datasets | Yes | We test six publicly available datasets [Chang and Lin] and for each, we use linear logistic regression models trained using the Python scikit-learn library. Chih-Chung Chang and Chih-Jen Lin. LIBSVM. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. Accessed: 2021-05-28.
Dataset Splits | No | The paper mentions using a training split and a test split ("a different random train/test split for each trial"), but it does not explicitly specify a separate validation split or its size/proportion.
Hardware Specification | No | The paper describes the software libraries and datasets used in the experiments but does not provide any hardware specifications (e.g., GPU/CPU models, memory) for the machines on which they were run.
Software Dependencies | No | The paper mentions the Python scikit-learn library and its KNeighborsClassifier and LogisticRegression implementations, but it does not specify version numbers for these software components.
Experiment Setup | Yes | We execute a batch variant of DPL-Simplified, where at each iteration we process a batch of 5,000 examples, querying 20% of the examples for their labels and making predictions for the rest. All methods are seeded with 500 randomly sampled initial examples and each experiment is run for 10 trials.
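The LIBSVM datasets cited in the Open Datasets row use a sparse text format: each line is `<label> <index>:<value> ...` with 1-based feature indices. As a minimal illustration of that format (in practice one would use scikit-learn's `load_svmlight_file`), here is a sketch; `parse_libsvm` and the sample lines are illustrative, not from the paper:

```python
# Minimal parser for the LIBSVM sparse text format.
# Each line is "<label> <index>:<value> ...", with 1-based feature indices.
def parse_libsvm(lines, n_features):
    X, y = [], []
    for line in lines:
        parts = line.split()
        y.append(float(parts[0]))
        row = [0.0] * n_features  # dense row; fine for a small sketch
        for token in parts[1:]:
            idx, val = token.split(":")
            row[int(idx) - 1] = float(val)  # convert 1-based index to 0-based
        X.append(row)
    return X, y

# Two toy examples in LIBSVM format: label, then sparse feature entries.
sample = ["+1 1:0.5 3:1.0", "-1 2:2.0"]
X, y = parse_libsvm(sample, n_features=3)
```

Unspecified entries stay at zero, which is what makes the format compact for high-dimensional sparse data.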
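The Dataset Splits row quotes "a different random train/test split for each trial". A hedged sketch of that per-trial resplitting follows; `trial_split` is a hypothetical helper, and the 20% test fraction is an assumed placeholder since the paper's split proportions are not reported here:

```python
import random

# Hypothetical helper: draw a fresh random train/test split for each trial.
# test_frac=0.2 is an assumed placeholder, not a value from the paper.
def trial_split(data, trial, test_frac=0.2):
    rng = random.Random(trial)  # trial index seeds the split, so each trial differs
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = int(len(data) * (1 - test_frac))
    train = [data[i] for i in indices[:cut]]
    test = [data[i] for i in indices[cut:]]
    return train, test

train, test = trial_split(list(range(100)), trial=0)
```

Seeding by trial index keeps each trial's split reproducible while still varying across trials.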
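The Experiment Setup row describes a batch querying protocol: seed with 500 randomly sampled labeled examples, then process the pool in batches of 5,000, querying labels for 20% of each batch and predicting labels for the rest. A sketch under stated assumptions: the selection score below (distance of a probability from 0.5) is a generic uncertainty placeholder, not the paper's DPL-Simplified criterion, and `run_batch_protocol` is a hypothetical function:

```python
import random

# Sketch of the batch protocol: 500 seed examples, batches of 5,000,
# query labels for the 20% of each batch ranked most uncertain.
def run_batch_protocol(scores, batch_size=5000, query_frac=0.2, n_seed=500, seed=0):
    rng = random.Random(seed)
    pool = scores[:]
    rng.shuffle(pool)
    labeled = pool[:n_seed]  # 500 randomly sampled seed examples
    n_queried = 0
    for start in range(n_seed, len(pool), batch_size):
        batch = pool[start:start + batch_size]
        k = int(query_frac * len(batch))
        # Placeholder score: probabilities near 0.5 count as most uncertain.
        ranked = sorted(batch, key=lambda p: abs(p - 0.5))
        labeled += ranked[:k]  # query labels for the selected 20%
        n_queried += k
        # The remaining 80% would receive predicted labels instead.
    return len(labeled), n_queried

rng = random.Random(1)
pool = [rng.random() for _ in range(10500)]  # 500 seed + two batches of 5,000
n_labeled, n_queried = run_batch_protocol(pool)
```

Repeating this loop over 10 independent trials, each with its own random split and seed set, matches the trial count quoted above.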