Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Mitigating Source Bias for Fairer Weak Supervision
Authors: Changho Shin, Sonia Cromp, Dyah Adila, Frederic Sala
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our technique improves accuracy on weak supervision baselines by as much as 32% while reducing demographic parity gap by 82.5%. A simple extension of our method aimed at maximizing performance produces state-of-the-art performance in five out of ten datasets in the WRENCH benchmark. |
| Researcher Affiliation | Academia | Department of Computer Sciences, University of Wisconsin-Madison |
| Pseudocode | Yes | Algorithm 1: SOURCE BIAS MITIGATION (SBM) |
| Open Source Code | Yes | Our code is available at https://github.com/SprocketLab/fairws. |
| Open Datasets | Yes | The Adult dataset [K+96] has information about the annual income of people and their demographics. |
| Dataset Splits | No | The training data has 32,561 examples, and the test data has 16,281 examples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | In this experiment, we used Snorkel [BRL+19] as the label model in weak supervision settings. |
| Experiment Setup | No | For the weak supervision pipeline, we followed a standard procedure. First, we generate weak labels from labeling functions in the training set. Second, we train the label model on the weak labels; in this experiment, we used Snorkel [BRL+19] as the label model. Afterwards, we generate pseudolabels from the label model, train the end model on these, and evaluate it on the test set. We used logistic regression as the end model. |
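
The Experiment Setup row quotes a standard weak supervision pipeline: weak labels from labeling functions, a label model (Snorkel) that produces pseudolabels, and a logistic regression end model trained on those pseudolabels. Below is a minimal sketch of that pipeline, assuming snorkel>=0.9 and scikit-learn; the `run_pipeline` helper and its input names are illustrative placeholders, not the authors' code (their implementation, including the SBM bias-mitigation step, is at https://github.com/SprocketLab/fairws).

```python
# A minimal sketch of the pipeline described in the Experiment Setup row,
# assuming snorkel>=0.9 and scikit-learn. Input names (L_train, X_train,
# X_test, y_test) are illustrative placeholders, not the authors' code.
from sklearn.linear_model import LogisticRegression
from snorkel.labeling.model import LabelModel


def run_pipeline(L_train, X_train, X_test, y_test):
    """L_train: (n_train, m) weak-label matrix from m labeling functions,
    with -1 marking abstains; X_* are feature matrices, y_test gold labels."""
    # Step 2: train the label model (Snorkel) on the weak labels.
    label_model = LabelModel(cardinality=2, verbose=False)
    label_model.fit(L_train, n_epochs=500, seed=0)

    # Step 3: generate pseudolabels from the label model
    # (-1 means the label model abstained on a tie).
    pseudolabels = label_model.predict(L_train)
    covered = pseudolabels != -1

    # Steps 4-5: train the end model (logistic regression) on the
    # pseudolabels and evaluate it on the held-out test set.
    end_model = LogisticRegression(max_iter=1000)
    end_model.fit(X_train[covered], pseudolabels[covered])
    return end_model.score(X_test, y_test)
```

Dropping abstained points before fitting the end model is one common convention; a pipeline that trains on probabilistic labels instead would use `label_model.predict_proba`.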