Understanding Programmatic Weak Supervision via Source-aware Influence Function

Authors: Jieyu Zhang, Haonan Wang, Cheng-Yu Hsieh, Alexander J. Ratner

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we apply the source-aware IF to a diverse set of 13 classification datasets spanning tabular, text, and image data, and present multiple applications. First, we show that source-aware IF helps explain the behavior of the end model in terms of the PWS components; in particular, when interpreting the same incorrect prediction made by two PWS pipelines with different label models, data-level IF may point to the same influential training data, whereas source-aware IF can reveal the most responsible LF or source vote. Second, we use source-aware IF to identify mislabeling by each LF and show that it outperforms baselines by a significant margin of 9%-37%. Finally, we demonstrate that the training-loss decomposition and source-aware IF enable fine-grained training-loss perturbation and consequently yield larger test-loss improvements (13%-24% over ordinary IF [22] and group IF [23]).
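For orientation, the data-level influence function that source-aware IF refines can be sketched for a logistic-regression end model. This is a minimal illustration with made-up helper names, not code from the paper's implementation; it computes the classic score IF(z_i, z_val) = -∇L(z_val)ᵀ H⁻¹ ∇L(z_i) for every training point:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, x, y):
    """Gradient of the logistic loss at one example (label y in {0, 1})."""
    return (sigmoid(x @ w) - y) * x

def hessian(w, X, lam=1e-3):
    """Hessian of the mean logistic loss, with L2 damping so it is invertible."""
    p = sigmoid(X @ w)
    H = (X.T * (p * (1.0 - p))) @ X / len(X)
    return H + lam * np.eye(X.shape[1])

def data_level_influence(w, X_tr, y_tr, x_val, y_val):
    """Score of up-weighting each training point on the validation loss:
    negative scores mean the point is helpful for this validation example."""
    H_inv = np.linalg.inv(hessian(w, X_tr))
    g_val = grad_loss(w, x_val, y_val)
    return np.array([-g_val @ H_inv @ grad_loss(w, x, y)
                     for x, y in zip(X_tr, y_tr)])
```

Because this score is attached to whole training examples, two pipelines with different label models can surface the same "influential" points; the paper's source-aware IF instead decomposes the training loss so the score can be attributed to individual LF votes.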
Researcher Affiliation | Collaboration | Jieyu Zhang (University of Washington), Haonan Wang (University of Illinois Urbana-Champaign), Cheng-Yu Hsieh (University of Washington), Alexander Ratner (University of Washington; Snorkel AI, Inc.)
Pseudocode | No | The paper describes its methods and formulations in prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation is available at the WRENCH benchmark: https://github.com/JieyuZ2/wrench
Open Datasets | Yes | We include the following four classification datasets in WRENCH [50], a collection of benchmarking datasets for PWS: Census, Youtube, Yelp, and IMDb. Note that Census is a tabular dataset while the others are textual. We use the labeling functions (LFs) released by WRENCH [50]. We also include the following tabular datasets: Mushroom [12], Spambase [12], and Phishing Websites (PW) [29], for which we follow the instructions in the WRENCH [50] codebase to generate LFs from a decision tree learned on the labeled data. Finally, we follow [28] to derive LFs for a multiclass image classification task using the DomainNet [30] dataset.
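As a rough illustration of turning a tree learned on labeled data into LFs, the sketch below uses a depth-1 stump (a single split) in place of a full decision tree, with the common convention of -1 for an abstaining vote; all names are hypothetical, not from the WRENCH codebase:

```python
import numpy as np

ABSTAIN = -1  # conventional "no vote" label in PWS benchmarks

def stump_lf(feature, threshold, label_if_above, label_if_below=ABSTAIN):
    """Build a labeling function from one decision-tree split: vote
    `label_if_above` when x[feature] > threshold, otherwise abstain."""
    def lf(x):
        return label_if_above if x[feature] > threshold else label_if_below
    return lf

def fit_stump(X, y, feature):
    """Pick the threshold on one feature whose 'above' branch is purest on
    the labeled data, mimicking 'learn a tree, turn its paths into LFs'."""
    best_lf, best_acc = None, -1.0
    for t in np.unique(X[:, feature]):
        above = y[X[:, feature] > t]
        if len(above) == 0:
            continue
        maj = int(np.bincount(above).argmax())
        acc = (above == maj).mean()
        if acc > best_acc:
            best_lf, best_acc = stump_lf(feature, t, maj), acc
    return best_lf
```

A full tree would yield one such LF per root-to-leaf path; this stump version keeps the idea visible in a few lines.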
Dataset Splits | Yes | We train the end model initialized as zero by gradient descent for 10,000 epochs with a learning rate of 0.001, and then calculate the influence score on the validation set Dv throughout the experiments. When using the test loss as the evaluation metric, we use Dv for model selection and report the test loss of the selected model.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments within the provided text.
Software Dependencies | No | The paper mentions software components like BERT, ResNet-18, and logistic regression, but it does not specify any version numbers for these or other software dependencies.
Experiment Setup | Yes | For the label model, we focus on three commonly used choices: Majority Voting (MV), the Dawid-Skene model (DS) [9], and Snorkel [31]. The σ(·) function of the latter two label models is the exponential function, and we use their approximated variants (as mentioned in Section 3.4) for all experiments. Following [23], we use logistic regression for the end model and defer the experiments using neural networks to the appendix. We train the end model initialized as zero by gradient descent for 10,000 epochs with a learning rate of 0.001, and then calculate the influence score on the validation set Dv throughout the experiments.
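The quoted setup (a majority-vote label model feeding a logistic-regression end model trained from zero by full-batch gradient descent) can be sketched as follows. This is an illustrative reconstruction under those stated hyperparameters, not the WRENCH implementation:

```python
import numpy as np

ABSTAIN = -1  # conventional "no vote" label in PWS benchmarks

def majority_vote(votes):
    """Aggregate an (n, m) matrix of LF votes into hard labels; ABSTAIN
    votes are ignored, and a fully-abstained example falls back to class 0."""
    labels = []
    for row in votes:
        valid = row[row != ABSTAIN]
        labels.append(int(np.bincount(valid).argmax()) if len(valid) else 0)
    return np.array(labels)

def train_logreg(X, y, lr=0.001, epochs=10_000):
    """End model: logistic regression initialized at zero and trained by
    full-batch gradient descent, matching the setup quoted above."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(X)
    return w
```

The DS and Snorkel label models would replace `majority_vote` with probabilistic aggregation; the end-model training loop stays the same.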