Characterizing the Impacts of Semi-supervised Learning for Weak Supervision

Authors: Jeffrey Li, Jieyu Zhang, Ludwig Schmidt, Alexander J. Ratner

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Specifically, we first organize the intersection between SSL and WS by proposing an explicit design space... We test these methods on several standard WS benchmarks, finding that our design space is sufficient for matching the performance of more complex state-of-the-art methods. We then compare methods within our design space to ablate the importance of each axis and so provide guidance on when each is worth using (and thus more carefully exploring).
Researcher Affiliation | Collaboration | Jeffrey Li (1), Jieyu Zhang (1), Ludwig Schmidt (1), Alexander Ratner (1,2); (1) University of Washington, (2) Snorkel AI; {jwl2162, jieyuz2, schmidt, ajratner}@cs.washington.edu
Pseudocode | No | The paper describes methods and a design space but does not include any explicitly labeled 'Pseudocode' blocks or 'Algorithm' listings.
Open Source Code | Yes | For further details, please see our codebase at https://github.com/jeffreywpli/SSL4WS.
Open Datasets | Yes | We use 8 classification datasets (see Table 2)... We also create two new WS text classification tasks based on publicly available datasets, Massive18 [8] and Banking77 [5].
Dataset Splits | Yes | We tune all methods on a shared hyperparameter budget of 50 trials for full RoBERTa fine-tuning and 300 trials for MLPs. All reported test performances are then averages over three additional test runs, while all error bars are the standard deviations over these runs. Finally, though our design space is compatible with any LM, we use the soft-labels produced by the Snorkel LM from [24]. However, we test robustness to this choice by also trying Majority Voting when comparing with existing methods. From the conclusions of [33], these were most consistently the best LMs across many benchmark tasks. (A minimal sketch of majority-vote label aggregation follows this table.)
Hardware Specification | Yes | We ran all MLP experiments on AWS, using up to four g4dn.4xlarge EC2 instances at one time... For all RoBERTa training runs, we ran our experiments on a fleet of up to 15 NVIDIA-A40 GPUs hosted on the University of Washington Hyak cluster.
Software Dependencies | No | Table 5 lists various methods and hyperparameters (e.g., Snorkel LM, BERT, VAT, UDA), but it does not specify version numbers for general software dependencies such as Python or PyTorch, nor for the specific libraries used in the implementation beyond model names.
Experiment Setup | Yes | Here we provide high-level details on the various components of our design space. For further details, please see our codebase at https://github.com/jeffreywpli/SSL4WS... We tune over the search space in Table 5 once per dataset... Following WRENCH, we perform early stopping based on validation performance for all methods. Specifically, we use a patience of 1000 steps for MLPs and 100 steps for RoBERTa. (An illustrative early-stopping sketch follows this table.)
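
The Dataset Splits row notes that, besides the soft labels from the Snorkel label model, the authors also test Majority Voting over labeling-function outputs. The sketch below is a minimal, illustrative implementation of that aggregation step, assuming the common weak-supervision convention that each labeling function emits an integer class vote with -1 for abstain; the function name and fallback behavior are hypothetical and not taken from the paper's codebase.

```python
import numpy as np

def majority_vote_soft_labels(lf_votes: np.ndarray, n_classes: int) -> np.ndarray:
    """Aggregate labeling-function (LF) votes into per-example soft labels.

    lf_votes: (n_examples, n_lfs) integer array with class votes in
              {0, ..., n_classes - 1} and -1 marking an abstain.
    Returns:  (n_examples, n_classes) array of normalized vote counts;
              examples where every LF abstains fall back to a uniform label.
    """
    n_examples = lf_votes.shape[0]
    counts = np.zeros((n_examples, n_classes))
    for c in range(n_classes):
        counts[:, c] = (lf_votes == c).sum(axis=1)
    totals = counts.sum(axis=1, keepdims=True)
    # Normalize vote counts; use a uniform distribution when all LFs abstain.
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_classes)

# Toy usage: 3 examples, 2 LFs, binary task (-1 = abstain).
votes = np.array([[1, 1], [0, -1], [-1, -1]])
print(majority_vote_soft_labels(votes, n_classes=2))
# [[0.  1. ]
#  [1.  0. ]
#  [0.5 0.5]]
```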
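
The Experiment Setup row quotes a patience-based early-stopping rule following WRENCH (patience of 1000 steps for MLPs, 100 for RoBERTa). The following is a minimal sketch of such a criterion, assuming the validation metric is checked at a fixed step interval; the class, loop, and helper names (`train_one_step`, `evaluate`) are illustrative placeholders, not the authors' implementation.

```python
class EarlyStopper:
    """Step-based early stopping on a validation metric (higher is better)."""

    def __init__(self, patience_steps: int, eval_every: int = 1):
        # Patience counted in training steps, e.g. 1000 for MLPs or 100 for
        # RoBERTa in the quoted setup.
        self.patience_steps = patience_steps
        self.eval_every = eval_every
        self.best_metric = float("-inf")
        self.steps_since_best = 0

    def update(self, val_metric: float) -> bool:
        """Record a validation check; return True if training should stop."""
        if val_metric > self.best_metric:
            self.best_metric = val_metric
            self.steps_since_best = 0
        else:
            self.steps_since_best += self.eval_every
        return self.steps_since_best >= self.patience_steps


# Illustrative training loop (train_one_step / evaluate are placeholders):
# stopper = EarlyStopper(patience_steps=100, eval_every=10)
# for step in range(max_steps):
#     train_one_step(model, batch)
#     if (step + 1) % stopper.eval_every == 0:
#         if stopper.update(evaluate(model, val_loader)):
#             break  # no validation improvement within the patience window
```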