Characterizing the Impacts of Semi-supervised Learning for Weak Supervision
Authors: Jeffrey Li, Jieyu Zhang, Ludwig Schmidt, Alexander J. Ratner
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Specifically, we first organize the intersection between SSL and WS by proposing an explicit design space... We test these methods on several standard WS benchmarks, finding that our design space is sufficient for matching the performance of more complex state-of-the-art methods. We then compare methods within our design space to ablate the importance of each axis and so provide guidance on when each is worth using (and thus more carefully exploring). |
| Researcher Affiliation | Collaboration | Jeffrey Li1, Jieyu Zhang1, Ludwig Schmidt1, Alexander Ratner1,2 1University of Washington, 2Snorkel AI {jwl2162, jieyuz2, schmidt, ajratner}@cs.washington.edu |
| Pseudocode | No | The paper describes methods and a design space but does not include any explicitly labeled 'Pseudocode' blocks or 'Algorithm' listings. |
| Open Source Code | Yes | For further details, please see our codebase at https://github.com/jeffreywpli/SSL4WS. |
| Open Datasets | Yes | We use 8 classification datasets (see Table 2)... We also create two new WS text classification tasks based on publicly available datasets, Massive18 [8] and Banking77 [5]. |
| Dataset Splits | Yes | We tune all methods on a shared hyperparameter budget of 50 trials for full RoBERTa fine-tuning and 300 trials for MLPs. All reported test performances are then averages over three additional test runs, while all error bars are the standard deviations over these runs. Finally, though our design space is compatible with any LM, we use the soft labels produced by the Snorkel LM from [24]. However, we test robustness to this choice by also trying Majority Voting when comparing with existing methods. From the conclusions of [33], these were most consistently the best LMs across many benchmark tasks. (See the majority-vote sketch below this table.) |
| Hardware Specification | Yes | We ran all MLP experiments on AWS, using up to four g4dn.4xlarge EC2 instances at one time... For all RoBERTa training runs, we ran our experiments on a fleet of up to 15 NVIDIA-A40 GPUs hosted on the University of Washington Hyak cluster. |
| Software Dependencies | No | Table 5 lists various methods and hyperparameters (e.g., Snorkel LM, BERT, VAT, UDA), but it does not specify version numbers for general software dependencies like Python, PyTorch, or specific libraries used for implementation beyond model names. |
| Experiment Setup | Yes | Here we provide high-level details on the various components of our design space. For further details, please see our codebase at https://github.com/jeffreywpli/SSL4WS... We tune over the search space in Table 5 once per dataset... Following WRENCH, we perform early stopping based on validation performance for all methods. Specifically, we use a patience of 1000 steps for MLPs and 100 steps for RoBERTa. (See the early-stopping sketch below this table.) |
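
The Majority Voting baseline quoted in the Dataset Splits row can be illustrated concretely. The sketch below is a minimal, hypothetical illustration, not the authors' implementation and not the Snorkel label model from [24]: it converts a weak-label matrix into soft pseudo-labels by normalizing per-class vote counts, and the `ABSTAIN = -1` convention and function name are assumptions made here for readability.

```python
import numpy as np

ABSTAIN = -1  # assumed convention: a labeling function abstains by emitting -1

def majority_vote_soft_labels(L, n_classes):
    """Turn a weak-label matrix L (n_examples x n_sources) into soft labels.

    For each example, votes from non-abstaining sources are counted and
    normalized into a per-class distribution; rows with no votes fall back
    to a uniform distribution.
    """
    n = L.shape[0]
    probs = np.zeros((n, n_classes))
    for i, row in enumerate(L):
        votes = row[row != ABSTAIN]
        if votes.size == 0:
            probs[i] = np.full(n_classes, 1.0 / n_classes)  # no signal: uniform
        else:
            counts = np.bincount(votes, minlength=n_classes)
            probs[i] = counts / counts.sum()
    return probs

# Toy example: 4 examples, 3 labeling functions, binary task.
L = np.array([[1, 1, -1],
              [0, -1, 0],
              [1, 0, -1],
              [-1, -1, -1]])
print(majority_vote_soft_labels(L, n_classes=2))
```

The resulting soft labels play the same role as the Snorkel LM outputs in the paper's design space: they serve as the pseudo-label source that downstream end models (MLPs or RoBERTa) are trained against.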
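
The Experiment Setup row quotes a patience-based early-stopping rule (a patience of 1000 steps for MLPs and 100 steps for RoBERTa, following WRENCH). The sketch below shows one generic way such a rule can be wired into a training loop; it is not taken from the SSL4WS codebase, and `train_step` and `evaluate` are hypothetical callables supplied by the caller.

```python
def train_with_early_stopping(train_step, evaluate, max_steps, patience, eval_every=1):
    """Stop once validation performance fails to improve for `patience` steps.

    train_step(step) performs one optimization step and returns the model state;
    evaluate(state) returns a validation metric where higher is better.
    """
    best_metric, best_step, best_state = float("-inf"), 0, None
    for step in range(1, max_steps + 1):
        state = train_step(step)
        if step % eval_every == 0:
            metric = evaluate(state)
            if metric > best_metric:
                best_metric, best_step, best_state = metric, step, state
            elif step - best_step >= patience:
                break  # patience exhausted: no improvement within the window
    return best_state, best_metric

# E.g., patience=1000 for an MLP trainer or patience=100 for RoBERTa fine-tuning,
# matching the budgets quoted in the Experiment Setup row above.
```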