Training Subset Selection for Weak Supervision
Authors: Hunter Lang, Aravindan Vijayaraghavan, David Sontag
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical experiments demonstrating that the status quo of using all the pseudolabeled data is nearly always suboptimal. Combining good pretrained representations with the cut statistic [23] for subset selection, we obtain subsets of the weakly-labeled training data where the weak labels are very accurate. ... Our empirical study shows that this combination is very effective at selecting good pseudolabeled training data across a wide variety of label models, end models, and datasets. We evaluate our approach on the WRENCH benchmark [42] for weak supervision. We compare the status quo of full coverage (β = 1.0) to β chosen from {0.1, 0.2, . . . , 1.0}. We evaluate our approach with five different label models: Majority Vote (MV), the original Snorkel/Data Programming (DP) [30], Dawid-Skene (DS) [8], FlyingSquid (FS) [10], and MeTaL [29]. (A sketch of the selection step follows the table.) |
| Researcher Affiliation | Academia | Hunter Lang, MIT CSAIL, hjl@mit.edu; Aravindan Vijayaraghavan, Northwestern University, aravindv@northwestern.edu; David Sontag, MIT CSAIL, dsontag@mit.edu |
| Pseudocode | No | The paper mentions providing code in Appendix C but does not include any pseudocode or a formally labeled algorithm block in the main text. |
| Open Source Code | Yes | We include the code for reproducing our empirical results in the supplementary material. |
| Open Datasets | Yes | We evaluate our approach on the WRENCH benchmark [42] for weak supervision. ... Full details for the datasets and the weak label sources are available in [42] Table 5 and reproduced here in Appendix B.1. |
| Dataset Splits | Yes | Hyperparameter tuning. Our subset selection approach introduces a new hyperparameter, β, the fraction of covered data to retain for training the classifier. ... choosing the value with the best (ground-truth) validation performance. ... The average validation set size of the WRENCH datasets from Table 1 is over 2,500 examples. ... We compare choosing the best model checkpoint and picking the best coverage fraction β using (i) the full validation set and (ii) a randomly-sampled validation set of 100 examples. |
| Hardware Specification | Yes | We performed all model training on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using 'pretrained roberta-base and bert-base-cased' models and downloading weights from 'huggingface.co/datasets', but does not provide specific version numbers for any software dependencies like programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | To keep the hyperparameter tuning burden low, we first tune all other hyperparameters identically to Zhang et al. [42], holding β fixed at 1.0. We then use the optimal hyperparameters (learning rate, batch size, weight decay, etc.) from β = 1.0 for a grid search over values of β ∈ {0.1, 0.2, . . . , 1.0}... In all of our experiments, we used K = 20 nearest neighbors to compute the cut statistic and performed no tuning on this value. (A sketch of the β sweep follows the table.) |
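
The cut statistic itself is only cited above ([23]), not reproduced, so the following is a minimal sketch of the selection step, assuming unit edge weights on the K-nearest-neighbor graph and a null model that draws each neighbor's label i.i.d. from the empirical class proportions (the paper's exact statistic may weight edges and correct for sampling without replacement). The function names `cut_statistic_ranking` and `select_subset` are ours, not taken from the released code.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def cut_statistic_ranking(embeddings, weak_labels, k=20):
    """Rank examples by the cut statistic: examples whose k-NN
    neighborhoods are unusually label-homogeneous rank first.
    Sketch assumes unit edge weights and an i.i.d. null model."""
    weak_labels = np.asarray(weak_labels)
    n = len(weak_labels)

    # k + 1 neighbors because each point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)
    neighbors = idx[:, 1:]  # drop self

    # Empirical class proportions p_c over the covered data.
    classes, counts = np.unique(weak_labels, return_counts=True)
    p_of = dict(zip(classes, counts / n))

    # J_i = number of "cut" edges: neighbors with a different weak label.
    disagree = weak_labels[neighbors] != weak_labels[:, None]
    J = disagree.sum(axis=1)

    # Under the null, each edge is cut independently with prob 1 - p_{y_i}.
    q = np.array([1.0 - p_of[y] for y in weak_labels])
    mu = k * q
    sigma = np.sqrt(k * q * (1.0 - q)) + 1e-12

    z = (J - mu) / sigma
    return np.argsort(z)  # fewest cut edges relative to chance first

def select_subset(embeddings, weak_labels, beta, k=20):
    """Keep the beta fraction of covered examples with the lowest
    cut-statistic z-scores."""
    order = cut_statistic_ranking(embeddings, weak_labels, k=k)
    return order[: int(np.ceil(beta * len(order)))]
```

Examples with low z-scores sit in unusually label-homogeneous neighborhoods of the representation space, so keeping only the lowest-β fraction yields the accurate pseudolabeled subsets described in the Research Type row.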
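The β sweep quoted in the Experiment Setup row can then be a plain grid search over {0.1, . . . , 1.0}, scored on the ground-truth validation set. In this sketch, `train_fn` and `val_fn` are hypothetical stand-ins for the WRENCH end-model training loop and validation metric; it reuses `select_subset` from the sketch above.

```python
import numpy as np

def tune_beta(embeddings, weak_labels, train_fn, val_fn, k=20):
    """Sweep beta over {0.1, ..., 1.0}, keeping the value with the
    best ground-truth validation score. All other hyperparameters
    are assumed already fixed from the beta = 1.0 configuration."""
    best_beta, best_score = None, -np.inf
    for beta in np.arange(0.1, 1.01, 0.1):
        keep_idx = select_subset(embeddings, weak_labels, beta, k=k)
        model = train_fn(keep_idx)  # train end model on the selected subset
        score = val_fn(model)       # e.g. accuracy on the validation split
        if score > best_score:
            best_beta, best_score = beta, score
    return best_beta, best_score
```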