Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods
Authors: Daniel Fu, Mayee Chen, Frederic Sala, Sarah Hooper, Kayvon Fatahalian, Christopher Ré
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we validate FLYINGSQUID on benchmark weak supervision datasets and find that it achieves the same or higher quality compared to previous approaches without the need to tune an SGD procedure, recovers model parameters 170 times faster on average, and enables new video analysis and online learning applications. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Stanford University 2Department of Electrical Engineering, Stanford University. |
| Pseudocode | Yes | Algorithm 1 Triplet Method (before averaging) and Algorithm 2 Label Model Parameter Recovery are presented as distinct pseudocode blocks (see the sketch below the table). |
| Open Source Code | Yes | We release FLYINGSQUID as a novel layer integrated into PyTorch (https://github.com/HazyResearch/flyingsquid). This layer allows weak supervision to be integrated off-the-shelf into any deep learning model, learning the accuracies of noisy labeling sources in the same training loop as the end model. |
| Open Datasets | Yes | We evaluate FLYINGSQUID on three benchmark datasets and four video analysis tasks. Each dataset consists of a large (187–64,130) unlabeled training set, a smaller (50–9,479) hand-labeled development set, and a held-out test set. We draw three benchmark weak supervision datasets from a previous evaluation of a state-of-the-art weak supervision framework (Ratner et al., 2018). Spouse seeks to identify mentions of spouse relationships in a set of news articles (Corney et al., 2016), Spam classifies whether YouTube comments are spam (Alberto et al., 2015), and Weather is a weather sentiment task from Crowdflower (Cro, 2018). |
| Dataset Splits | Yes | Each dataset consists of a large (187–64,130) unlabeled training set, a smaller (50–9,479) hand-labeled development set, and a held-out test set. We use the unlabeled training set to train the label model and end model, and use the labeled development set for a) training a traditional supervision baseline, and b) for hyperparameter tuning of the label and end models. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments; it only reports relative speedups. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for it or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | No | The paper defers details to Appendix E ('More details about each task and the experiments in Appendix E.'), but the appendix is not included in the given text. No explicit hyperparameters (e.g., learning rate, batch size) or specific training configurations are detailed in the main text. |
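
The pseudocode row above refers to the paper's triplet method, which recovers labeling-function accuracies from pairwise agreement statistics alone, with no gradient-based training. The following is a minimal NumPy sketch of the underlying triplet identity, not the paper's implementation: FLYINGSQUID additionally averages over many triplets, handles abstains, class balance, and multi-task dependency structure. The function name, the single-triplet choice per source, and the toy data are illustrative assumptions.

```python
import numpy as np

def triplet_accuracies(L):
    """Sketch of the triplet identity behind Algorithm 1 (before averaging).

    L is an (n, m) array of labeling-function votes in {-1, +1}, with m >= 3
    and no abstains (a simplifying assumption for this sketch). Under the
    paper's conditional-independence model with label Y in {-1, +1},
    E[l_i * l_j] = E[l_i * Y] * E[l_j * Y], so for any triplet (i, j, k):
        |E[l_i * Y]| = sqrt(|E[l_i l_j] * E[l_i l_k] / E[l_j l_k]|).
    """
    n, m = L.shape
    assert m >= 3, "the triplet identity needs at least three sources"
    O = (L.T @ L) / n  # empirical second moments E[l_i * l_j]
    acc = np.zeros(m)
    for i in range(m):
        # One arbitrary triplet per source; the full method averages over
        # triplets and resolves signs by assuming better-than-random sources.
        j, k = [x for x in range(m) if x != i][:2]
        acc[i] = np.sqrt(np.abs(O[i, j] * O[i, k] / O[j, k]))
    return acc  # |E[l_i * Y]| for each source


if __name__ == "__main__":
    # Toy check: three independent voters that agree with a hidden label Y
    # with probabilities 0.9, 0.8, 0.7, so E[l_i * Y] = 2p - 1.
    rng = np.random.default_rng(0)
    Y = rng.choice([-1, 1], size=10_000)
    L = np.stack(
        [np.where(rng.random(Y.size) < p, Y, -Y) for p in (0.9, 0.8, 0.7)],
        axis=1,
    )
    print(triplet_accuracies(L))  # roughly [0.8, 0.6, 0.4]
```

Because the estimate reduces to a few matrix operations on the observed vote matrix, it avoids the SGD tuning that the paper contrasts against and is the source of the reported 170x average speedup in parameter recovery.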