Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

Authors: Daniel Fu, Mayee Chen, Frederic Sala, Sarah Hooper, Kayvon Fatahalian, Christopher Ré

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we validate FLYINGSQUID on benchmark weak supervision datasets and find that it achieves the same or higher quality compared to previous approaches without the need to tune an SGD procedure, recovers model parameters 170 times faster on average, and enables new video analysis and online learning applications.
Researcher Affiliation | Academia | 1Department of Computer Science, Stanford University; 2Department of Electrical Engineering, Stanford University.
Pseudocode | Yes | Algorithm 1 (Triplet Method, before averaging) and Algorithm 2 (Label Model Parameter Recovery) are presented as distinct pseudocode blocks. A sketch of the triplet identity these algorithms rely on appears after this table.
Open Source Code | Yes | We release FLYINGSQUID as a novel layer integrated into PyTorch (https://github.com/HazyResearch/flyingsquid). This layer allows weak supervision to be integrated off-the-shelf into any deep learning model, learning the accuracies of noisy labeling sources in the same training loop as the end model. A hedged usage sketch based on this repository appears after the table.
Open Datasets | Yes | We evaluate FLYINGSQUID on three benchmark datasets and four video analysis tasks. Each dataset consists of a large (187–64,130) unlabeled training set, a smaller (50–9,479) hand-labeled development set, and a held-out test set. We draw three benchmark weak supervision datasets from a previous evaluation of a state-of-the-art weak supervision framework (Ratner et al., 2018). Spouse seeks to identify mentions of spouse relationships in a set of news articles (Corney et al., 2016), Spam classifies whether YouTube comments are spam (Alberto et al., 2015), and Weather is a weather sentiment task from Crowdflower (Cro, 2018).
Dataset Splits | Yes | Each dataset consists of a large (187–64,130) unlabeled training set, a smaller (50–9,479) hand-labeled development set, and a held-out test set. We use the unlabeled training set to train the label model and end model, and use the labeled development set for a) training a traditional supervision baseline, and b) hyperparameter tuning of the label and end models. A sketch of this split protocol appears after the table.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU models, CPU types, memory); it only reports speedups.
Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for it or for any other software libraries or dependencies used in the experiments.
Experiment Setup | No | The paper states "More details about each task and the experiments in Appendix E.", but Appendix E is not included in the provided text. The main text gives no explicit hyperparameters (e.g., learning rate, batch size) or specific training configurations.
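
To make the pseudocode row concrete, below is a minimal sketch of the triplet identity that Algorithm 1 exploits: for binary labels Y in {-1, +1} and conditionally independent labeling sources, the second moments factor as E[lambda_i lambda_j] = a_i a_j with a_i = E[lambda_i Y], so |a_i| = sqrt(|E[lambda_i lambda_j] E[lambda_i lambda_k] / E[lambda_j lambda_k]|) for any triplet (i, j, k). The function below is our own illustration, not the released implementation; it ignores abstains and the averaging over triplets described in the paper.

```python
import numpy as np

def triplet_accuracies(L):
    """Estimate |a_i| = |E[lambda_i * Y]| for each source via the triplet identity.

    L: (n_examples, m) array of votes in {-1, +1}; requires m >= 3 sources.
    Assumes sources are conditionally independent given Y and better than
    random, so the positive root is taken. Abstains and triplet averaging
    (as in the paper's Algorithm 1) are omitted for brevity.
    """
    n, m = L.shape
    O = (L.T @ L) / n                                  # empirical moments E[lambda_i lambda_j]
    acc = np.zeros(m)
    for i in range(m):
        j, k = [x for x in range(m) if x != i][:2]     # any triplet containing i
        acc[i] = np.sqrt(np.abs(O[i, j] * O[i, k] / O[j, k]))
    return acc
```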
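For the open-source code row, the released repository exposes the label model as a small Python API. The sketch below follows the spirit of the repository's README; the import path, class name (LabelModel), and method names (fit, predict) are assumptions that should be verified against the released code, and L_train.npy is a hypothetical file of weak-label votes.

```python
import numpy as np
from flyingsquid.label_model import LabelModel   # import path assumed from the repo README

L_train = np.load("L_train.npy")   # hypothetical (n_examples, m) votes in {-1, 0, +1}; 0 = abstain
m = L_train.shape[1]

label_model = LabelModel(m)
label_model.fit(L_train)           # closed-form triplet estimation; no SGD procedure to tune
weak_labels = label_model.predict(L_train)
```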
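Finally, for the dataset-splits row, the following self-contained sketch illustrates the protocol described above using synthetic placeholder data: the unlabeled training set is labeled by a stand-in label model and used to train the end model, the small hand-labeled development set trains the traditional-supervision baseline (and would drive hyperparameter tuning), and the held-out test set is used only for the final comparison. All array names, the majority-vote stand-in, and the logistic-regression end model are our own choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # stand-in end model (our choice)

rng = np.random.default_rng(0)

# Hypothetical splits mirroring the protocol: a large unlabeled training set with
# weak votes L_train, a small hand-labeled dev set, and a held-out test set.
n_train, n_dev, n_test, d, m = 2000, 100, 500, 10, 5
X_train, X_dev, X_test = (rng.normal(size=(n, d)) for n in (n_train, n_dev, n_test))
y_dev = rng.choice([-1, 1], n_dev)
y_test = rng.choice([-1, 1], n_test)
L_train = rng.choice([-1, 1], size=(n_train, m))       # placeholder weak-label votes

# 1) Unlabeled training set: aggregate weak votes (majority vote stands in for the
#    label model here) and train the end model on the resulting labels.
y_weak = np.sign(L_train.sum(axis=1))
end_model = LogisticRegression().fit(X_train, y_weak)

# 2) Labeled development set: traditional-supervision baseline and, in practice,
#    hyperparameter tuning of the label and end models.
baseline = LogisticRegression().fit(X_dev, y_dev)

# 3) Held-out test set: final evaluation only.
print(end_model.score(X_test, y_test), baseline.score(X_test, y_test))
```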