Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

Authors: Daniel Fu, Mayee Chen, Frederic Sala, Sarah Hooper, Kayvon Fatahalian, Christopher Ré

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we validate FLYINGSQUID on benchmark weak supervision datasets and find that it achieves the same or higher quality compared to previous approaches without the need to tune an SGD procedure, recovers model parameters 170 times faster on average, and enables new video analysis and online learning applications.
Researcher Affiliation | Academia | 1Department of Computer Science, Stanford University; 2Department of Electrical Engineering, Stanford University.
Pseudocode | Yes | Algorithm 1 (Triplet Method, before averaging) and Algorithm 2 (Label Model Parameter Recovery) are presented as distinct pseudocode blocks. A sketch of the triplet identity these algorithms rely on appears after this table.
Open Source Code | Yes | We release FLYINGSQUID as a novel layer integrated into PyTorch (https://github.com/HazyResearch/flyingsquid). This layer allows weak supervision to be integrated off-the-shelf into any deep learning model, learning the accuracies of noisy labeling sources in the same training loop as the end model. A hedged usage sketch based on this repository appears after the table.
Open Datasets | Yes | We evaluate FLYINGSQUID on three benchmark datasets and four video analysis tasks. Each dataset consists of a large (187–64,130) unlabeled training set, a smaller (50–9,479) hand-labeled development set, and a held-out test set. We draw three benchmark weak supervision datasets from a previous evaluation of a state-of-the-art weak supervision framework (Ratner et al., 2018). Spouse seeks to identify mentions of spouse relationships in a set of news articles (Corney et al., 2016), Spam classifies whether YouTube comments are spam (Alberto et al., 2015), and Weather is a weather sentiment task from Crowdflower (Cro, 2018).
Dataset Splits | Yes | Each dataset consists of a large (187–64,130) unlabeled training set, a smaller (50–9,479) hand-labeled development set, and a held-out test set. We use the unlabeled training set to train the label model and end model, and use the labeled development set for a) training a traditional supervision baseline, and b) hyperparameter tuning of the label and end models. A sketch of this split protocol appears after the table.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU models, CPU types, memory); it only reports speedups.
Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for it or for any other software libraries or dependencies used in the experiments.
Experiment Setup | No | The paper states "More details about each task and the experiments in Appendix E.", but Appendix E is not included in the provided text. The main text gives no explicit hyperparameters (e.g., learning rate, batch size) or specific training configurations.
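
To make the pseudocode row concrete, below is a minimal sketch of the triplet identity that Algorithm 1 exploits: for binary labels Y in {-1, +1} and conditionally independent labeling sources, the second moments factor as E[lambda_i lambda_j] = a_i a_j with a_i = E[lambda_i Y], so |a_i| = sqrt(|E[lambda_i lambda_j] E[lambda_i lambda_k] / E[lambda_j lambda_k]|) for any triplet (i, j, k). The function below is our own illustration, not the released implementation; it ignores abstains and the averaging over triplets described in the paper.

```python
import numpy as np

def triplet_accuracies(L):
    """Estimate |a_i| = |E[lambda_i * Y]| for each source via the triplet identity.

    L: (n_examples, m) array of votes in {-1, +1}; requires m >= 3 sources.
    Assumes sources are conditionally independent given Y and better than
    random, so the positive root is taken. Abstains and triplet averaging
    (as in the paper's Algorithm 1) are omitted for brevity.
    """
    n, m = L.shape
    O = (L.T @ L) / n                                  # empirical moments E[lambda_i lambda_j]
    acc = np.zeros(m)
    for i in range(m):
        j, k = [x for x in range(m) if x != i][:2]     # any triplet containing i
        acc[i] = np.sqrt(np.abs(O[i, j] * O[i, k] / O[j, k]))
    return acc
```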
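For the open-source code row, the released repository exposes the label model as a small Python API. The sketch below follows the spirit of the repository's README; the import path, class name (LabelModel), and method names (fit, predict) are assumptions that should be verified against the released code, and L_train.npy is a hypothetical file of weak-label votes.

```python
import numpy as np
from flyingsquid.label_model import LabelModel   # import path assumed from the repo README

L_train = np.load("L_train.npy")   # hypothetical (n_examples, m) votes in {-1, 0, +1}; 0 = abstain
m = L_train.shape[1]

label_model = LabelModel(m)
label_model.fit(L_train)           # closed-form triplet estimation; no SGD procedure to tune
weak_labels = label_model.predict(L_train)
```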
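Finally, for the dataset-splits row, the following self-contained sketch illustrates the protocol described above using synthetic placeholder data: the unlabeled training set is labeled by a stand-in label model and used to train the end model, the small hand-labeled development set trains the traditional-supervision baseline (and would drive hyperparameter tuning), and the held-out test set is used only for the final comparison. All array names, the majority-vote stand-in, and the logistic-regression end model are our own choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # stand-in end model (our choice)

rng = np.random.default_rng(0)

# Hypothetical splits mirroring the protocol: a large unlabeled training set with
# weak votes L_train, a small hand-labeled dev set, and a held-out test set.
n_train, n_dev, n_test, d, m = 2000, 100, 500, 10, 5
X_train, X_dev, X_test = (rng.normal(size=(n, d)) for n in (n_train, n_dev, n_test))
y_dev = rng.choice([-1, 1], n_dev)
y_test = rng.choice([-1, 1], n_test)
L_train = rng.choice([-1, 1], size=(n_train, m))       # placeholder weak-label votes

# 1) Unlabeled training set: aggregate weak votes (majority vote stands in for the
#    label model here) and train the end model on the resulting labels.
y_weak = np.sign(L_train.sum(axis=1))
end_model = LogisticRegression().fit(X_train, y_weak)

# 2) Labeled development set: traditional-supervision baseline and, in practice,
#    hyperparameter tuning of the label and end models.
baseline = LogisticRegression().fit(X_dev, y_dev)

# 3) Held-out test set: final evaluation only.
print(end_model.score(X_test, y_test), baseline.score(X_test, y_test))
```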