Learning the Structure of Generative Models without Labeled Data

Authors: Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run experiments to confirm these predictions. We also compare against the alternative approach of considering all possible dependencies during parameter learning. We find that our method is 100× faster. In addition, our method returns 1/4 as many extraneous correlations on synthetic data when tuned for comparable recall. Finally, we demonstrate that on real-world applications of weak supervision, using generative models with automatically learned dependencies improves performance. We find that our method provides on average 1.5 F1 points of improvement over existing, user-developed information extraction applications on PubMed abstracts and hardware specification sheets.
Researcher Affiliation | Academia | Stanford University, Stanford, California.
Pseudocode | Yes | Algorithm 1: Structure Learning for Data Programming (a hedged sketch of this procedure follows the table)
Open Source Code | Yes | We implement our method as part of the open source framework Snorkel (snorkel.stanford.edu) and evaluate it in three ways.
Open Datasets | Yes | In the first two applications, we consider a training set of 500 unlabeled abstracts from PubMed, and in the third case 100 PDF parts sheets consisting of mixed text and tabular data.
Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe a validation set or split.
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies | No | The paper mentions "Snorkel" but does not provide specific version numbers for any software components or libraries.
Experiment Setup | Yes | For the other parameters, we use the same values in all of our experiments: step size = 1/m, epoch count T = 10, and truncation frequency K = 10.
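The Pseudocode and Experiment Setup rows above can be tied together with a short illustration. Below is a minimal, hypothetical Python sketch of l1-regularized structure learning with truncated stochastic gradient descent, using the quoted hyperparameters (step size 1/m, epoch count T = 10, truncation frequency K = 10). It is not the paper's Algorithm 1: the paper maximizes the marginal pseudolikelihood of each labeling function given the others under a generative model, with gradients estimated by Gibbs sampling over the latent true label, whereas this sketch substitutes a plain logistic pseudolikelihood over the observed labeling-function outputs. The function name, the data layout, and the regularization strength `reg` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def learn_dependencies(L, T=10, K=10, reg=0.05):
    """Hypothetical sketch: select candidate correlation dependencies among
    labeling functions via l1-regularized truncated SGD.

    L: (m, n) matrix of labeling-function outputs in {-1, 0, +1}.
    Returns a set of unordered index pairs (j, k) with nonzero learned weight.
    """
    m, n = L.shape
    step = 1.0 / m                      # step size = 1/m, as in the quoted setup
    deps = set()
    for j in range(n):                  # one pseudolikelihood problem per labeling function
        others = [k for k in range(n) if k != j]
        X, y = L[:, others], L[:, j]
        w = np.zeros(len(others))
        t = 0
        for _ in range(T):              # T epochs over the unlabeled data
            for i in np.random.permutation(m):
                t += 1
                margin = y[i] * (X[i] @ w)
                grad = -y[i] * X[i] / (1.0 + np.exp(margin))   # logistic-loss gradient
                w -= step * grad
                if t % K == 0:          # truncated gradient: soft-threshold every K steps
                    w = np.sign(w) * np.maximum(np.abs(w) - K * step * reg, 0.0)
        for idx in np.flatnonzero(w):   # surviving nonzero weights mark candidate dependencies
            k = others[idx]
            deps.add((min(j, k), max(j, k)))
    return deps

# Example with hypothetical labeling-function outputs for 1000 unlabeled points.
rng = np.random.default_rng(0)
L = rng.choice([-1, 0, 1], size=(1000, 5))
print(learn_dependencies(L))
```

The fixed `reg` value here is a placeholder; the paper instead guides the choice of regularization strength theoretically and tunes it for comparable recall in its synthetic comparisons.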