Learning the Structure of Generative Models without Labeled Data

Authors: Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run experiments to confirm these predictions. We also compare against the alternative approach of considering all possible dependencies during parameter learning. We find that our method is 100× faster. In addition, our method returns 1/4 as many extraneous correlations on synthetic data when tuned for comparable recall. Finally, we demonstrate that on real-world applications of weak supervision, using generative models with automatically learned dependencies improves performance. We find that our method provides on average 1.5 F1 points of improvement over existing, user-developed information extraction applications on PubMed abstracts and hardware specification sheets.
Researcher Affiliation | Academia | Stanford University, Stanford, California.
Pseudocode | Yes | Algorithm 1: Structure Learning for Data Programming (a hedged sketch of this procedure follows the table)
Open Source Code | Yes | We implement our method as part of the open source framework Snorkel (snorkel.stanford.edu) and evaluate it in three ways.
Open Datasets | Yes | In the first two applications, we consider a training set of 500 unlabeled abstracts from PubMed, and in the third case 100 PDF parts sheets consisting of mixed text and tabular data.
Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe a validation set or split.
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies | No | The paper mentions "Snorkel" but does not provide specific version numbers for any software components or libraries.
Experiment Setup | Yes | For the other parameters, we use the same values in all of our experiments: step size = 1/m, epoch count T = 10, and truncation frequency K = 10.
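The Pseudocode and Experiment Setup rows above can be tied together with a short illustration. Below is a minimal, hypothetical Python sketch of l1-regularized structure learning with truncated stochastic gradient descent, using the quoted hyperparameters (step size 1/m, epoch count T = 10, truncation frequency K = 10). It is not the paper's Algorithm 1: the paper maximizes the marginal pseudolikelihood of each labeling function given the others under a generative model, with gradients estimated by Gibbs sampling over the latent true label, whereas this sketch substitutes a plain logistic pseudolikelihood over the observed labeling-function outputs. The function name, the data layout, and the regularization strength `reg` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def learn_dependencies(L, T=10, K=10, reg=0.05):
    """Hypothetical sketch: select candidate correlation dependencies among
    labeling functions via l1-regularized truncated SGD.

    L: (m, n) matrix of labeling-function outputs in {-1, 0, +1}.
    Returns a set of unordered index pairs (j, k) with nonzero learned weight.
    """
    m, n = L.shape
    step = 1.0 / m                      # step size = 1/m, as in the quoted setup
    deps = set()
    for j in range(n):                  # one pseudolikelihood problem per labeling function
        others = [k for k in range(n) if k != j]
        X, y = L[:, others], L[:, j]
        w = np.zeros(len(others))
        t = 0
        for _ in range(T):              # T epochs over the unlabeled data
            for i in np.random.permutation(m):
                t += 1
                margin = y[i] * (X[i] @ w)
                grad = -y[i] * X[i] / (1.0 + np.exp(margin))   # logistic-loss gradient
                w -= step * grad
                if t % K == 0:          # truncated gradient: soft-threshold every K steps
                    w = np.sign(w) * np.maximum(np.abs(w) - K * step * reg, 0.0)
        for idx in np.flatnonzero(w):   # surviving nonzero weights mark candidate dependencies
            k = others[idx]
            deps.add((min(j, k), max(j, k)))
    return deps

# Example with hypothetical labeling-function outputs for 1000 unlabeled points.
rng = np.random.default_rng(0)
L = rng.choice([-1, 0, 1], size=(1000, 5))
print(learn_dependencies(L))
```

The fixed `reg` value here is a placeholder; the paper instead guides the choice of regularization strength theoretically and tunes it for comparable recall in its synthetic comparisons.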