Learning the Structure of Generative Models without Labeled Data
Authors: Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run experiments to confirm these predictions. We also compare against the alternative approach of considering all possible dependencies during parameter learning. We find that our method is 100× faster. In addition, our method returns 1/4 as many extraneous correlations on synthetic data when tuned for comparable recall. Finally, we demonstrate that on real-world applications of weak supervision, using generative models with automatically learned dependencies improves performance. We find that our method provides on average 1.5 F1 points of improvement over existing, user-developed information extraction applications on PubMed abstracts and hardware specification sheets. |
| Researcher Affiliation | Academia | Stanford University, Stanford, California. |
| Pseudocode | Yes | Algorithm 1 Structure Learning for Data Programming |
| Open Source Code | Yes | We implement our method as part of the open source framework Snorkel (footnote 1: snorkel.stanford.edu) and evaluate it in three ways. |
| Open Datasets | Yes | In the first two applications, we consider a training set of 500 unlabeled abstracts from PubMed, and in the third case 100 PDF parts sheets consisting of mixed text and tabular data. |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe a validation set or split. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions "Snorkel" but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes | For the other parameters, we use the same values in all of our experiments: step size η = m⁻¹, epoch count T = 10, and truncation frequency K = 10. |
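
To make the quoted experiment setup concrete, the following is a minimal sketch of how an epoch count T, a truncation frequency K, and a step size interact in stochastic gradient descent with periodic truncation for ℓ1 regularization. It is not the paper's Algorithm 1 and not Snorkel's implementation. The squared-error objective and the names `truncate`, `truncated_sgd`, and `l1_strength` are all hypothetical, and taking the step size to be 1/m, with m the number of training examples, is an assumption about the quoted value.

```python
import numpy as np


def truncate(w, threshold):
    """Shrink every weight toward zero by `threshold`, clipping at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)


def truncated_sgd(X, y, l1_strength, epochs=10, trunc_every=10, seed=0):
    """SGD on a squared-error loss with an l1 shrinkage step every `trunc_every` updates.

    Illustrative only: the loss is a stand-in, not the paper's objective.
    epochs      -> epoch count T (10 in the quoted setup)
    trunc_every -> truncation frequency K (10 in the quoted setup)
    """
    m, d = X.shape
    step = 1.0 / m  # assumed reading of the quoted step size
    w = np.zeros(d)
    rng = np.random.default_rng(seed)
    updates = 0
    for _ in range(epochs):
        for i in rng.permutation(m):
            # Gradient of 0.5 * (x_i . w - y_i)^2 with respect to w.
            grad = (X[i] @ w - y[i]) * X[i]
            w -= step * grad
            updates += 1
            if updates % trunc_every == 0:
                # Apply the shrinkage accumulated over the last K updates at once.
                w = truncate(w, trunc_every * step * l1_strength)
    return w
```

Calling `truncated_sgd(X, y, l1_strength=0.1)` on an `(m, d)` feature matrix returns a length-`d` weight vector. Larger values of `l1_strength` drive more weights to exactly zero, which is the mechanism by which an ℓ1-regularized learner selects a sparse set of dependencies.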