Data Programming: Creating Large Training Sets, Quickly

Authors: Alexander J. Ratner, Christopher M. De Sa, Sen Wu, Daniel Selsam, Christopher Ré

NeurIPS 2016

Reproducibility Assessment
Each entry gives the reproducibility variable, the assessed result, and the LLM response (quoted paper evidence or the assessor's explanation).

Research Type: Experimental
    "Experimentally, on the 2014 TAC-KBP Slot Filling challenge, we show that data programming would have led to a new winning score, and also show that applying data programming to an LSTM model leads to a TAC-KBP score almost 6 F1 points over a state-of-the-art LSTM baseline (and into second place in the competition)."

Researcher Affiliation: Academia
    "Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré, Stanford University, {ajratner,cdesa,senwu,dselsam,chrismre}@stanford.edu"

Pseudocode: No
    The paper does not contain any clearly labeled pseudocode or algorithm blocks.

Open Source Code: Yes
    "To test this, we arranged a hackathon involving a handful of bioinformatics researchers, using our open-source information extraction framework Snorkel (formerly DDLite; snorkel.stanford.edu)."

Open Datasets: Yes
    "We examine a news application from the 2014 TAC-KBP Slot Filling challenge (http://www.nist.gov/tac/2014/KBP/), where we extract relations between real-world entities from articles [2]; a clinical genomics application, where we extract causal relations between genetic mutations and phenotypes from the scientific literature (https://github.com/HazyResearch/dd-genomics); and a pharmacogenomics application where we extract interactions between genes, also from the scientific literature [21]; further details are included in the Appendix."

Dataset Splits: No
    The paper states "For all experiments, we evaluated on a blind hand-labeled evaluation set" but does not provide specific details on training/validation splits, their proportions, or methods such as cross-validation.

Hardware Specification: No
    The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used to run the experiments.

Software Dependencies: No
    The paper mentions software such as LSTM models and Snorkel (formerly DDLite) but does not specify version numbers for any software dependencies, libraries, or frameworks.

Experiment Setup: No
    The paper describes the general approach and feature generation methods but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings.
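The paper's core primitive, referenced by the evidence quotes above, is the labeling function: a user-written heuristic that votes on a candidate's label or abstains. The sketch below is purely illustrative, with all names and heuristics invented here; the paper itself denoises labeling-function votes with a learned generative model, whereas this sketch uses a simple majority vote for clarity.

```python
# Illustrative sketch of the labeling-function paradigm (not the paper's
# actual code). Each labeling function votes POSITIVE, NEGATIVE, or ABSTAIN
# on a candidate relation mention.

ABSTAIN, NEGATIVE, POSITIVE = 0, -1, 1

def lf_contains_causes(candidate):
    """Vote POSITIVE if the sentence asserts a causal link (toy heuristic)."""
    return POSITIVE if "causes" in candidate["sentence"] else ABSTAIN

def lf_explicit_negation(candidate):
    """Vote NEGATIVE on an explicit negation pattern (toy heuristic)."""
    return NEGATIVE if "does not cause" in candidate["sentence"] else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_causes, lf_explicit_negation]

def majority_vote(candidate):
    """Combine labeling-function votes; ties or no votes yield ABSTAIN."""
    score = sum(lf(candidate) for lf in LABELING_FUNCTIONS)  # ABSTAIN adds 0
    if score > 0:
        return POSITIVE
    if score < 0:
        return NEGATIVE
    return ABSTAIN

candidates = [
    {"sentence": "Mutation X causes phenotype Y."},
    {"sentence": "Gene A does not cause disease B."},
]
labels = [majority_vote(c) for c in candidates]
```

In the paper's formulation, the resulting noisy labels are not combined by majority vote but modeled jointly, estimating each labeling function's accuracy and using the inferred probabilistic labels to train a discriminative model (e.g., the LSTM mentioned above).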