Robust Data Programming with Precision-guided Labeling Functions

Authors: Oishik Chatterjee, Ganesh Ramakrishnan, Sunita Sarawagi (pp. 3397-3404)

AAAI 2020

Reproducibility: Variable | Result | LLM Response
Research Type | Experimental | "We present extensive experiments on five datasets, comparing various models for performance and stability, and present the significantly positive impact of CAGE."
Researcher Affiliation | Academia | Oishik Chatterjee, Department of CSE, IIT Bombay, India (oishik@cse.iitb.ac.in); Ganesh Ramakrishnan, Department of CSE, IIT Bombay, India (ganesh@cse.iitb.ac.in); Sunita Sarawagi, Department of CSE, IIT Bombay, India (sunita@iitb.ac.in)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code available at https://github.com/oishik75/CAGE.
Open Datasets | Yes | SMS spam (sms) is a binary spam/no-spam classification dataset... SMS Spam Collection Data Set: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/; CDR: (Ratner et al. 2016) ... 2018. Chemical Disease Relation Extraction Task: https://github.com/HazyResearch/snorkel/tree/master/tutorials/cdr; Dedup: publicly available at https://www.cse.iitb.ac.in/~sunita/alias/; Iris: Iris is a UCI dataset...; Ionosphere: This is another 2-class UCI dataset...
Dataset Splits | Yes | "Our train-dev-test splits and set of discrete LFs shown in Table 1 are the same as in (Ratner et al. 2016) where it was first used."; SMS spam (sms) is a binary spam/no-spam classification dataset with 5574 documents split into 3700 unlabeled-train and 1872 labeled-test instances; Iris: "We split it into 105 unlabeled train and 45 labeled test examples."; Ionosphere: "This is another 2-class UCI dataset, that is split to 245 unlabeled train and 106 labeled test instances."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper states "We implemented our model in Pytorch" but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | For each dataset and discrete LF, a default discrete quality guide q_j^t = 0.9 was assigned, and for continuous LFs q_j^c = 0.85. A learning rate of 0.01 and 100 training epochs were used. Parameters were initialized favorably: the agreeing parameter started at theta_{j,k_j} = 1 and each disagreeing parameter at theta_{j,y} = -1 for y != k_j.
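The setup row above can be sketched in code. This is a minimal illustration, not the authors' implementation (their PyTorch code is in the linked repository): it assumes the labeling-function parameters form an (n_lfs, n_classes) matrix theta, with the "favorable" initialization read as +1 for the class an LF votes for and -1 elsewhere; the function name `init_theta` and the example LF-to-class assignment are hypothetical.

```python
import numpy as np

# Constants quoted from the reported experiment setup.
QG_DISCRETE = 0.9     # default quality guide for discrete LFs
QG_CONTINUOUS = 0.85  # default quality guide for continuous LFs
LEARNING_RATE = 0.01
EPOCHS = 100

def init_theta(lf_classes, n_classes):
    """Favorable initialization sketch: for each labeling function j
    that votes for class k_j, set theta[j, k_j] = +1 (agreeing
    parameter) and theta[j, y] = -1 for every other class y
    (disagreeing parameters)."""
    n_lfs = len(lf_classes)
    theta = -np.ones((n_lfs, n_classes))
    # Fancy indexing flips the agreeing entry of each row to +1.
    theta[np.arange(n_lfs), lf_classes] = 1.0
    return theta

# Example: three LFs over two classes; the first votes for class 0,
# the other two for class 1 (a made-up assignment for illustration).
theta = init_theta(lf_classes=[0, 1, 1], n_classes=2)
print(theta)
```

In a full training loop these theta values would be wrapped as learnable parameters and optimized for the stated 100 epochs at learning rate 0.01, with the quality guides acting as regularization targets.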