Robust Data Programming with Precision-guided Labeling Functions

Authors: Oishik Chatterjee, Ganesh Ramakrishnan, Sunita Sarawagi (pp. 3397-3404)

AAAI 2020

Reproducibility: Variable | Result | LLM Response
Research Type | Experimental | "We present extensive experiments on five datasets, comparing various models for performance and stability, and present the significantly positive impact of CAGE."
Researcher Affiliation | Academia | Oishik Chatterjee, Department of CSE, IIT Bombay, India (oishik@cse.iitb.ac.in); Ganesh Ramakrishnan, Department of CSE, IIT Bombay, India (ganesh@cse.iitb.ac.in); Sunita Sarawagi, Department of CSE, IIT Bombay, India (sunita@iitb.ac.in)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code available at https://github.com/oishik75/CAGE.
Open Datasets | Yes | SMS spam (sms) is a binary spam/no-spam classification dataset... SMS Spam Collection Data Set: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/; CDR: (Ratner et al. 2016) ... 2018. Chemical Disease Relation Extraction Task: https://github.com/HazyResearch/snorkel/tree/master/tutorials/cdr; Dedup: publicly available at https://www.cse.iitb.ac.in/~sunita/alias/; Iris: Iris is a UCI dataset...; Ionosphere: This is another 2-class UCI dataset...
Dataset Splits | Yes | "Our train-dev-test splits and set of discrete LFs shown in Table 1 are the same as in (Ratner et al. 2016) where it was first used."; SMS spam (sms) is a binary spam/no-spam classification dataset with 5574 documents split into 3700 unlabeled-train and 1872 labeled-test instances; Iris: "We split it into 105 unlabeled train and 45 labeled test examples."; Ionosphere: "This is another 2-class UCI dataset, that is split to 245 unlabeled train and 106 labeled test instances."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper states "We implemented our model in Pytorch" but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | For each dataset and discrete LF, a default discrete quality guide q_j^t = 0.9 was assigned, and for continuous LFs q_j^c = 0.85. A learning rate of 0.01 and 100 training epochs were used. Parameters were initialized favorably: the agreeing parameter started at theta_{j,k_j} = 1 and each disagreeing parameter at theta_{j,y} = -1 for y != k_j.
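The setup row above can be sketched in code. This is a minimal illustration, not the authors' implementation (their PyTorch code is in the linked repository): it assumes the labeling-function parameters form an (n_lfs, n_classes) matrix theta, with the "favorable" initialization read as +1 for the class an LF votes for and -1 elsewhere; the function name `init_theta` and the example LF-to-class assignment are hypothetical.

```python
import numpy as np

# Constants quoted from the reported experiment setup.
QG_DISCRETE = 0.9     # default quality guide for discrete LFs
QG_CONTINUOUS = 0.85  # default quality guide for continuous LFs
LEARNING_RATE = 0.01
EPOCHS = 100

def init_theta(lf_classes, n_classes):
    """Favorable initialization sketch: for each labeling function j
    that votes for class k_j, set theta[j, k_j] = +1 (agreeing
    parameter) and theta[j, y] = -1 for every other class y
    (disagreeing parameters)."""
    n_lfs = len(lf_classes)
    theta = -np.ones((n_lfs, n_classes))
    # Fancy indexing flips the agreeing entry of each row to +1.
    theta[np.arange(n_lfs), lf_classes] = 1.0
    return theta

# Example: three LFs over two classes; the first votes for class 0,
# the other two for class 1 (a made-up assignment for illustration).
theta = init_theta(lf_classes=[0, 1, 1], n_classes=2)
print(theta)
```

In a full training loop these theta values would be wrapped as learnable parameters and optimized for the stated 100 epochs at learning rate 0.01, with the quality guides acting as regularization targets.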