Robust Data Programming with Precision-guided Labeling Functions
Authors: Oishik Chatterjee, Ganesh Ramakrishnan, Sunita Sarawagi (pp. 3397-3404)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present extensive experiments on five datasets, comparing various models for performance and stability, and present the significantly positive impact of CAGE. |
| Researcher Affiliation | Academia | Oishik Chatterjee Department of CSE IIT Bombay, India oishik@cse.iitb.ac.in Ganesh Ramakrishnan Department of CSE IIT Bombay, India ganesh@cse.iitb.ac.in Sunita Sarawagi Department of CSE IIT Bombay, India sunita@iitb.ac.in |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/oishik75/CAGE. |
| Open Datasets | Yes | SMS spam (sms) is a binary spam/no-spam classification dataset... SMS spam collection data set: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/; CDR: (Ratner et al. 2016). ... 2018. Chemical Disease Relation Extraction Task: https://github.com/HazyResearch/snorkel/tree/master/tutorials/cdr; Dedup: Publicly available at https://www.cse.iitb.ac.in/~sunita/alias/; Iris: Iris is a UCI dataset...; Ionosphere: This is another 2-class UCI dataset... |
| Dataset Splits | Yes | Our train-dev-test splits and set of discrete LFs shown in Table 1 are the same as in (Ratner et al. 2016) where it was first used.; SMS spam (sms ) is a binary spam/no-spam classification dataset with 5574 documents split into 3700 unlabeled-train and 1872 labeled-test instances.; Iris: We split it into 105 unlabeled train and 45 labeled test examples.; Ionosphere: This is another 2-class UCI dataset, that is split to 245 unlabeled train and 106 labeled test instances. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper states 'We implemented our model in Pytorch' but does not specify a version number for Pytorch or any other software dependencies with versions. |
| Experiment Setup | Yes | For each dataset and discrete LF we arbitrarily assigned a default discrete quality guide q^t_j = 0.9 and for continuous LFs q^c_j = 0.85. We used a learning rate of 0.01 and 100 training epochs. Parameters were initialized favorably: the agreeing parameter was initialized as θ_{j,k_j} = 1, and the disagreeing parameters as θ_{j,y} = 1 for y ≠ k_j. |
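The reported setup (learning rate 0.01, 100 epochs, per-LF parameters θ initialized to 1, PyTorch implementation) could be sketched as below. This is a minimal, hypothetical skeleton for reproduction purposes only: the objective here is a placeholder, not the actual CAGE likelihood, and the optimizer choice is an assumption since the paper does not state one.

```python
import torch

# Hypothetical reproduction skeleton of the reported training configuration.
# theta[j, y] is the parameter of labeling function j for class y; k_j is the
# class that LF j votes for (values below are illustrative, not from the paper).
n_lfs, n_classes = 3, 2
k = torch.tensor([0, 1, 0])  # assumed LF target classes, for illustration

# Favorable initialization as reported: agreeing and disagreeing parameters = 1.
theta = torch.ones(n_lfs, n_classes, requires_grad=True)

# Quality guides as reported: 0.9 for discrete LFs, 0.85 for continuous LFs.
q_t, q_c = 0.9, 0.85

# Optimizer is an assumption; learning rate and epoch count are from the paper.
opt = torch.optim.SGD([theta], lr=0.01)
for epoch in range(100):
    opt.zero_grad()
    loss = (theta ** 2).sum()  # placeholder for the CAGE training objective
    loss.backward()
    opt.step()
```

The placeholder loss exists only so the loop runs end to end; reproducing the paper's results requires substituting the CAGE generative-model likelihood with its quality-guide regularizer.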