Learning from Rules Generalizing Labeled Exemplars
Authors: Abhijeet Awasthi, Sabyasachi Ghosh, Rasna Goyal, Sunita Sarawagi
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on five different tasks shows that (1) our algorithm is more accurate than several existing methods of learning from a mix of clean and noisy supervision, and (2) the coupled rule-exemplar supervision is effective in denoising rules. |
| Researcher Affiliation | Academia | Abhijeet Awasthi, Sabyasachi Ghosh, Rasna Goyal, Sunita Sarawagi; Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India |
| Pseudocode | Yes | "A pseudocode of our overall training algorithm is described in Algorithm 1." The algorithm is titled "Algorithm 1: Our Joint Training Algorithm using Posterior Regularization" (a hedged sketch of such a joint objective appears after this table). |
| Open Source Code | Yes | Code and datasets available at https://github.com/awasthiabhijeet/Learning-From-Rules |
| Open Datasets | Yes | Question Classification (Li & Roth, 2002): This is a TREC-6 dataset... MIT-R (Liu et al., 2013): This is a slot-filling task... SMS Spam Classification (Almeida et al., 2011): This dataset contains 5.5k text messages... YouTube Spam Classification (Alberto et al., 2015): Here the task is to classify comments on YouTube videos as Spam or Not-Spam. We obtain this from Snorkel's GitHub page... Census Income (Dua & Graff, 2019): This UCI dataset is extracted from the 1994 U.S. census. It lists a total of 13 features of an individual... |
| Dataset Splits | Yes | Table 1 ("Statistics of datasets and their rules") reports per-dataset sizes of the L, U, validation, and test sets; for Question these are 68, 4884, 500, and 500 respectively. "The training set has 5452 instances which are split as 68 for L, 500 for validation, and the remaining as U." (the remainder, 5452 - 68 - 500 = 4884, matches the reported U size). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. It only discusses the type of networks and embeddings used. |
| Software Dependencies | No | As the embedding layer we use a pretrained ELMo (Peters et al., 2018) network... We use the Adam optimizer... The input is passed through multiple non-linear layers with ReLU activation before passing through a Sigmoid activation which outputs the probability P_{jφ}(r_j = 1 | x) (a hedged sketch of such a rule network appears after this table)... For YouTube, the classifier network is a simple logistic regression, as in Snorkel's code. The paper mentions software components but does not provide specific version numbers for reproducibility. |
| Experiment Setup | Yes | Each reported number is obtained by averaging over ten random initializations. Whenever a method involved hyper-parameters to weigh the relative contribution of various terms in the objective, a validation dataset was used to tune the hyper-parameter value (a sketch of this protocol appears after this table). Hyperparameters are provided in Section C of the supplementary material; Table 8 ("Hyperparameters for various methods and datasets") lists them, where "bs" refers to the batch size and "lr" to the learning rate. For the Only-L baseline a smaller batch size was used, considering the smaller size of the L set. |
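
The report only names Algorithm 1 ("Our Joint Training Algorithm using Posterior Regularization") without reproducing its body. The sketch below illustrates one way a rule-exemplar joint objective can couple the classifier P_θ(y|x) with per-rule networks P_{jφ}(r_j = 1|x) through an implication-style term on unlabeled data. It is an assumption-laden illustration, not the paper's exact posterior-regularization algorithm, and all identifiers (`joint_loss`, `rule_label`, `covered`) are hypothetical.

```python
# Illustrative sketch only; NOT the paper's Algorithm 1.
# classifier(x) -> class logits; rule_nets[j](x) -> P_phi(r_j = 1 | x) in [0, 1];
# rule_label[j] is the class that rule j assigns; covered[j] is a boolean mask over x_U.
import torch
import torch.nn.functional as F

def joint_loss(classifier, rule_nets, rule_label, x_L, y_L, x_U, covered):
    # (1) Ordinary cross-entropy on the small labeled exemplar set L.
    loss = F.cross_entropy(classifier(x_L), y_L)
    # (2) Implication-style coupling on the unlabeled set U: if rule j fires
    # on x and is judged correct (r_j = 1), the classifier should predict l_j.
    for j, rule_net in enumerate(rule_nets):
        xs = x_U[covered[j]]                      # instances covered by rule j
        if len(xs) == 0:
            continue
        p_r = rule_net(xs).squeeze(-1)            # P_phi(r_j = 1 | x)
        p_y = F.softmax(classifier(xs), dim=-1)[:, rule_label[j]]  # P_theta(l_j | x)
        # P(r_j = 1  implies  y = l_j) = 1 - P(r_j = 1) * (1 - P(l_j | x))
        loss = loss - torch.log(1.0 - p_r * (1.0 - p_y) + 1e-8).mean()
    return loss
```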
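
The rule network quoted in the Software Dependencies row is described only qualitatively: non-linear ReLU layers followed by a Sigmoid that outputs P_{jφ}(r_j = 1 | x). A minimal PyTorch sketch, with layer sizes that are illustrative assumptions rather than values from the paper:

```python
import torch.nn as nn

class RuleNetwork(nn.Module):
    """Sketch of a per-rule denoising network: ReLU hidden layers, then a
    Sigmoid emitting P_phi(r_j = 1 | x). Sizes are illustrative assumptions."""
    def __init__(self, in_dim, hidden_dim=64, n_hidden=2):
        super().__init__()
        layers = []
        for _ in range(n_hidden):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers += [nn.Linear(in_dim, 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Probability that rule j, when it covers x, labels x correctly.
        return self.net(x)
```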
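
The Experiment Setup row describes the evaluation protocol: average each reported number over ten random initializations and tune loss-weighting hyper-parameters on the validation set. A compact sketch of that protocol, where `train_and_eval` and `hyper_grid` are hypothetical placeholders rather than functions from the released code:

```python
import statistics

def tune_and_report(train_and_eval, hyper_grid, seeds=range(10)):
    """Pick the hyper-parameter setting with the best mean validation score
    across seeds, then report the mean test score for that setting."""
    best_hp, best_val = None, float("-inf")
    for hp in hyper_grid:
        val = statistics.mean(train_and_eval(hp, seed=s, split="valid") for s in seeds)
        if val > best_val:
            best_hp, best_val = hp, val
    test = statistics.mean(train_and_eval(best_hp, seed=s, split="test") for s in seeds)
    return best_hp, test
```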