Learning from Rules Generalizing Labeled Exemplars

Authors: Abhijeet Awasthi, Sabyasachi Ghosh, Rasna Goyal, Sunita Sarawagi

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation on five different tasks shows that (1) our algorithm is more accurate than several existing methods of learning from a mix of clean and noisy supervision, and (2) the coupled rule-exemplar supervision is effective in denoising rules.
Researcher Affiliation | Academia | Abhijeet Awasthi, Sabyasachi Ghosh, Rasna Goyal, Sunita Sarawagi; Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
Pseudocode | Yes | Pseudocode of the overall training algorithm is given in Algorithm 1, "Our Joint Training Algorithm using Posterior Regularization". (An illustrative sketch of the coupled objective follows the table.)
Open Source Code | Yes | Code and datasets available at https://github.com/awasthiabhijeet/Learning-From-Rules
Open Datasets | Yes | Question Classification (Li & Roth, 2002): This is a TREC-6 dataset... MIT-R (Liu et al., 2013): This is a slot-filling task... SMS Spam Classification (Almeida et al., 2011): This dataset contains 5.5k text messages... YouTube Spam Classification (Alberto et al., 2015): Here the task is to classify comments on YouTube videos as Spam or Not-Spam; the authors obtain it from Snorkel's GitHub page... Census Income (Dua & Graff, 2019): This UCI dataset is extracted from the 1994 U.S. census. It lists a total of 13 features of an individual...
Dataset Splits | Yes | Table 1 ("Statistics of datasets and their rules") reports per-dataset statistics including |Valid| and |Test|. The row for the Question dataset reads "68 4884 68 95 63.8 22.5 124 1.8 500 500"; the first two entries match |L| = 68 and |U| = 4884 from the quoted split below, and the last two are |Valid| = 500 and |Test| = 500. "The training set has 5452 instances which are split as 68 for L, 500 for validation, and the remaining as U." (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running the experiments. It only discusses the types of networks and embeddings used.
Software Dependencies | No | As the embedding layer we use a pretrained ELMo (Peters et al., 2018) network... We use the Adam optimizer... The input is passed through multiple non-linear layers with ReLU activation before passing through a Sigmoid activation which outputs the probability P_jφ(r_j = 1 | x)... For YouTube, the classifier network is simple logistic regression, as in Snorkel's code. The paper names these software components but does not provide the specific version numbers needed for reproducibility. (An illustrative rule-network sketch follows the table.)
Experiment Setup | Yes | Each reported number is obtained by averaging over ten random initializations. Whenever a method involved hyper-parameters to weigh the relative contribution of various terms in the objective, a validation dataset was used to tune the hyper-parameter values. Hyperparameters used are provided in Section C of the supplementary. Table 8 ("Hyperparameters for various methods and datasets") uses bs for batch size and lr for learning rate; for the Only-L baseline, a smaller batch size was used considering the smaller size of the L set. (A protocol sketch follows the table.)
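
The "Pseudocode" row above refers to Algorithm 1, which jointly trains a classifier network P_θ(y|x) and a rule network P_jφ(r_j = 1|x) so that the rule network learns when a rule fires faithfully. Below is a minimal PyTorch sketch of one standard soft relaxation of the implication "rule j covers x faithfully ⇒ y = ℓ_j" on unlabeled data; the function name, the weight gamma, and the exact relaxation are illustrative assumptions, not the authors' released code.

```python
import torch

def soft_implication_loss(clf_logits_U, rule_probs_U, rule_labels, coverage_mask, gamma=0.1):
    """Unlabeled-batch term coupling the classifier and rule networks.

    clf_logits_U  : [B, C]  classifier logits for P_theta(y | x)
    rule_probs_U  : [B, R]  rule-network outputs P_phi(r_j = 1 | x)
    rule_labels   : [R]     LongTensor; class label l_j associated with rule j
    coverage_mask : [B, R]  1.0 where rule j's pattern covers instance x
    gamma         : hypothetical weight on this term; tune on validation data.
    """
    p_y = clf_logits_U.softmax(dim=-1)          # [B, C]
    p_lj = p_y[:, rule_labels]                  # [B, R]: P_theta(y = l_j | x)
    # Soft relaxation of "r_j = 1 implies y = l_j": the implication fails
    # only when the rule fires but the classifier disagrees with its label.
    imply = 1.0 - rule_probs_U * (1.0 - p_lj)   # [B, R]
    nll = -torch.log(imply.clamp_min(1e-6)) * coverage_mask
    return gamma * nll.sum() / coverage_mask.sum().clamp_min(1.0)
```

Per the paper, the labeled exemplars L additionally contribute ordinary cross-entropy terms for both networks, which is what lets the exemplar supervision denoise over-generalized rules.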
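
The "Dataset Splits" row quotes a 68 / 500 / remainder split of the 5452 Question training instances into L, validation, and U. A trivial sketch of that bookkeeping, assuming a random split (the paper actually ties each exemplar in L to the rule it generalizes):

```python
import random

def split_dataset(examples, n_L=68, n_valid=500, seed=0):
    """Split a training set into exemplars L, a validation set, and
    unlabeled U, mirroring the quoted 68/500/remainder split for Question."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    L = [examples[i] for i in idx[:n_L]]
    valid = [examples[i] for i in idx[n_L:n_L + n_valid]]
    U = [examples[i] for i in idx[n_L + n_valid:]]  # 4884 instances for Question
    return L, valid, U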
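
The "Software Dependencies" row describes the rule network's shape: stacked ReLU layers feeding a Sigmoid that outputs P_jφ(r_j = 1|x). A PyTorch sketch of that shape follows; the hidden sizes and the choice of a shared trunk with one output unit per rule are assumptions, and the original implementation may differ.

```python
import torch.nn as nn

class RuleNetwork(nn.Module):
    """ReLU MLP ending in a Sigmoid, producing P_phi(r_j = 1 | x) per rule.
    Hidden sizes are illustrative, not the paper's values."""
    def __init__(self, in_dim, n_rules, hidden=(512, 256)):
        super().__init__()
        layers, d = [], in_dim
        for h in hidden:
            layers += [nn.Linear(d, h), nn.ReLU()]
            d = h
        layers += [nn.Linear(d, n_rules), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: [B, in_dim] (e.g., pooled ELMo embeddings) -> [B, n_rules] probabilities
        return self.net(x)
```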
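
Finally, the "Experiment Setup" row describes the evaluation protocol: tune hyperparameters on the validation set, then average the test metric over ten random initializations. A compact sketch of that protocol; train_fn, eval_fn, and the grid format are hypothetical hooks, not part of the paper's code.

```python
import itertools
import statistics

def tune_then_average(train_fn, eval_fn, grid, n_seeds=10):
    """Pick the best config on the validation split, then report the
    test metric averaged over n_seeds random initializations."""
    configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
    best = max(configs, key=lambda cfg: eval_fn(train_fn(cfg, seed=0), split="valid"))
    scores = [eval_fn(train_fn(best, seed=s), split="test") for s in range(n_seeds)]
    return best, statistics.mean(scores)
```

Here grid would be something like {"lr": [1e-3, 3e-4], "bs": [16, 32]}, matching the bs/lr notation of Table 8.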