Learning from Rules Generalizing Labeled Exemplars

Authors: Abhijeet Awasthi, Sabyasachi Ghosh, Rasna Goyal, Sunita Sarawagi

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation on five different tasks shows that (1) our algorithm is more accurate than several existing methods of learning from a mix of clean and noisy supervision, and (2) the coupled rule-exemplar supervision is effective in denoising rules.
Researcher Affiliation | Academia | Abhijeet Awasthi, Sabyasachi Ghosh, Rasna Goyal, Sunita Sarawagi; Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
Pseudocode | Yes | Pseudocode of the overall training algorithm is given in Algorithm 1, "Our Joint Training Algorithm using Posterior Regularization". (An illustrative sketch of the coupled objective follows the table.)
Open Source Code | Yes | Code and datasets available at https://github.com/awasthiabhijeet/Learning-From-Rules
Open Datasets | Yes | Question Classification (Li & Roth, 2002): This is a TREC-6 dataset... MIT-R (Liu et al., 2013): This is a slot-filling task... SMS Spam Classification (Almeida et al., 2011): This dataset contains 5.5k text messages... YouTube Spam Classification (Alberto et al., 2015): Here the task is to classify comments on YouTube videos as Spam or Not-Spam; the authors obtain it from Snorkel's GitHub page... Census Income (Dua & Graff, 2019): This UCI dataset is extracted from the 1994 U.S. census. It lists a total of 13 features of an individual...
Dataset Splits | Yes | Table 1 ("Statistics of datasets and their rules") reports per-dataset statistics including |Valid| and |Test|. The row for the Question dataset reads "68 4884 68 95 63.8 22.5 124 1.8 500 500"; the first two entries match |L| = 68 and |U| = 4884 from the quoted split below, and the last two are |Valid| = 500 and |Test| = 500. "The training set has 5452 instances which are split as 68 for L, 500 for validation, and the remaining as U." (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running the experiments. It only discusses the types of networks and embeddings used.
Software Dependencies | No | As the embedding layer we use a pretrained ELMo (Peters et al., 2018) network... We use the Adam optimizer... The input is passed through multiple non-linear layers with ReLU activation before passing through a Sigmoid activation which outputs the probability P_jφ(r_j = 1 | x)... For YouTube, the classifier network is simple logistic regression, as in Snorkel's code. The paper names these software components but does not provide the specific version numbers needed for reproducibility. (An illustrative rule-network sketch follows the table.)
Experiment Setup | Yes | Each reported number is obtained by averaging over ten random initializations. Whenever a method involved hyper-parameters to weigh the relative contribution of various terms in the objective, a validation dataset was used to tune the hyper-parameter values. Hyperparameters used are provided in Section C of the supplementary. Table 8 ("Hyperparameters for various methods and datasets") uses bs for batch size and lr for learning rate; for the Only-L baseline, a smaller batch size was used considering the smaller size of the L set. (A protocol sketch follows the table.)
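
The "Pseudocode" row above refers to Algorithm 1, which jointly trains a classifier network P_θ(y|x) and a rule network P_jφ(r_j = 1|x) so that the rule network learns when a rule fires faithfully. Below is a minimal PyTorch sketch of one standard soft relaxation of the implication "rule j covers x faithfully ⇒ y = ℓ_j" on unlabeled data; the function name, the weight gamma, and the exact relaxation are illustrative assumptions, not the authors' released code.

```python
import torch

def soft_implication_loss(clf_logits_U, rule_probs_U, rule_labels, coverage_mask, gamma=0.1):
    """Unlabeled-batch term coupling the classifier and rule networks.

    clf_logits_U  : [B, C]  classifier logits for P_theta(y | x)
    rule_probs_U  : [B, R]  rule-network outputs P_phi(r_j = 1 | x)
    rule_labels   : [R]     LongTensor; class label l_j associated with rule j
    coverage_mask : [B, R]  1.0 where rule j's pattern covers instance x
    gamma         : hypothetical weight on this term; tune on validation data.
    """
    p_y = clf_logits_U.softmax(dim=-1)          # [B, C]
    p_lj = p_y[:, rule_labels]                  # [B, R]: P_theta(y = l_j | x)
    # Soft relaxation of "r_j = 1 implies y = l_j": the implication fails
    # only when the rule fires but the classifier disagrees with its label.
    imply = 1.0 - rule_probs_U * (1.0 - p_lj)   # [B, R]
    nll = -torch.log(imply.clamp_min(1e-6)) * coverage_mask
    return gamma * nll.sum() / coverage_mask.sum().clamp_min(1.0)
```

Per the paper, the labeled exemplars L additionally contribute ordinary cross-entropy terms for both networks, which is what lets the exemplar supervision denoise over-generalized rules.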
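
The "Dataset Splits" row quotes a 68 / 500 / remainder split of the 5452 Question training instances into L, validation, and U. A trivial sketch of that bookkeeping, assuming a random split (the paper actually ties each exemplar in L to the rule it generalizes):

```python
import random

def split_dataset(examples, n_L=68, n_valid=500, seed=0):
    """Split a training set into exemplars L, a validation set, and
    unlabeled U, mirroring the quoted 68/500/remainder split for Question."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    L = [examples[i] for i in idx[:n_L]]
    valid = [examples[i] for i in idx[n_L:n_L + n_valid]]
    U = [examples[i] for i in idx[n_L + n_valid:]]  # 4884 instances for Question
    return L, valid, U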
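
The "Software Dependencies" row describes the rule network's shape: stacked ReLU layers feeding a Sigmoid that outputs P_jφ(r_j = 1|x). A PyTorch sketch of that shape follows; the hidden sizes and the choice of a shared trunk with one output unit per rule are assumptions, and the original implementation may differ.

```python
import torch.nn as nn

class RuleNetwork(nn.Module):
    """ReLU MLP ending in a Sigmoid, producing P_phi(r_j = 1 | x) per rule.
    Hidden sizes are illustrative, not the paper's values."""
    def __init__(self, in_dim, n_rules, hidden=(512, 256)):
        super().__init__()
        layers, d = [], in_dim
        for h in hidden:
            layers += [nn.Linear(d, h), nn.ReLU()]
            d = h
        layers += [nn.Linear(d, n_rules), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: [B, in_dim] (e.g., pooled ELMo embeddings) -> [B, n_rules] probabilities
        return self.net(x)
```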
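
Finally, the "Experiment Setup" row describes the evaluation protocol: tune hyperparameters on the validation set, then average the test metric over ten random initializations. A compact sketch of that protocol; train_fn, eval_fn, and the grid format are hypothetical hooks, not part of the paper's code.

```python
import itertools
import statistics

def tune_then_average(train_fn, eval_fn, grid, n_seeds=10):
    """Pick the best config on the validation split, then report the
    test metric averaged over n_seeds random initializations."""
    configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
    best = max(configs, key=lambda cfg: eval_fn(train_fn(cfg, seed=0), split="valid"))
    scores = [eval_fn(train_fn(best, seed=s), split="test") for s in range(n_seeds)]
    return best, statistics.mean(scores)
```

Here grid would be something like {"lr": [1e-3, 3e-4], "bs": [16, 32]}, matching the bs/lr notation of Table 8.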