End-to-End Weak Supervision

Authors: Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our results show improved performance over prior work in terms of end model performance on downstream test sets, as well as in terms of improved robustness to dependencies among weak supervision sources. We show that our method outperforms, by as much as 6.1 F1 points, state-of-the-art latent label modeling approaches on 4 out of 5 relevant benchmark datasets, and achieves state-of-the-art performance on a crowdsourcing dataset against methods specifically designed for this setting." (Section 4, Experiments.)
Researcher Affiliation | Academia | Salva Rühling Cachay (1, 2), Benedikt Boecking (1), Artur Dubrawski (1). Affiliations: 1 Carnegie Mellon University; 2 Technical University of Darmstadt.
Pseudocode | Yes | Algorithm 1, WeaSEL: the proposed Weakly Supervised End-to-end Learning algorithm for learning from multiple weak supervision sources.

input: batch size n, networks e and f, inverse temperatures τ1 and τ2, noise-aware loss function L, class balance P(y)
for each sampled minibatch {z^(k) = (x^(k), λ^(k))}, k = 1, ..., n do
    for all k ∈ {1, ..., n} do
        # Produce accuracy scores for all weak sources
        θ(z^(k)) = softmax(e(z^(k)) · τ1)
        # Generate probabilistic labels
        s^(k) = θ(z^(k))^T λ^(k)
        y_e^(k) = P_θ(y | λ^(k)) = softmax(s^(k) · τ2) · P(y)
        # Downstream model forward pass
        y_f^(k) = f(x^(k))
    end for
    L_f = (1/n) Σ_{k=1}^{n} L(y_f^(k), stop-grad(y_e^(k)))
    L_e = (1/n) Σ_{k=1}^{n} L(y_e^(k), stop-grad(y_f^(k)))
    update e to minimize L_e, and f to minimize L_f
end for
return downstream network f(·)
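The algorithm above can be sketched as a single PyTorch training step. This is a minimal sketch, not the authors' released implementation (see their repository for that): the tiny linear networks, the one-hot encoding of labeling-function votes, the shapes, and the `soft_ce` helper are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

n, m, d, C = 8, 5, 16, 2      # batch size, weak sources, feature dim, classes

# Stand-ins for the paper's networks e (accuracy scorer) and f (end model)
e = nn.Linear(d + m * C, m)
f = nn.Linear(d, C)
opt = torch.optim.Adam(list(e.parameters()) + list(f.parameters()), lr=1e-3)

x = torch.randn(n, d)
lam = F.one_hot(torch.randint(0, C, (n, m)), C).float()  # LF votes, (n, m, C)
p_y = torch.full((C,), 1.0 / C)                          # class balance P(y)
tau1, tau2 = 1.0, 1.0

def soft_ce(probs, targets):
    """Noise-aware loss: cross-entropy against probabilistic targets."""
    return -(targets * torch.log(probs + 1e-8)).sum(dim=1).mean()

# Accuracy scores θ(z) = softmax(e(z) · τ1), with z = (x, λ)
z = torch.cat([x, lam.reshape(n, -1)], dim=1)
theta = F.softmax(e(z) * tau1, dim=1)                    # (n, m)

# Probabilistic labels y_e = softmax(θᵀλ · τ2) · P(y), renormalized
s = torch.einsum('nm,nmc->nc', theta, lam)               # (n, C)
y_e = F.softmax(s * tau2, dim=1) * p_y
y_e = y_e / y_e.sum(dim=1, keepdim=True)

# Downstream model forward pass
y_f = F.softmax(f(x), dim=1)

# Symmetric losses; .detach() implements stop-grad on the targets
loss_f = soft_ce(y_f, y_e.detach())
loss_e = soft_ce(y_e, y_f.detach())
opt.zero_grad()
(loss_f + loss_e).backward()
opt.step()
```

Because each loss detaches its target, gradients from L_f flow only into f and gradients from L_e only into e, matching the two separate updates in the pseudocode even when the losses are summed for a single backward pass.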
Open Source Code | Yes | "We release an open-source, end-to-end system for arbitrary PyTorch downstream models that will allow practitioners to take advantage of our approach." Code: https://github.com/autonlab/weasel
Open Datasets | Yes | "We evaluate the proposed end-to-end system for learning a downstream model from multiple weak supervision sources on previously used benchmark datasets in weak supervision work [31, 7, 11]. Specifically, we evaluate test set performance on the following classification datasets: The IMDB movie review dataset [28] contains movie reviews to be classified into positive and negative sentiment. A subset of the Amazon review dataset [24], where the task is to classify product reviews into positive and negative sentiment. We use the BiasBios biographies dataset [16] to distinguish between binary categories of frequently occurring occupations and use the same subset of professor vs. teacher classification as in [7]. Finally, we use the highly unbalanced Spouse dataset (90% negative class labels), where the task is to identify mentions of spouse relationships among a set of news articles from the Signal Media dataset [13]."
Dataset Splits | Yes | "For the Spouse dataset, the same data split and LFs as in [19] are used, while for the rest we take a small subset of the test set as validation set. This is common practice in the related work [31, 32, 19, 7] for tuning hyperparameters, and allows for a fair comparison of models." See also Table 3 (dataset details, with training, validation, and test set sizes Ntrain, Nval, Ntest).
Hardware Specification | No | The paper does not report the hardware used for its experiments (e.g., GPU/CPU models or memory amounts); it only implies a GPU-capable PyTorch environment.
Software Dependencies | No | The paper mentions "arbitrary PyTorch downstream models" and training with cross-entropy loss, but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | "All downstream models are trained with the (binary) cross-entropy loss, and our model with the symmetric version of it that uses stop-grad on the targets. More details, especially hyperparameter and architecture details, are provided in Appendix C."
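The symmetric cross-entropy with stop-grad on the targets mentioned in this row can be written in a few lines. The function name `symmetric_noise_aware_loss` and the `eps` smoothing term are our own illustrative choices; `.detach()` stands in for the stop-grad operation.

```python
import torch

def symmetric_noise_aware_loss(p, q, eps=1e-8):
    """Cross-entropy between two probability distributions, applied in both
    directions with stop-grad (.detach()) on the target side."""
    loss_p = -(q.detach() * torch.log(p + eps)).sum(dim=1).mean()  # trains the model producing p
    loss_q = -(p.detach() * torch.log(q + eps)).sum(dim=1).mean()  # trains the model producing q
    return loss_p, loss_q

# Identical uniform predictions give cross-entropy log 2 in both directions
p = torch.tensor([[0.5, 0.5]])
q = torch.tensor([[0.5, 0.5]])
lp, lq = symmetric_noise_aware_loss(p, q)
```

Detaching the target in each direction ensures each network is trained toward the other's (fixed) prediction rather than the two collapsing through a shared gradient path.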