End-to-End Weak Supervision

Authors: Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our results show improved performance over prior work in terms of end model performance on downstream test sets, as well as in terms of improved robustness to dependencies among weak supervision sources. We show that our method outperforms, by as much as 6.1 F1 points, state-of-the-art latent label modeling approaches on 4 out of 5 relevant benchmark datasets, and achieves state-of-the-art performance on a crowdsourcing dataset against methods specifically designed for this setting." (Section 4, Experiments.)
Researcher Affiliation | Academia | Salva Rühling Cachay (1, 2), Benedikt Boecking (1), Artur Dubrawski (1). Affiliations: 1 Carnegie Mellon University; 2 Technical University of Darmstadt.
Pseudocode | Yes | Algorithm 1, WeaSEL: the proposed Weakly Supervised End-to-end Learning algorithm for learning from multiple weak supervision sources.

input: batch size n, networks e and f, inverse temperatures τ1 and τ2, noise-aware loss function L, class balance P(y)
for each sampled minibatch {z^(k) = (x^(k), λ^(k))}, k = 1, ..., n do
    for all k ∈ {1, ..., n} do
        # Produce accuracy scores for all weak sources
        θ(z^(k)) = softmax(e(z^(k)) · τ1)
        # Generate probabilistic labels
        s^(k) = θ(z^(k))^T λ^(k)
        y_e^(k) = P_θ(y | λ^(k)) = softmax(s^(k) · τ2) · P(y)
        # Downstream model forward pass
        y_f^(k) = f(x^(k))
    end for
    L_f = (1/n) Σ_{k=1}^{n} L(y_f^(k), stop-grad(y_e^(k)))
    L_e = (1/n) Σ_{k=1}^{n} L(y_e^(k), stop-grad(y_f^(k)))
    update e to minimize L_e, and f to minimize L_f
end for
return downstream network f(·)
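The algorithm above can be sketched as a single PyTorch training step. This is a minimal sketch, not the authors' released implementation (see their repository for that): the tiny linear networks, the one-hot encoding of labeling-function votes, the shapes, and the `soft_ce` helper are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

n, m, d, C = 8, 5, 16, 2      # batch size, weak sources, feature dim, classes

# Stand-ins for the paper's networks e (accuracy scorer) and f (end model)
e = nn.Linear(d + m * C, m)
f = nn.Linear(d, C)
opt = torch.optim.Adam(list(e.parameters()) + list(f.parameters()), lr=1e-3)

x = torch.randn(n, d)
lam = F.one_hot(torch.randint(0, C, (n, m)), C).float()  # LF votes, (n, m, C)
p_y = torch.full((C,), 1.0 / C)                          # class balance P(y)
tau1, tau2 = 1.0, 1.0

def soft_ce(probs, targets):
    """Noise-aware loss: cross-entropy against probabilistic targets."""
    return -(targets * torch.log(probs + 1e-8)).sum(dim=1).mean()

# Accuracy scores θ(z) = softmax(e(z) · τ1), with z = (x, λ)
z = torch.cat([x, lam.reshape(n, -1)], dim=1)
theta = F.softmax(e(z) * tau1, dim=1)                    # (n, m)

# Probabilistic labels y_e = softmax(θᵀλ · τ2) · P(y), renormalized
s = torch.einsum('nm,nmc->nc', theta, lam)               # (n, C)
y_e = F.softmax(s * tau2, dim=1) * p_y
y_e = y_e / y_e.sum(dim=1, keepdim=True)

# Downstream model forward pass
y_f = F.softmax(f(x), dim=1)

# Symmetric losses; .detach() implements stop-grad on the targets
loss_f = soft_ce(y_f, y_e.detach())
loss_e = soft_ce(y_e, y_f.detach())
opt.zero_grad()
(loss_f + loss_e).backward()
opt.step()
```

Because each loss detaches its target, gradients from L_f flow only into f and gradients from L_e only into e, matching the two separate updates in the pseudocode even when the losses are summed for a single backward pass.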
Open Source Code | Yes | "We release an open-source, end-to-end system for arbitrary PyTorch downstream models that will allow practitioners to take advantage of our approach." Code: https://github.com/autonlab/weasel
Open Datasets | Yes | "We evaluate the proposed end-to-end system for learning a downstream model from multiple weak supervision sources on previously used benchmark datasets in weak supervision work [31, 7, 11]. Specifically, we evaluate test set performance on the following classification datasets: The IMDB movie review dataset [28] contains movie reviews to be classified into positive and negative sentiment. A subset of the Amazon review dataset [24], where the task is to classify product reviews into positive and negative sentiment. We use the BiasBios biographies dataset [16] to distinguish between binary categories of frequently occurring occupations and use the same subset of professor vs. teacher classification as in [7]. Finally, we use the highly unbalanced Spouse dataset (90% negative class labels), where the task is to identify mentions of spouse relationships among a set of news articles from the Signal Media dataset [13]."
Dataset Splits | Yes | "For the Spouse dataset, the same data split and LFs as in [19] are used, while for the rest we take a small subset of the test set as validation set. This is common practice in the related work [31, 32, 19, 7] for tuning hyperparameters, and allows for a fair comparison of models." See also Table 3 (dataset details, with training, validation, and test set sizes Ntrain, Nval, Ntest).
Hardware Specification | No | The paper does not report the hardware used for its experiments (e.g., GPU/CPU models or memory amounts); it only implies a GPU-capable PyTorch environment.
Software Dependencies | No | The paper mentions "arbitrary PyTorch downstream models" and training with cross-entropy loss, but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | "All downstream models are trained with the (binary) cross-entropy loss, and our model with the symmetric version of it that uses stop-grad on the targets. More details, especially hyperparameter and architecture details, are provided in Appendix C."
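The symmetric cross-entropy with stop-grad on the targets mentioned in this row can be written in a few lines. The function name `symmetric_noise_aware_loss` and the `eps` smoothing term are our own illustrative choices; `.detach()` stands in for the stop-grad operation.

```python
import torch

def symmetric_noise_aware_loss(p, q, eps=1e-8):
    """Cross-entropy between two probability distributions, applied in both
    directions with stop-grad (.detach()) on the target side."""
    loss_p = -(q.detach() * torch.log(p + eps)).sum(dim=1).mean()  # trains the model producing p
    loss_q = -(p.detach() * torch.log(q + eps)).sum(dim=1).mean()  # trains the model producing q
    return loss_p, loss_q

# Identical uniform predictions give cross-entropy log 2 in both directions
p = torch.tensor([[0.5, 0.5]])
q = torch.tensor([[0.5, 0.5]])
lp, lq = symmetric_noise_aware_loss(p, q)
```

Detaching the target in each direction ensures each network is trained toward the other's (fixed) prediction rather than the two collapsing through a shared gradient path.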