Creating Training Sets via Weak Indirect Supervision

Authors: Jieyu Zhang, Bohan Wang, Xiangchen Song, Yujing Wang, Yaming Yang, Jing Bai, Alexander Ratner

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On both image and text classification tasks as well as an industrial advertising application, we demonstrate the advantages of PLRM by outperforming baselines by a margin of 2%-9%. (Section 7, Experiments)
Researcher Affiliation | Collaboration | 1 Microsoft Research Asia; 2 University of Washington; 3 University of Science and Technology of China; 4 Carnegie Mellon University; 5 Snorkel AI, Inc.
Pseudocode | Yes | Algorithm 1: WIS
Open Source Code | No | Our code will be released upon acceptance.
Open Datasets | Yes | We demonstrate the applicability and performance of our method on image classification tasks derived from ILSVRC2012 (Russakovsky et al., 2015) and text classification tasks derived from LSHTC-3 (Partalas et al., 2015).
Dataset Splits | No | We sample data belonging to unseen classes for our experiments and split them into train and test sets; exact split sizes are not specified.
Hardware Specification | Yes | All experiments ran on a machine with an Intel(R) Xeon(R) E5-2678 v3 CPU, 512 GB of memory, and a GeForce GTX 1080 Ti (11 GB) GPU.
Software Dependencies | No | All code was implemented in Python, using the standard logistic regression implementation from the scikit-learn library and the ResNet model from the torchvision library. Version numbers for these software components are not specified.
Experiment Setup | Yes | For the training of PGMs, the learning rate is set to 1/n, where n is the number of training examples. For the logistic regression model, the default scikit-learn parameters are used. For the ResNet model, the batch size is 256 with the Adam optimizer, a learning rate of 1e-3, and a weight decay of 5e-5.
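
As a concrete reading of the Experiment Setup row, the sketch below reconstructs the reported end-model training configuration in Python. It is not the authors' released code: the choice of resnet18, the class count, and the names train_dataset, X_train, and y_train are placeholders introduced here; only the hyperparameters (default scikit-learn logistic regression; batch size 256; Adam with learning rate 1e-3 and weight decay 5e-5) come from the paper.

```python
# Minimal sketch of the reported end-model training setup, not the authors' code.
# Assumed/hypothetical pieces: resnet18 as the ResNet variant, the num_classes value,
# train_dataset, X_train, and y_train; only the hyperparameters come from the paper.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.models import resnet18
from sklearn.linear_model import LogisticRegression

# Logistic regression end model: default scikit-learn parameters, as reported.
def fit_logreg(X_train, y_train):
    return LogisticRegression().fit(X_train, y_train)

# ResNet end model: batch size 256, Adam with lr=1e-3 and weight decay=5e-5.
def train_resnet(train_dataset, num_classes, epochs=1):
    model = resnet18(num_classes=num_classes)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-5)
    criterion = nn.CrossEntropyLoss()
    loader = DataLoader(train_dataset, batch_size=256, shuffle=True)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```

The PGM learning rate of 1/n applies to the label model rather than the end models above and is omitted here, since the excerpt does not describe the label model's training loop.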