reproducibilityindex.ai

Creating Training Sets via Weak Indirect Supervision

Authors: Jieyu Zhang, Bohan Wang, Xiangchen Song, Yujing Wang, Yaming Yang, Jing Bai, Alexander Ratner

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	On both image and text classification tasks as well as an industrial advertising application, we demonstrate the advantages of PLRM by outperforming baselines by a margin of 2%-9%. 7 EXPERIMENTS
Researcher Affiliation	Collaboration	(1) Microsoft Research Asia; (2) University of Washington; (3) University of Science and Technology of China; (4) Carnegie Mellon University; (5) Snorkel AI, Inc.
Pseudocode	Yes	Algorithm 1 WIS
Open Source Code	No	Our code will be released upon the acceptance.
Open Datasets	Yes	We demonstrate the applicability and performance of our method on image classification tasks derived from ILSVRC2012 (Russakovsky et al., 2015) and text classification tasks derived from LSHTC-3 (Partalas et al., 2015).
Dataset Splits	No	We sample data belonging to unseen classes for our experiments and split them into train and test set.
Hardware Specification	Yes	All experiments ran on a machine with an Intel(R) Xeon(R) CPU E5-2678 v3 with a 512G memory and a GeForce GTX 1080Ti-11GB GPU.
Software Dependencies	No	All the code was implemented in Python. We use the standard implementation of the logistic regression model from Python scikit-learn library and the ResNet model from torchvision library. Version numbers for these software components are not specified.
Experiment Setup	Yes	For the training of PGMs, we set the learning rate to be 1/n where n is the number of training data. For training logistic regression model, we use the default parameters in scikit-learn library. For training ResNet model, we set batch size as 256 and use Adam optimizer with learning rate being 1e-3 and weight decay being 5e-5.
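
The settings quoted in the Experiment Setup row can be gathered into a small configuration object. The following is a minimal sketch, not the authors' code (which this entry marks as unreleased): the names `ResNetTrainingConfig` and `pgm_learning_rate` are hypothetical, and only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResNetTrainingConfig:
    """Reported ResNet training settings (field names are illustrative)."""
    batch_size: int = 256
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    weight_decay: float = 5e-5


def pgm_learning_rate(num_training_examples: int) -> float:
    """Return the reported PGM learning rate, 1/n, for n training examples."""
    if num_training_examples <= 0:
        raise ValueError("num_training_examples must be positive")
    return 1.0 / num_training_examples


config = ResNetTrainingConfig()
# Assumption: with PyTorch, these values would typically be passed as
# torch.optim.Adam(model.parameters(), lr=config.learning_rate,
#                  weight_decay=config.weight_decay).
# The logistic regression model uses scikit-learn defaults, so it needs
# no explicit settings here.
```

Because the library versions are not specified, "default parameters" for scikit-learn may differ between releases, so a reproduction should record the installed versions alongside this configuration.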