Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Creating Training Sets via Weak Indirect Supervision
Authors: Jieyu Zhang, Bohan Wang, Xiangchen Song, Yujing Wang, Yaming Yang, Jing Bai, Alexander Ratner
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On both image and text classification tasks as well as an industrial advertising application, we demonstrate the advantages of PLRM by outperforming baselines by a margin of 2%-9%. [...] 7 EXPERIMENTS |
| Researcher Affiliation | Collaboration | ¹Microsoft Research Asia, ²University of Washington, ³University of Science and Technology of China, ⁴Carnegie Mellon University, ⁵Snorkel AI, Inc. |
| Pseudocode | Yes | Algorithm 1 WIS |
| Open Source Code | No | Our code will be released upon the acceptance. |
| Open Datasets | Yes | We demonstrate the applicability and performance of our method on image classification tasks derived from ILSVRC2012 (Russakovsky et al., 2015) and text classification tasks derived from LSHTC-3 (Partalas et al., 2015). |
| Dataset Splits | No | We sample data belonging to unseen classes for our experiments and split them into train and test set. |
| Hardware Specification | Yes | All experiments ran on a machine with an Intel(R) Xeon(R) CPU E5-2678 v3 with a 512G memory and a GeForce GTX 1080Ti-11GB GPU. |
| Software Dependencies | No | All the code was implemented in Python. We use the standard implementation of the logistic regression model from Python scikit-learn library and the ResNet model from torchvision library. Version numbers for these software components are not specified. |
| Experiment Setup | Yes | For the training of PGMs, we set the learning rate to be 1/n where n is the number of training data. For training logistic regression model, we use the default parameters in scikit-learn library. For training ResNet model, we set batch size as 256 and use Adam optimizer with learning rate being 1e-3 and weight decay being 5e-5. |
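
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the names `ResNetTrainingConfig`, `pgm_learning_rate`, and `adam_kwargs` are hypothetical and introduced only for illustration. The ResNet variant, number of epochs, and library versions are not stated in the quoted text, so they are left out rather than guessed.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ResNetTrainingConfig:
    """Values quoted in the Experiment Setup row (hypothetical container)."""
    batch_size: int = 256         # "we set batch size as 256"
    optimizer: str = "Adam"       # "use Adam optimizer"
    learning_rate: float = 1e-3   # "learning rate being 1e-3"
    weight_decay: float = 5e-5    # "weight decay being 5e-5"


def pgm_learning_rate(num_training_examples: int) -> float:
    """Return 1/n, the PGM learning rate quoted above (n = number of training data)."""
    if num_training_examples <= 0:
        raise ValueError("num_training_examples must be positive")
    return 1.0 / num_training_examples


def adam_kwargs(config: ResNetTrainingConfig) -> dict:
    """Build keyword arguments of the form torch.optim.Adam(params, lr=..., weight_decay=...)."""
    return {"lr": config.learning_rate, "weight_decay": config.weight_decay}


if __name__ == "__main__":
    cfg = ResNetTrainingConfig()
    print(asdict(cfg))
    print(adam_kwargs(cfg))
    # Illustrative dataset size only; the quoted text does not give n.
    print(pgm_learning_rate(10_000))
```

The logistic regression model needs no entry here, because the quoted text says it was trained with the scikit-learn defaults.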