Generative Modeling Helps Weak Supervision (and Vice Versa)

Authors: Benedikt Boecking, Nicholas Roberts, Willie Neiswanger, Stefano Ermon, Frederic Sala, Artur Dubrawski

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on multiple image datasets show that the proposed WSGAN approach is able to take advantage of the discrete latent structure it discovers in the images, leading to better label model performance compared to prior work. The results also indicate that weak supervision as used by WSGAN can improve image generation performance.
Researcher Affiliation | Academia | Benedikt Boecking and Artur Dubrawski (Carnegie Mellon University; boecking@cmu.edu, awd@cmu.edu); Nicholas Roberts and Frederic Sala (University of Wisconsin; nick11roberts@cs.wisc.edu, fredsala@cs.wisc.edu); Willie Neiswanger and Stefano Ermon (Stanford University; neiswanger@cs.stanford.edu, ermon@cs.stanford.edu)
Pseudocode | Yes | Pseudocode for the added loss term can be found in Algorithm 1. (A hedged illustrative sketch of such a loss term appears after the table.)
Open Source Code | Yes | Code for WSGAN can be found at https://github.com/benbo/WSGAN-paper.
Open Datasets | Yes | We conduct our main experiments with the Animals with Attributes 2 (AwA2) (Mazzetto et al., 2021a), DomainNet (Peng et al., 2019), the German Traffic Sign Recognition Benchmark (GTSRB) (Stallkamp et al., 2012), and CIFAR10 (Krizhevsky, 2009) color image datasets, as well as with the gray-scale MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) datasets.
Dataset Splits | Yes | We create an 85%/5%/10% train/validation/test split of the 10 unseen classes, which we use to define decision trees to produce weak labels for a set of unseen classes of animals. (An illustrative split sketch appears after the table.)
Hardware Specification | Yes | InfoGAN: 14 bps, WSGAN Encoder: 7.8 bps, WSGAN Vector: 8.6 bps (NVIDIA RTX A6000, batch size 16).
Software Dependencies | No | The paper mentions software components like 'Adam' (optimizer) and using 'WRENCH' for label models, but it does not specify version numbers for these or other software dependencies like Python or PyTorch libraries.
Experiment Setup | Yes | We use the same hyperparameter settings for all datasets. We train all GANs for a maximum of 200 epochs. We use a batch size of 16... For WSGAN, we use four optimizers, one for each of the different loss terms... We use Adam for all optimizers and set the learning rates as follows: 4e-4 for D, 1e-4 for G, 1e-4 for the info loss term, and 8e-5 for the WSGAN loss term. (An illustrative optimizer sketch appears below.)
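
The quoted Experiment Setup translates fairly directly into code. Below is a minimal PyTorch sketch of the four-optimizer configuration, assuming placeholder modules for the discriminator, generator, InfoGAN head, and the parameters updated by the WSGAN loss term; the module names and shapes are illustrative, not the classes from the WSGAN repository.

```python
# Illustrative sketch of the optimizer setup quoted above; the modules are
# placeholders, not the networks from the WSGAN repository.
import torch
from torch import nn

D = nn.Linear(784, 1)           # discriminator (placeholder)
G = nn.Linear(100, 784)         # generator (placeholder)
info_head = nn.Linear(784, 10)  # InfoGAN auxiliary head (placeholder)
ws_head = nn.Linear(784, 10)    # parameters updated by the WSGAN loss term (placeholder)

# Four Adam optimizers, one per loss term, with the quoted learning rates.
opt_d = torch.optim.Adam(D.parameters(), lr=4e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_info = torch.optim.Adam(info_head.parameters(), lr=1e-4)
opt_ws = torch.optim.Adam(ws_head.parameters(), lr=8e-5)

batch_size = 16   # as quoted in the setup
max_epochs = 200  # as quoted in the setup
```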
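The Pseudocode row above refers to the paper's Algorithm 1, which is not reproduced here. The following is a hedged sketch of how an InfoGAN-style auxiliary loss combined with a weak-supervision term might look: an encoder is asked to recover the discrete latent code of generated images, and to match label-model posteriors on real, weakly labeled images. The function and argument names are assumptions for illustration, not the paper's exact algorithm.

```python
# Hedged sketch (not the paper's Algorithm 1): an InfoGAN-style mutual-
# information term plus a weak-supervision term that fits the encoder to
# label-model posteriors. All names here are illustrative placeholders.
import torch
import torch.nn.functional as F

def auxiliary_losses(generator, encoder, z, c, x_real, label_posterior):
    """z: noise (B, z_dim); c: discrete code indices (B,);
    x_real: real images (B, C, H, W); label_posterior: soft labels (B, K)."""
    # Mutual-information term: the encoder should recover c from G(z, c).
    x_fake = generator(z, c)
    info_loss = F.cross_entropy(encoder(x_fake), c)

    # Weak-supervision term: the encoder's class distribution on real images
    # should match the label model's posterior (soft cross-entropy).
    log_probs = F.log_softmax(encoder(x_real), dim=1)
    ws_loss = -(label_posterior * log_probs).sum(dim=1).mean()

    return info_loss, ws_loss
```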
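For the Dataset Splits row above, an 85%/5%/10% split over the examples of the 10 unseen AwA2 classes could be drawn as in the sketch below; the example count, seed, and variable names are placeholders rather than the authors' exact preprocessing.

```python
# Illustrative 85%/5%/10% train/validation/test split; not the authors' code.
import numpy as np

num_examples = 10000  # placeholder count of images from the 10 unseen classes
rng = np.random.default_rng(0)
indices = rng.permutation(num_examples)

n_train = int(0.85 * num_examples)
n_val = int(0.05 * num_examples)

train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]
```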