Winning Lottery Tickets in Deep Generative Models

Authors: Neha Mukund Kalibhat, Yogesh Balaji, Soheil Feizi (pp. 8038–8046)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we confirm the existence of winning tickets in deep generative models such as GANs and VAEs. We show that the popular iterative magnitude pruning approach (with late resetting) can be used with generative losses to find the winning tickets. This approach effectively yields tickets with sparsity up to 99% for Auto Encoders, 93% for VAEs and 89% for GANs on CIFAR and Celeb-A datasets. We also demonstrate the transferability of winning tickets across different generative models (GANs and VAEs) sharing the same architecture, suggesting that winning tickets have inductive biases that could help train a wide range of deep generative models. Furthermore, we show the practical benefits of lottery tickets in generative models by detecting tickets at very early stages in training called early-bird tickets. Through early-bird tickets, we can achieve up to 88% reduction in floating-point operations (FLOPs) and 54% reduction in training time, making it possible to train large-scale generative models over tight resource constraints. These results outperform existing early pruning methods like SNIP (Lee, Ajanthan, and Torr 2019) and GraSP (Wang, Zhang, and Grosse 2020).
Researcher Affiliation | Academia | Neha Mukund Kalibhat, Yogesh Balaji, Soheil Feizi, Department of Computer Science, University of Maryland, College Park; {nehamk,yogesh,sfeizi}@cs.umd.edu
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found.
Open Source Code | No | The paper does not include an explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | We conduct experiments on several generative models including linear Auto Encoder, convolutional Auto Encoder, VAE, β-VAE (Higgins et al. 2017), ResNet-VAE (Kingma et al. 2016), Deep-Convolutional GAN (DCGAN) (Radford, Metz, and Chintala 2015), Spectral Normalization GAN (SNGAN) (Miyato et al. 2018), Wasserstein GAN (WGAN) (Arjovsky, Chintala, and Bottou 2017) and ResNet-GAN (He et al. 2016) on MNIST (LeCun, Cortes, and Burges 2010), CIFAR-10 (Krizhevsky 2009) and Celeb-A (Liu et al. 2015) datasets.
Dataset Splits | No | No specific details on training, validation, or test dataset splits (e.g., percentages or counts) were explicitly provided. The paper mentions using standard datasets like MNIST, CIFAR-10, and Celeb-A, but does not specify how these were split for training, validation, and testing.
Hardware Specification | No | No specific hardware details (such as GPU or CPU models, memory specifications) used for running the experiments were provided. The paper mentions 'Powerful GANs that perform such tasks on large-scale datasets such as ImageNet (Deng et al. 2009), require TPUs of 128 to 512 cores to generate high quality images.' in the Broader Impact section, but this refers to general requirements for large-scale GANs, not the specific setup for their own experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names with versions like PyTorch 1.9 or Python 3.8) were provided.
Experiment Setup | Yes | In all our experiments in this paper, p = 20% and n = 20, i.e. we run 20 rounds of iterative magnitude pruning where we prune 20% of the network at each iteration. ... In our experiments, we look-back 5 iterations and fix δ as 0.1. These hyper-parameters generally help us find stable EB-tickets very early in training at epoch 4 to 6. We also perform mixed-precision training on EB-tickets, where the floating-point precision of parameters and their gradients is reduced from 32-bit to 16-bit or 8-bit depending on their sizes.
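
The three code sketches below expand on the Research Type, Open Datasets, and Experiment Setup rows above. First, the abstract quoted under Research Type describes iterative magnitude pruning (IMP) with late resetting applied to generative losses, run for n = 20 rounds with p = 20% of the network pruned per round. The following is a minimal PyTorch sketch of that loop under stated assumptions: the helper names (`prune_by_magnitude`, `apply_mask`, `find_winning_ticket`), the magnitude-pruning criterion, and the late-reset point are illustrative choices, not the authors' released code (the paper links none).

```python
# Sketch of iterative magnitude pruning (IMP) with late resetting for a
# generative model such as an autoencoder. The paper reports p = 20% pruned
# per round over n = 20 rounds; helper names and the late-reset point here
# are illustrative assumptions.
import copy
import torch

def prune_by_magnitude(model, masks, prune_frac=0.2):
    """Prune the smallest-magnitude surviving weights, layer by layer."""
    new_masks = {}
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "weight" not in name:
                continue
            mask = masks.get(name, torch.ones_like(param))
            surviving = param[mask.bool()].abs()
            k = int(prune_frac * surviving.numel())
            if k == 0:
                new_masks[name] = mask
                continue
            threshold = surviving.kthvalue(k).values
            new_masks[name] = mask * (param.abs() > threshold).float()
    return new_masks

def apply_mask(model, masks):
    """Zero out pruned weights (re-apply after every optimizer step)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

def find_winning_ticket(model, train_fn, n_rounds=20, prune_frac=0.2,
                        rewind_epochs=1, train_epochs=50):
    # Late resetting: rewind to weights from a few epochs into training,
    # not to the random initialization.
    train_fn(model, epochs=rewind_epochs)
    rewind_state = copy.deepcopy(model.state_dict())
    masks = {}
    for _ in range(n_rounds):
        train_fn(model, epochs=train_epochs)   # train with the generative loss
        masks = prune_by_magnitude(model, masks, prune_frac)
        model.load_state_dict(rewind_state)    # rewind surviving weights
        apply_mask(model, masks)               # keep pruned weights at zero
    return masks, rewind_state
```

The `train_fn` callback stands in for whichever generative objective is being pruned (reconstruction loss for autoencoders and VAEs, adversarial loss for GANs); it must keep re-applying the mask during training so pruned weights stay at zero.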
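
Second, the three datasets listed under Open Datasets (MNIST, CIFAR-10, Celeb-A) are all distributed through torchvision, so they can be obtained as sketched below. The transforms and batch size are assumptions; the paper does not state its preprocessing pipeline.

```python
# Minimal loading sketch for the three public datasets used in the paper.
# Resolution, cropping, and batch size are illustrative assumptions.
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

to_tensor = transforms.Compose([transforms.Resize(32),
                                transforms.CenterCrop(32),
                                transforms.ToTensor()])

mnist  = datasets.MNIST("data/", train=True, download=True, transform=to_tensor)
cifar  = datasets.CIFAR10("data/", train=True, download=True, transform=to_tensor)
celeba = datasets.CelebA("data/", split="train", download=True, transform=to_tensor)

loader = DataLoader(cifar, batch_size=128, shuffle=True, num_workers=2)
```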
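
Third, the Experiment Setup row mentions early-bird (EB) tickets detected with a look-back of 5 iterations and a threshold δ = 0.1. One plausible reading, sketched below, is that a pruning mask is recomputed each epoch and the EB-ticket is declared once the normalized Hamming distance between the current mask and the masks from the previous 5 epochs stays below δ. The `mask_from_weights` and `mask_distance` helpers and the exact distance definition are assumptions, not the paper's exact procedure.

```python
# Sketch of early-bird (EB) ticket detection via mask distance, following
# the look-back-5 / delta = 0.1 setup quoted above. Mask-distance definition
# and helper names are assumptions for illustration.
import torch

def mask_from_weights(model, prune_frac=0.2):
    """Binary mask keeping the largest-magnitude (1 - prune_frac) weights."""
    weights = torch.cat([p.detach().abs().flatten()
                         for n, p in model.named_parameters() if "weight" in n])
    k = int(prune_frac * weights.numel())
    threshold = weights.kthvalue(k).values if k > 0 else weights.new_tensor(0.0)
    return (weights > threshold).float()

def mask_distance(m1, m2):
    """Normalized Hamming distance between two binary masks."""
    return (m1 != m2).float().mean().item()

def train_with_eb_detection(model, train_one_epoch, max_epochs=100,
                            lookback=5, delta=0.1):
    history = []
    for epoch in range(max_epochs):
        train_one_epoch(model)
        history.append(mask_from_weights(model))
        if len(history) > lookback + 1:
            history.pop(0)
        if len(history) == lookback + 1:
            # EB-ticket found once the mask stops changing over the window.
            if max(mask_distance(history[-1], m) for m in history[:-1]) < delta:
                return epoch, history[-1]
    return max_epochs, history[-1]
```

Once an EB-ticket is found, the pruned network can be trained further under reduced precision (e.g., with torch.cuda.amp), which is one way to realize the mixed-precision training the authors describe; the paper itself does not name a specific library.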