AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks

Authors: Yonggan Fu, Wuyang Chen, Haotao Wang, Haoran Li, Yingyan Lin, Zhangyang Wang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate AGD in two representative GAN tasks: image translation and super resolution.
Researcher Affiliation | Academia | Rice University, Houston, Texas, USA; Texas A&M University, College Station, Texas, USA.
Pseudocode | Yes | Algorithm 1: The Proposed AutoGAN-Distiller Framework
Open Source Code | Yes | Our codes and pretrained models are available at: https://github.com/TAMU-VITA/AGD.
Open Datasets | Yes | We apply AGD on compressing CycleGAN (Zhu et al., 2017) and consider two datasets, horse2zebra (Zhu et al., 2017) and summer2winter (Zhu et al., 2017). We apply AGD on compressing ESRGAN (Wang et al., 2018a) on a combined dataset of DIV2K and Flickr2K (Timofte et al., 2017).
Dataset Splits | Yes | We split the training dataset into two halves: one for updating the supernet weights and the other for updating the architecture parameters.
Hardware Specification | Yes | For the efficiency aspect, we measure the model size and the inference FLOPs (floating-point operations). As both might not always be aligned with the hardware performance, we further measure the real-device inference latency using an NVIDIA GeForce RTX 2080 Ti (NVIDIA Inc.).
Software Dependencies | No | The paper discusses optimizers (SGD, Adam) and loss functions but does not specify software platforms (e.g., PyTorch, TensorFlow) or their version numbers, nor other ancillary software dependencies.
Experiment Setup | Yes | For AGD on CycleGAN, λ in Eq. 1 is 1×10⁻¹⁷, ω₁ and ω₂ in Eq. 2 are set to 1/4 and 3/4, and β₁, β₂, and β₃ in Eq. 3 are set to 1×10⁻², 1, and 5×10⁻⁸, respectively. We pretrain and search for 50 epochs, with batch size 2. We use an SGD optimizer with a momentum of 0.9 and an initial learning rate of 1×10⁻¹ for the weights, which linearly decays to 0 after 10 epochs, and an Adam optimizer with a constant 3×10⁻⁴ learning rate for the architecture parameters. We train the searched architecture from scratch for 400 epochs, with a batch size of 16 and an initial learning rate of 1×10⁻¹, which linearly decays to 0 after 100 epochs.
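
The search-phase recipe captured in the Dataset Splits and Experiment Setup rows can be sketched in code. The snippet below is a minimal illustration under assumptions, not the authors' implementation: PyTorch is assumed (the paper names no framework, per the Software Dependencies row); ToySupernet, its candidate convolutions, the dummy data, and the L1 loss are hypothetical stand-ins; only the two-way training split, batch size, epoch count, and optimizer settings come from the quoted text, and the exact shape of the linear learning-rate decay is also an assumption.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split


class ToySupernet(nn.Module):
    """Hypothetical supernet: candidate ops (supernet weights) mixed by architecture logits."""

    def __init__(self):
        super().__init__()
        # Candidate operators whose parameters play the role of the supernet weights.
        self.ops = nn.ModuleList(nn.Conv2d(3, 3, k, padding=k // 2) for k in (1, 3, 5))
        # Architecture parameters: mixing logits over the candidate operators.
        self.alphas = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        probs = torch.softmax(self.alphas, dim=0)
        return sum(p * op(x) for p, op in zip(probs, self.ops))


# Dummy tensors standing in for the real image-translation training pairs.
full_train = TensorDataset(torch.randn(64, 3, 32, 32), torch.randn(64, 3, 32, 32))

# Split the training set into two halves: one for the supernet weights,
# the other for the architecture parameters (as stated in the Dataset Splits row).
half = len(full_train) // 2
weight_set, arch_set = random_split(full_train, [half, len(full_train) - half])
weight_loader = DataLoader(weight_set, batch_size=2, shuffle=True)  # batch size 2
arch_loader = DataLoader(arch_set, batch_size=2, shuffle=True)

model = ToySupernet()
search_epochs = 50  # "pretrain and search for 50 epochs"

# SGD, momentum 0.9, initial lr 1e-1 for the supernet weights; the lr is held for
# 10 epochs and then decays linearly to 0 (the schedule shape is an assumption).
w_opt = torch.optim.SGD(model.ops.parameters(), lr=1e-1, momentum=0.9)
w_sched = torch.optim.lr_scheduler.LambdaLR(
    w_opt,
    lambda e: 1.0 if e < 10 else max(0.0, (search_epochs - e) / (search_epochs - 10)),
)
# Adam with a constant 3e-4 learning rate for the architecture parameters.
a_opt = torch.optim.Adam([model.alphas], lr=3e-4)

criterion = nn.L1Loss()  # placeholder; AGD's loss combines distillation and efficiency terms

for epoch in range(search_epochs):
    for (xw, yw), (xa, ya) in zip(weight_loader, arch_loader):
        # Update supernet weights on the first half of the data.
        w_opt.zero_grad()
        criterion(model(xw), yw).backward()
        w_opt.step()
        # Update architecture parameters on the second half.
        a_opt.zero_grad()
        criterion(model(xa), ya).backward()
        a_opt.step()
    w_sched.step()
```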