AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks
Authors: Yonggan Fu, Wuyang Chen, Haotao Wang, Haoran Li, Yingyan Lin, Zhangyang Wang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AGD in two representative GAN tasks: image translation and super resolution. |
| Researcher Affiliation | Academia | 1Rice University, Houston, Texas, USA 2Texas A&M University, College Station, Texas, USA. |
| Pseudocode | Yes | Algorithm 1 The Proposed AutoGAN-Distiller Framework |
| Open Source Code | Yes | Our codes and pretrained models are available at: https://github.com/TAMU-VITA/AGD. |
| Open Datasets | Yes | We apply AGD on compressing CycleGAN (Zhu et al., 2017) and consider two datasets, horse2zebra (Zhu et al., 2017) and summer2winter (Zhu et al., 2017). We apply AGD on compressing ESRGAN (Wang et al., 2018a) on a combined dataset of DIV2K and Flickr2K (Timofte et al., 2017). |
| Dataset Splits | Yes | We split the training dataset into two halves: one for updating the supernet weights and the other for updating the architecture parameters. (See the data-split sketch after the table.) |
| Hardware Specification | Yes | For the efficiency aspect, we measure the model size and the inference FLOPs (floating-point operations). As both might not always be aligned with the hardware performance, we further measure the real-device inference latency using NVIDIA GEFORCE RTX 2080 Ti (NVIDIA Inc.). |
| Software Dependencies | No | The paper discusses optimizers (SGD, Adam) and loss functions but does not specify software platforms (e.g., PyTorch, TensorFlow) or their version numbers, nor other ancillary software dependencies. |
| Experiment Setup | Yes | For AGD on CycleGAN, λ in Eq. 1 is 1×10⁻¹⁷, ω₁ and ω₂ in Eq. 2 are set to 1/4 and 3/4, and β₁, β₂, and β₃ in Eq. 3 are set to 1×10⁻², 1, and 5×10⁻⁸, respectively. We pretrain and search for 50 epochs, with batch size 2. We use an SGD optimizer with a momentum of 0.9 and an initial learning rate of 1×10⁻¹ for the weights, which linearly decays to 0 after 10 epochs, and an Adam optimizer with a constant 3×10⁻⁴ learning rate for the architecture parameters. We train the searched architecture from scratch for 400 epochs, with a batch size of 16 and an initial learning rate of 1×10⁻¹, which linearly decays to 0 after 100 epochs. (Optimizer and schedule settings are sketched below.) |
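
The "Dataset Splits" row describes the half/half split used during search: one half of the training data updates the supernet weights, the other half updates the architecture parameters. The sketch below shows one plausible way to set this up; PyTorch, the FakeData stand-in (replacing horse2zebra / summer2winter or DIV2K + Flickr2K), and all variable names are assumptions, not taken from the paper or its repository.

```python
# Minimal sketch of the bi-level data split described in the "Dataset Splits"
# row. PyTorch and FakeData are assumptions; the paper does not name its
# framework, and the real datasets are horse2zebra, summer2winter, and
# DIV2K + Flickr2K.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Placeholder dataset standing in for the paper's training images.
full_train_set = datasets.FakeData(size=200, image_size=(3, 64, 64),
                                   transform=transforms.ToTensor())

# Split the training set into two halves: one half for updating the supernet
# weights, the other for updating the architecture parameters.
half = len(full_train_set) // 2
weight_set, arch_set = random_split(
    full_train_set, [half, len(full_train_set) - half],
    generator=torch.Generator().manual_seed(0))

# Batch size 2 during pretraining/search, per the "Experiment Setup" row.
weight_loader = DataLoader(weight_set, batch_size=2, shuffle=True)  # supernet weights
arch_loader = DataLoader(arch_set, batch_size=2, shuffle=True)      # architecture params
```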
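
The "Experiment Setup" row lists the search-stage optimizers and learning-rate schedules. The sketch below illustrates one way to wire them up. PyTorch, the toy `TinySupernet` module, and the reading of "linearly decays to 0 after 10 epochs" as "hold for 10 of the 50 search epochs, then decay linearly to 0" are assumptions, not confirmed details from the paper.

```python
# Hedged sketch of the quoted optimizer settings: SGD (momentum 0.9, lr 1e-1,
# linear decay) for supernet weights, Adam at a constant 3e-4 for architecture
# parameters. The supernet below is a toy placeholder.
import torch
import torch.nn as nn

class TinySupernet(nn.Module):
    """Toy stand-in for the searchable generator supernet (assumption)."""
    def __init__(self, num_ops: int = 5):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)         # "weight" parameters
        self.alpha = nn.Parameter(torch.zeros(num_ops))   # architecture parameters

supernet = TinySupernet()
weight_params = [p for n, p in supernet.named_parameters() if "alpha" not in n]
arch_params = [p for n, p in supernet.named_parameters() if "alpha" in n]

# SGD with momentum 0.9 and initial lr 1e-1 for the weights; Adam with a
# constant 3e-4 lr for the architecture parameters (as quoted above).
w_optimizer = torch.optim.SGD(weight_params, lr=1e-1, momentum=0.9)
a_optimizer = torch.optim.Adam(arch_params, lr=3e-4)

# One reading of "linearly decays to 0 after 10 epochs": hold the lr for the
# first 10 of the 50 pretrain/search epochs, then decay linearly to 0.
total_epochs, hold_epochs = 50, 10

def linear_decay(epoch: int) -> float:
    if epoch < hold_epochs:
        return 1.0
    return max(0.0, 1.0 - (epoch - hold_epochs) / (total_epochs - hold_epochs))

w_scheduler = torch.optim.lr_scheduler.LambdaLR(w_optimizer, lr_lambda=linear_decay)
```

The same pattern would apply to training the searched architecture from scratch (400 epochs, batch size 16, lr 1×10⁻¹ decaying linearly to 0 after 100 epochs) by adjusting `total_epochs` and `hold_epochs` accordingly.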