Distilling GANs with Style-Mixed Triplets for X2I Translation with Limited Data

Authors: Yaxing Wang, Joost van de Weijer, Lu Yu, Shangling Jui

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results in a number of image generation tasks (i.e., image-to-image, semantic segmentation-to-image, text-to-image and audio-to-image) demonstrate qualitatively and quantitatively that our method successfully transfers knowledge to the synthetic image generation modules, resulting in more realistic images than previous methods as confirmed by a significant drop in the FID.
Researcher Affiliation | Collaboration | Yaxing Wang1,2, Joost van de Weijer2, Lu Yu3, Shangling Jui4. 1 College of Computer Science, Nankai University, China; 2 Computer Vision Center, Universitat Autònoma de Barcelona, Spain; 3 School of Computer Science and Engineering, Tianjin University of Technology, China; 4 Huawei Kirin Solution, China
Pseudocode | No | The paper describes its methods using prose, diagrams (Figure 2), and mathematical equations, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | Code is available in https://github.com/yaxingwang/KDIT.
Open Datasets | Yes | We conduct multi-class I2I translation on three datasets: Animal faces (Liu et al., 2019), Birds (Van Horn et al., 2015) and Foods (Kawano & Yanai, 2014). We evaluate the proposed method on the CUB bird dataset (Welinder et al., 2010); Oxford-102 (Nilsback & Zisserman, 2008); the CelebAMask-HQ (Lee et al., 2020a) dataset.
Dataset Splits | Yes | Text-to-image... Here we use 10 images per class for training, and verify our method on the test dataset. Audio-to-image... 82 categories and 10 images per category are selected for training, and 20 categories and 1,155 images for test. Semantic segmentation-to-image... We randomly select 500 pairs of data from the train set for training, and 2,000 pairs for testing. In cat2dog-200, the training set is composed of 200 images (100 images per class) and the test set has 200 images (100 images per class). In AFHQ-500, the training set is composed of 500 images (100 images per class) and the test set has 1,500 images (500 images per class). (A hypothetical split-construction sketch appears after the table.)
Hardware Specification | Yes | We perform the knowledge distillation on one GPU (Quadro RTX 6000) with 24 GB VRAM.
Software Dependencies | No | The proposed method is implemented in PyTorch (Paszke et al., 2017) and uses Adam (Kingma & Ba, 2014). While this indicates the software used, no specific version number is given for PyTorch (or any other dependency), only citations to the original papers.
Experiment Setup | Yes | We optimize the model using Adam (Kingma & Ba, 2014) with a batch size of 16. The learning rates of the generator and the discriminator are set to 0.0001 and 0.0004 with exponential decay rates of (β1, β2) = (0.0, 0.9). The model is trained for 300 epochs for knowledge distillation. In Eq. 6 both α_l and β are identical: for features whose dimension is less than 128 we set them to 0.1, otherwise to 0.01. In Eq. 8, λ_adv and λ_kdl are 1, and λ_srl is 0.1. (A minimal optimizer-setup sketch also follows the table.)
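
The cat2dog-200 and AFHQ-500 splits quoted under Dataset Splits are plain per-class subsets. The sketch below is a minimal, hypothetical way to build such a split; the directory layout, file extension, and the make_split helper are assumptions, not the authors' code.

    # Hypothetical sketch: build a cat2dog-200-style split
    # (100 training / 100 test images per class). Paths and the
    # make_split helper are assumptions, not the paper's code.
    import random
    from pathlib import Path

    def make_split(class_dir: Path, n_train: int = 100, n_test: int = 100, seed: int = 0):
        """Return (train, test) lists of image paths for one class."""
        files = sorted(class_dir.glob("*.jpg"))
        random.Random(seed).shuffle(files)
        return files[:n_train], files[n_train:n_train + n_test]

    train_files, test_files = [], []
    for cls in ("cat", "dog"):                      # assumed class folders
        tr, te = make_split(Path("cat2dog") / cls)  # hypothetical dataset root
        train_files += tr
        test_files += te
    # len(train_files) == 200 and len(test_files) == 200, matching cat2dog-200.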
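
As a reading aid for the Experiment Setup row, here is a minimal PyTorch sketch of the reported optimizer configuration (Adam, generator lr 1e-4, discriminator lr 4e-4, betas (0.0, 0.9), batch size 16, 300 epochs) and the Eq. 8 loss weights. The placeholder networks and the loss-term names l_adv, l_kdl, l_srl are assumptions; the actual loss definitions are in the paper.

    # Minimal sketch of the reported optimizer settings; the tiny placeholder
    # networks stand in for the paper's generator and discriminator.
    import torch
    from torch import nn

    generator = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1))      # placeholder
    discriminator = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1))  # placeholder

    # Adam with the quoted learning rates; the "exponential decay rates
    # (β1, β2) = (0.0, 0.9)" map onto Adam's betas argument.
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))

    batch_size = 16
    num_epochs = 300  # knowledge-distillation training length quoted above

    # Eq. 8 loss weights as quoted: λ_adv = λ_kdl = 1, λ_srl = 0.1.
    lambda_adv, lambda_kdl, lambda_srl = 1.0, 1.0, 0.1

    def total_loss(l_adv: torch.Tensor, l_kdl: torch.Tensor, l_srl: torch.Tensor) -> torch.Tensor:
        """Weighted sum of the three Eq. 8 terms; the terms themselves are
        computed elsewhere and are not reproduced here."""
        return lambda_adv * l_adv + lambda_kdl * l_kdl + lambda_srl * l_srl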