Distilling GANs with Style-Mixed Triplets for X2I Translation with Limited Data
Authors: Yaxing Wang, Joost van de Weijer, Lu Yu, Shangling Jui
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results in a number of image generation tasks (i.e., image-to-image, semantic segmentation-to-image, text-to-image and audio-to-image) demonstrate qualitatively and quantitatively that our method successfully transfers knowledge to the synthetic image generation modules, resulting in more realistic images than previous methods as confirmed by a significant drop in the FID. |
| Researcher Affiliation | Collaboration | Yaxing Wang1,2, Joost van de Weijer2, Lu Yu3, Shangling Jui4; 1 College of Computer Science, Nankai University, China; 2 Computer Vision Center, Universitat Autònoma de Barcelona, Spain; 3 School of Computer Science and Engineering, Tianjin University of Technology, China; 4 Huawei Kirin Solution, China |
| Pseudocode | No | The paper describes its methods using prose, diagrams (Figure 2), and mathematical equations, but does not include any explicitly labeled ‘Pseudocode’ or ‘Algorithm’ blocks. |
| Open Source Code | Yes | Code is available in https://github.com/yaxingwang/KDIT. |
| Open Datasets | Yes | We conduct multi-class I2I translation on three datasets: Animal faces (Liu et al., 2019), Birds (Van Horn et al., 2015) and Foods (Kawano & Yanai, 2014). We evaluate the proposed method on the CUB bird dataset (Welinder et al., 2010), Oxford-102 (Nilsback & Zisserman, 2008), and the CelebAMask-HQ (Lee et al., 2020a) dataset. |
| Dataset Splits | Yes | Text-to-image... Here we use 10 images per class for training, and verify our method on the test dataset. Audio-to-image... 82 categories and 10 images per category are selected for training, and 20 categories and 1,155 images for test. Semantic segmentation-to-image... We randomly select 500 pairs of data from the train set for training, and 2,000 pairs for testing. In cat2dog-200, the training set is composed of 200 images (100 images per class) and the test set has 200 images (100 images per class). In AFHQ-500, the training set is composed of 500 images (100 images per class) and the test set has 1,500 images (500 images per class). |
| Hardware Specification | Yes | We perform the knowledge distillation on one GPU (Quadro RTX 6000) with 24 GB VRAM. |
| Software Dependencies | No | The proposed method is implemented in PyTorch (Paszke et al., 2017) and uses Adam (Kingma & Ba, 2014). While this indicates the software used, the paper does not provide a specific version number for PyTorch, only citations to the original papers. |
| Experiment Setup | Yes | We optimize the model using Adam (Kingma & Ba, 2014) with a batch size of 16. The learning rates of the generator and the discriminator are set to 0.0001 and 0.0004, with exponential decay rates (β1, β2) = (0.0, 0.9). The model is trained for 300 epochs for knowledge distillation. In Eq. 6, α_l and β are set identically: 0.1 for features whose dimension is less than 128, and 0.01 otherwise. In Eq. 8, λ_adv and λ_kdl are 1, and λ_srl is 0.1. A hedged sketch of this configuration appears below the table. |
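
For concreteness, here is a minimal PyTorch sketch of the training configuration quoted in the Experiment Setup row. It is an illustration only, not the authors' implementation (their code lives at https://github.com/yaxingwang/KDIT); the `generator` and `discriminator` modules and the three component losses are hypothetical placeholders, and only the numeric hyperparameters are taken from the paper.

```python
import torch
from torch import nn

# Hypothetical stand-in networks; the paper's actual architectures are in
# the released code at https://github.com/yaxingwang/KDIT.
generator = nn.Sequential(nn.Linear(128, 128))
discriminator = nn.Sequential(nn.Linear(128, 1))

# Adam with the reported settings: lr 0.0001 (G) / 0.0004 (D),
# exponential decay rates (beta1, beta2) = (0.0, 0.9), batch size 16,
# 300 epochs of knowledge distillation.
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))
BATCH_SIZE = 16
EPOCHS = 300

def feature_weight(dim: int) -> float:
    """Per-feature weight alpha_l (= beta) from Eq. 6, as reported:
    0.1 for features whose dimension is less than 128, 0.01 otherwise."""
    return 0.1 if dim < 128 else 0.01

# Loss weights from Eq. 8: lambda_adv = lambda_kdl = 1, lambda_srl = 0.1.
LAMBDA_ADV, LAMBDA_KDL, LAMBDA_SRL = 1.0, 1.0, 0.1

def total_loss(adv_loss: torch.Tensor,
               kd_loss: torch.Tensor,
               sr_loss: torch.Tensor) -> torch.Tensor:
    """Weighted sum of the adversarial, knowledge-distillation, and
    style-related component losses (Eq. 8); the component definitions
    themselves are given in the paper, not here."""
    return LAMBDA_ADV * adv_loss + LAMBDA_KDL * kd_loss + LAMBDA_SRL * sr_loss
```

The asymmetric learning rates (4x higher for the discriminator) combined with β1 = 0 match a common GAN training recipe; the sketch simply wires the quoted values into standard `torch.optim.Adam` calls.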