Few-shot Cross-domain Image Generation via Inference-time Latent-code Learning

Authors: Arnab Kumar Mondal, Piyush Tiwary, Parag Singla, Prathosh AP

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS AND RESULTS
Researcher Affiliation | Academia | IIT Delhi, IISc Bengaluru
Pseudocode | Yes | Algorithm 1: Proposed Method's Pseudo-Code
Open Source Code | Yes | The code of the proposed method is available at: https://github.com/arnabkmondal/GenDA
Open Datasets | Yes | Following previous work (Li et al., 2020; Ojha et al., 2021), we consider Flickr Faces HQ (FFHQ) (Karras et al., 2019) as one of the source domain datasets and adapt to the following target domains: (i) FFHQ-Babies (Ojha et al., 2021), (ii) FFHQ-Sunglasses (Ojha et al., 2021), (iii) face sketches (Wang & Tang, 2009), (iv) emoji faces from the bitmoji.com API (Taigman et al., 2016; Hua et al., 2017), and (v) portrait paintings from the artistic faces dataset (Yaniv et al., 2019). Next, we consider LSUN Church (Yu et al., 2015) as the source domain and adapt to (i) haunted houses (Ojha et al., 2021) and (ii) Van Gogh's house paintings (Ojha et al., 2021).
Dataset Splits | No | Although very few (1, 5, or 10) target examples are used for adaptation, evaluation is conducted on a larger target set. For example, there are approximately 300, 2,500, 2,700, and unlimited examples in the sketches (Wang & Tang, 2009), FFHQ-Babies (Ojha et al., 2021), FFHQ-Sunglasses (Ojha et al., 2021), and emoji (Taigman et al., 2016; Hua et al., 2017) datasets, respectively. In such settings, we use the entire target dataset (10,000 samples for emoji) for evaluation, generating an equal number of target samples with our method.
Hardware Specification | No | We thank the IIT Delhi HPC facility for computational resources.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned.
Experiment Setup | Yes | The latent learner is a 3-layer MLP with 512 neurons in each layer throughout all experiments unless otherwise specified. The hidden layers employ ReLU activation, and the final layer has no activation. All few-shot target samples are used in a single batch (of size 8) for computing the style and adversarial losses.
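The experiment-setup row describes the latent learner concretely: a 3-layer MLP, 512 neurons per layer, ReLU on the hidden layers, and no activation on the output. A minimal NumPy sketch of such a network is shown below; the 512-dimensional input, He-style initialization, and the `LatentLearner` name are illustrative assumptions, not the authors' code (see the linked GenDA repository for the actual implementation).

```python
import numpy as np

def relu(x):
    """Elementwise ReLU activation."""
    return np.maximum(x, 0.0)

class LatentLearner:
    """Sketch of a 3-layer MLP with 512 units per layer (assumed shapes)."""

    def __init__(self, dim=512, seed=0):
        rng = np.random.default_rng(seed)
        # Three 512x512 weight matrices; He-style scaling is an assumption.
        self.weights = [rng.standard_normal((dim, dim)) * np.sqrt(2.0 / dim)
                        for _ in range(3)]
        self.biases = [np.zeros(dim) for _ in range(3)]

    def forward(self, z):
        h = z
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            h = h @ W + b
            if i < 2:        # ReLU on the two hidden layers only
                h = relu(h)
        return h             # final layer has no activation

# As in the setup, all few-shot samples go through in one batch of size 8.
learner = LatentLearner()
batch = np.random.default_rng(1).standard_normal((8, 512))
out = learner.forward(batch)
print(out.shape)  # (8, 512)
```

In the paper's setting this module would map latent codes into the generator's latent space; here it only demonstrates the stated layer count, widths, and activations.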