Swapping Autoencoder for Deep Image Manipulation

Authors: Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei Efros, Richard Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
Researcher Affiliation | Collaboration | ¹UC Berkeley, ²Adobe Research
Pseudocode | No | The paper provides architectural diagrams and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions a 'project webpage' for a demo video and interactive UI, but does not explicitly state that source code for the methodology is released or provide a link to a code repository.
Open Datasets | Yes | For existing datasets, our model is trained on LSUN Churches, Bedrooms [80], Animal Faces HQ (AFHQ) [12], and Flickr Faces HQ (FFHQ) [43], all at a resolution of 256px except FFHQ at 1024px. In addition, we introduce new datasets: Portrait2FFHQ, a combined dataset of 17k portrait paintings from wikiart.org and FFHQ at 256px; Flickr Mountain, 0.5M mountain images from flickr.com; and Waterfall, 90k 256px waterfall images.
Dataset Splits | No | The paper mentions training on various datasets (LSUN Churches, Bedrooms, AFHQ, FFHQ, Portrait2FFHQ, Flickr Mountain, Waterfall) but does not specify the training, validation, and test splits (e.g., percentages, sample counts, or explicit references to standard splits).
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., exact GPU/CPU models, memory, or cloud computing specifications) used to run its experiments.
Software Dependencies | No | The paper does not provide specific details about ancillary software dependencies, such as programming languages, libraries, or frameworks with their version numbers (e.g., 'PyTorch 1.9', 'CUDA 11.1').
Experiment Setup | Yes | Our final objective function for the encoder and generator is L_total = L_rec + 0.5 L_GAN,rec + 0.5 L_GAN,swap + L_CooccurGAN. The discriminator objective and design follows StyleGAN2 [44]. The encoder consists of 4 downsampling ResNet [22] blocks to produce the tensor z_s, and a dense layer after average pooling to produce the vector z_t. Please see Appendix ?? for a detailed specification of the architecture, as well as details of the discriminator loss function.
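
Since no official code is released (see the Open Source Code row above), the setup can only be sketched. Below is a minimal, unofficial PyTorch sketch of the pieces the paper describes: an encoder of 4 downsampling ResNet blocks producing the structure tensor z_s, average pooling plus a dense layer producing the texture vector z_t, and the stated loss combination. Module names, channel widths, and the z_t dimension are assumptions; the paper defers exact specifications to its appendix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownResBlock(nn.Module):
    """Residual block that halves spatial resolution.
    Hypothetical layout; exact specs are in the paper's appendix."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        h = F.leaky_relu(self.conv1(x), 0.2)
        h = F.leaky_relu(self.conv2(h), 0.2)
        h = F.avg_pool2d(h, 2)                      # downsample main path
        x = F.avg_pool2d(self.skip(x), 2)           # downsample skip path
        return x + h

class Encoder(nn.Module):
    """Four downsampling ResNet blocks -> structure tensor z_s;
    average pooling + dense layer -> texture vector z_t.
    `base` and `zt_dim` are assumed values, not from the paper."""
    def __init__(self, base=64, zt_dim=2048):
        super().__init__()
        chans = [3, base, base * 2, base * 4, base * 8]
        self.blocks = nn.Sequential(
            *[DownResBlock(chans[i], chans[i + 1]) for i in range(4)]
        )
        self.to_zt = nn.Linear(chans[-1], zt_dim)

    def forward(self, x):
        z_s = self.blocks(x)                        # structure code (tensor)
        z_t = self.to_zt(z_s.mean(dim=(2, 3)))     # texture code (vector)
        return z_s, z_t

def total_loss(l_rec, l_gan_rec, l_gan_swap, l_cooccur):
    # L_total = L_rec + 0.5 L_GAN,rec + 0.5 L_GAN,swap + L_CooccurGAN
    return l_rec + 0.5 * l_gan_rec + 0.5 * l_gan_swap + l_cooccur

if __name__ == "__main__":
    enc = Encoder()
    x = torch.randn(2, 3, 256, 256)
    z_s, z_t = enc(x)   # z_s: (2, 512, 16, 16), z_t: (2, 2048)
```

Note that the paper states only the block count and the two-code factorization; the discriminator and co-occurrence discriminator losses (following StyleGAN2 [44]) are passed in here as precomputed scalars rather than implemented.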