Diffusion Models Already Have A Semantic Latent Space

Authors: Mingi Kwon, Jaeseok Jeong, Youngjung Uh

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5 EXPERIMENTS: In this section, we show the effectiveness of semantic latent editing in h-space with Asyrp on various attributes, datasets and architectures in 5.1. Moreover, we provide quantitative results including user study in 5.2."
Researcher Affiliation | Academia | "Mingi Kwon, Jaeseok Jeong, Youngjung Uh. Department of Artificial Intelligence, Yonsei University, Seoul, Republic of Korea. {kwonmingi,jete_jeong,yj.uh}@yonsei.ac.kr"
Pseudocode | Yes | "Algorithm 1: Editing (Inference)" and "Algorithm 2: Training neural implicit function f_t" (in Appendix I). A minimal sketch of the asymmetric inference step appears after this table.
Open Source Code | Yes | "The code is available at https://github.com/kwonminki/Asyrp_official"
Open Datasets | Yes | "CelebA-HQ (Karras et al., 2018) and LSUN-bedroom/-church (Yu et al., 2015) on DDPM++ (Song et al., 2020b; Meng et al., 2021); AFHQ-dog (Choi et al., 2020) on iDDPM (Nichol & Dhariwal, 2021); and METFACES (Karras et al., 2020) on ADM with P2-weighting (Dhariwal & Nichol, 2021; Choi et al., 2022)."
Dataset Splits | No | "We train f_t with S = 40 for 1 epoch using 1000 samples. The real samples are randomly chosen from each dataset for in-domain-like attributes. For out-of-domain-like attributes, we randomly draw 1,000 latent variables x_T ~ N(0, I)." No explicit train/validation/test splits are reported; only the total number of samples used to train f_t is given.
Hardware Specification | Yes | "Training takes about 20 minutes with three RTX 3090 GPUs."
Software Dependencies | No | The paper names software components and pretrained models such as CLIP and the U-Net backbones, but provides no version numbers for the dependencies required to reproduce the results.
Experiment Setup | Yes | "We train f_t with S = 40 for 1 epoch using 1000 samples. The real samples are randomly chosen from each dataset for in-domain-like attributes. For out-of-domain-like attributes, we randomly draw 1,000 latent variables x_T ~ N(0, I). Detailed settings including the coefficients for λ_CLIP and λ_recon, and source/target descriptions can be found in Appendix J.1." A sketch of the corresponding training step appears below.
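
For readers without access to Appendix I, here is a minimal PyTorch sketch of the asymmetric reverse-process step behind "Algorithm 1: Editing (Inference)". The interfaces are assumptions for illustration, not the authors' exact API: `unet(x, t, delta_h=...)` is assumed to return the noise prediction together with the U-Net bottleneck activation (h-space), and `f_t(h, t)` is assumed to return the trained implicit function's h-space shift.

```python
import torch

@torch.no_grad()
def asyrp_step(x_t, t, t_prev, unet, f_t, alphas_cumprod):
    # One asymmetric DDIM step (sketch of Algorithm 1: Editing/Inference).
    # Hypothetical interfaces, not the authors' exact code:
    #   unet(x, t, delta_h=None) -> (eps_pred, h), where h is the bottleneck
    #     activation (h-space); delta_h, if given, is added to it.
    #   f_t(h, t) -> predicted h-space shift for timestep t.
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]

    eps, h = unet(x_t, t)                          # unedited noise prediction
    eps_edit, _ = unet(x_t, t, delta_h=f_t(h, t))  # prediction with shifted h-space

    # The asymmetry: P_t (the predicted x_0) uses the edited prediction,
    # while D_t (the direction pointing to x_t) keeps the original one.
    x0_pred = (x_t - (1.0 - a_t).sqrt() * eps_edit) / a_t.sqrt()
    d_t = (1.0 - a_prev).sqrt() * eps
    return a_prev.sqrt() * x0_pred + d_t
```

In the paper, this asymmetric update is applied only over an early editing interval of the reverse process; the remaining steps use the standard DDIM update.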
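Likewise, the training of f_t ("Algorithm 2") can be sketched as a per-timestep loss combining a CLIP directional term with a reconstruction term, weighted by λ_CLIP and λ_recon. The names below (`clip_img`, `clip_txt`, and the `unet`/`f_t` interfaces from the previous sketch) are assumed, illustrative interfaces; the actual coefficient values and text prompts are deferred to Appendix J.1 of the paper.

```python
import torch
import torch.nn.functional as F

def directional_clip_loss(clip_img, clip_txt, x_edit, x_src, tgt_text, src_text):
    # 1 - cos(delta_image, delta_text): the directional CLIP loss commonly
    # used for text-driven editing (the image change should align with the
    # change from the source description to the target description).
    d_img = clip_img(x_edit) - clip_img(x_src)
    d_txt = clip_txt(tgt_text) - clip_txt(src_text)
    return 1.0 - F.cosine_similarity(d_img, d_txt, dim=-1).mean()

def ft_train_step(x_t, t, unet, f_t, alphas_cumprod, clip_img, clip_txt,
                  src_text, tgt_text, lam_clip, lam_recon, opt):
    # One training step for f_t (sketch of Algorithm 2 under the assumed
    # interfaces above); the diffusion model and CLIP stay frozen.
    a_t = alphas_cumprod[t]
    with torch.no_grad():
        eps, h = unet(x_t, t)
        x0_src = (x_t - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()    # unedited P_t
    eps_edit, _ = unet(x_t, t, delta_h=f_t(h, t))                 # grads reach f_t
    x0_edit = (x_t - (1.0 - a_t).sqrt() * eps_edit) / a_t.sqrt()  # edited P_t

    loss = (lam_clip * directional_clip_loss(clip_img, clip_txt,
                                             x0_edit, x0_src, tgt_text, src_text)
            + lam_recon * (x0_edit - x0_src).abs().mean())        # L1 reconstruction
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

S = 40 in the quoted setup refers to the number of discretized reverse-process steps used while training f_t; the sketch shows the loss at a single timestep t.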