Diffusion Models Already Have A Semantic Latent Space
Authors: Mingi Kwon, Jaeseok Jeong, Youngjung Uh
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS In this section, we show the effectiveness of semantic latent editing in h-space with Asyrp on various attributes, datasets and architectures in 5.1. Moreover, we provide quantitative results including user study in 5.2. |
| Researcher Affiliation | Academia | Mingi Kwon, Jaeseok Jeong, Youngjung Uh, Department of Artificial Intelligence, Yonsei University, Seoul, Republic of Korea {kwonmingi,jete_jeong,yj.uh}@yonsei.ac.kr |
| Pseudocode | Yes | Algorithm 1: Editing(Inference) and Algorithm 2: Training Neural implicit function ft (in Appendix I) |
| Open Source Code | Yes | The code is available at https://github.com/kwonminki/Asyrp_official |
| Open Datasets | Yes | CelebA-HQ (Karras et al., 2018) and LSUN-bedroom/-church (Yu et al., 2015) on DDPM++ (Song et al., 2020b) (Meng et al., 2021); AFHQ-dog (Choi et al., 2020) on iDDPM (Nichol & Dhariwal, 2021); and METFACES (Karras et al., 2020) on ADM with P2-weighting (Dhariwal & Nichol, 2021) (Choi et al., 2022). |
| Dataset Splits | No | We train ft with S = 40 for 1 epoch using 1000 samples. The real samples are randomly chosen from each dataset for in-domain-like attributes. For out-of-domain-like attributes, we randomly draw 1,000 latent variables x_T ∼ N(0, I). (No explicit train/validation/test splits are mentioned for reproduction, only the total samples used for training ft). |
| Hardware Specification | Yes | Training takes about 20 minutes with three RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions software components and models like CLIP and U-Net but does not provide specific version numbers for any software dependencies required for reproducibility. |
| Experiment Setup | Yes | We train ft with S = 40 for 1 epoch using 1000 samples. The real samples are randomly chosen from each dataset for in-domain-like attributes. For out-of-domain-like attributes, we randomly draw 1,000 latent variables x_T ∼ N(0, I). Detailed settings including the coefficients for λCLIP and λrecon, and source/target descriptions can be found in Appendix J.1. |
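The setup rows above describe drawing 1,000 latent variables x_T ∼ N(0, I) for the out-of-domain-like attributes. A minimal sketch of that sampling step (the `(3, 256, 256)` latent shape and the seed are assumptions for illustration, matching a typical 256×256 RGB diffusion model, not values stated by the paper):

```python
import numpy as np

def sample_latents(n=1000, shape=(3, 256, 256), seed=0):
    """Draw n latent variables x_T ~ N(0, I), one per starting sample.

    Assumed shape: (channels, height, width) of the diffusion model input.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, *shape))

latents = sample_latents()
print(latents.shape)  # (1000, 3, 256, 256)
```

Each of these latents would then be denoised by the pretrained diffusion model while ft shifts the bottleneck h-space features, per the paper's Algorithm 2.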