Exploring Low-Dimensional Subspace in Diffusion Models for Controllable Image Editing
Authors: Siyi Chen, Huijie Zhang, Minzhe Guo, Yifu Lu, Peng Wang, Qing Qu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, extensive empirical experiments demonstrate the effectiveness and efficiency of LOCO Edit. The code and the arXiv version can be found on the project website. |
| Researcher Affiliation | Academia | Siyi Chen¹, Huijie Zhang¹, Minzhe Guo¹, Yifu Lu¹, Peng Wang¹, Qing Qu¹; ¹University of Michigan; {siyich,huijiezh,vincegmz,yifulu,pengwa,qingqu}@umich.edu |
| Pseudocode | Yes | Algorithm 1 Unsupervised LOCO Edit |
| Open Source Code | Yes | The code and the arXiv version can be found on the project website: https://chicychen.github.io/LOCO |
| Open Datasets | Yes | We evaluated DDPM (U-Net [49]) on CIFAR-10 dataset [50], U-ViT [51] (Transformer) on CelebA [52], ImageNet [53] datasets and DeepFloyd IF [19] trained on LAION-5B [54] dataset. |
| Dataset Splits | No | The paper lists datasets used for experiments (e.g., CIFAR-10, CelebA), but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages or counts) or reference predefined splits for reproducibility. |
| Hardware Specification | Yes | All the experiments can be conducted with a single A40 GPU having 48G memory. |
| Software Dependencies | No | The paper refers to specific models and architectures (e.g., U-Net [49], U-ViT [51]), but does not specify version numbers for general software dependencies like Python, PyTorch, or other relevant libraries. |
| Experiment Setup | Yes | We empirically choose the edit time step t for different datasets in the range [0.5, 0.8]. In practice, we found time steps within the above range give similar editing results. In most of the experiments, the edit time steps chosen are: 0.5 for FFHQ, 0.6 for CelebA-HQ and LSUN-church, 0.7 for AFHQ, Flowers, and MetFace. In practice, we choose the edit strength λ in the range of [-15.0, 15.0], where a larger λ leads to stronger semantic editing and a negative λ leads to the change of semantics in the opposite direction. |
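
The settings quoted in the Experiment Setup row can be captured in a small configuration sketch. This is only an illustration, not the authors' code: the names `EDIT_TIME_STEPS`, `validate_setup`, and `apply_edit` are hypothetical, and the linear update `x_t + λ·v` is an assumed form of the edit that this table does not itself state.

```python
# Edit time steps per dataset, as quoted in the Experiment Setup row.
EDIT_TIME_STEPS = {
    "FFHQ": 0.5,
    "CelebA-HQ": 0.6,
    "LSUN-church": 0.6,
    "AFHQ": 0.7,
    "Flowers": 0.7,
    "MetFace": 0.7,
}

# Ranges reported in the table for the edit time step t and edit strength lambda.
T_RANGE = (0.5, 0.8)
LAMBDA_RANGE = (-15.0, 15.0)


def validate_setup(dataset, lam):
    """Return (t, lam) if both fall inside the reported ranges (hypothetical helper)."""
    t = EDIT_TIME_STEPS[dataset]
    if not T_RANGE[0] <= t <= T_RANGE[1]:
        raise ValueError(f"edit time step {t} outside {T_RANGE}")
    if not LAMBDA_RANGE[0] <= lam <= LAMBDA_RANGE[1]:
        raise ValueError(f"edit strength {lam} outside {LAMBDA_RANGE}")
    return t, lam


def apply_edit(x_t, direction, lam):
    """Assumed linear edit x_t + lam * direction on flat lists of floats.

    A negative lam moves along the opposite direction, matching the
    table's description of negative edit strengths.
    """
    return [x + lam * d for x, d in zip(x_t, direction)]
```

For example, `validate_setup("AFHQ", -5.0)` returns the time step 0.7 together with the strength, while a strength of 20.0 is rejected because it lies outside the reported range.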