Exploring Diffusion Time-steps for Unsupervised Representation Learning
Authors: Zhongqi Yue, Jiankun Wang, Qianru Sun, Lei Ji, Eric I-Chao Chang, Hanwang Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On CelebA, FFHQ, and Bedroom datasets, the learned feature significantly improves attribute classification and enables faithful counterfactual generation, e.g., interpolating only one specified attribute between two images, validating the disentanglement quality. Codes are in https://github.com/yue-zhongqi/diti. |
| Researcher Affiliation | Collaboration | 1 Nanyang Technological University, 2 Singapore Management University, 3 Microsoft Research Asia, 4 Skywork AI |
| Pseudocode | No | The paper describes its proposed approach and implementation details in prose and equations, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are in https://github.com/yue-zhongqi/diti. |
| Open Datasets | Yes | Datasets. We choose real-world datasets to validate if DiTi learns a disentangled representation of the generative attributes: 1) Celebrity Faces Attributes (CelebA) Liu et al. (2015) is a large-scale face attributes dataset. ... 2) Flickr-Faces-HQ (FFHQ) Karras et al. (2019) contains 70,000 high-quality face images obtained from Flickr. 3) We additionally used the Labeled Faces in the Wild (LFW) dataset Huang et al. (2007) that provides continuous attribute labels. 4) Bedroom is part of the Large-scale Scene UNderstanding (LSUN) dataset Yu et al. (2015) that contains around 3 million images. |
| Dataset Splits | No | The paper mentions a 'CelebA train split' and a 'CelebA test split' but does not explicitly describe a validation split or give any details of one. |
| Hardware Specification | Yes | Our experiments were performed on 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'U-Net' and 'pre-trained DM', but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We followed the network design of encoder f and decoder g in PDAE and adopted its hyper-parameter settings (e.g., λt, wt in Eq. 4, details in Appendix). This ensures that any emerged property of disentangled representation is solely from our leverage of the inductive bias in Section 4.1. We also used the same training iterations as PDAE, i.e., 290k iterations on CelebA, 500k iterations on FFHQ, and 540k iterations on Bedroom. |