Exploring Diffusion Time-steps for Unsupervised Representation Learning

Authors: Zhongqi Yue, Jiankun Wang, Qianru Sun, Lei Ji, Eric I-Chao Chang, Hanwang Zhang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On CelebA, FFHQ, and Bedroom datasets, the learned feature significantly improves attribute classification and enables faithful counterfactual generation, e.g., interpolating only one specified attribute between two images, validating the disentanglement quality. Codes are in https://github.com/yue-zhongqi/diti."
Researcher Affiliation | Collaboration | 1 Nanyang Technological University, 2 Singapore Management University, 3 Microsoft Research Asia, 4 Skywork AI
Pseudocode | No | The paper describes its proposed approach and implementation details in prose and equations, but does not include any structured pseudocode or algorithm blocks (a hypothetical sketch of one training step is given after this table).
Open Source Code | Yes | "Codes are in https://github.com/yue-zhongqi/diti."
Open Datasets | Yes | "Datasets. We choose real-world datasets to validate if DiTi learns a disentangled representation of the generative attributes: 1) Celebrity Faces Attributes (CelebA) Liu et al. (2015) is a large-scale face attributes dataset. ... 2) Flickr-Faces-HQ (FFHQ) Karras et al. (2019) contains 70,000 high-quality face images obtained from Flickr. 3) We additionally used the Labeled Faces in the Wild (LFW) dataset Huang et al. (2007) that provides continuous attribute labels. 4) Bedroom is part of the Large-scale Scene UNderstanding (LSUN) dataset Yu et al. (2015) that contains around 3 million images."
Dataset Splits | No | The paper mentions a "CelebA train split" and a "CelebA test split" but does not explicitly describe a validation split or how the splits were constructed (see the loading sketch after this table for CelebA's official partition).
Hardware Specification | Yes | "Our experiments were performed on 4 NVIDIA A100 GPUs."
Software Dependencies | No | The paper names software components such as the U-Net architecture and a pre-trained DM, but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "We followed the network design of encoder f and decoder g in PDAE and adopted its hyper-parameter settings (e.g., λ_t, w_t in Eq. 4, details in Appendix). This ensures that any emerged property of disentangled representation is solely from our leverage of the inductive bias in Section 4.1. We also used the same training iterations as PDAE, i.e., 290k iterations on CelebA, 500k iterations on FFHQ, and 540k iterations on Bedroom."
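
Since the paper ships no algorithm block, the following is a minimal, hypothetical sketch of what one DiTi-style training step might look like, pieced together from the excerpts above (an encoder f and a PDAE-style decoder g on top of a frozen pre-trained DM, with the feature dimensions partitioned across diffusion time-steps). All names (`encoder`, `decoder`, `pretrained_dm`, `num_subsets`) are illustrative, and the per-time-step weights λ_t, w_t from the paper's Eq. 4 are omitted; consult https://github.com/yue-zhongqi/diti for the actual implementation.

```python
import torch

def diti_training_step(x0, encoder, decoder, pretrained_dm, alphas_cumprod,
                       num_subsets=16, feat_dim=512, T=1000):
    """One hypothetical DiTi-style step: the encoder feature z is split into
    `num_subsets` contiguous blocks, and at time-step t only the first k(t)
    blocks are kept, so later blocks capture coarser attributes."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)

    # Standard forward diffusion: x_t = sqrt(a_bar)*x_0 + sqrt(1 - a_bar)*eps
    eps = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

    # Mask the feature: k(t) grows with t, since the attributes already lost
    # by time-step t must be supplied by z for denoising to succeed.
    z = encoder(x0)                                     # (b, feat_dim)
    dims = feat_dim // num_subsets
    k = t * num_subsets // T + 1                        # 1 .. num_subsets
    keep = (torch.arange(feat_dim, device=x0.device)[None, :]
            < (k * dims)[:, None]).float()
    z_masked = z * keep

    # PDAE-style objective: a frozen pre-trained DM predicts the noise, and
    # the decoder g, conditioned on z, fills the remaining gap.
    with torch.no_grad():
        eps_dm = pretrained_dm(x_t, t)
    eps_pred = eps_dm + decoder(x_t, t, z_masked)
    return ((eps_pred - eps) ** 2).mean()
```

The cumulative masking (rather than a per-block one-hot mask) reflects the section's stated inductive bias: reconstructing from a heavily noised x_t requires every attribute lost up to that time-step, not just the ones lost in its own range.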
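On the Dataset Splits row: while the paper does not report a validation set, CelebA ships with an official train/valid/test partition, which torchvision exposes directly. A minimal loading sketch (root path and transforms are illustrative, not the paper's preprocessing):

```python
from torchvision import datasets, transforms

# Crop the aligned 178x218 CelebA faces and resize; values here are common
# defaults, not settings reported by the paper.
tf = transforms.Compose([transforms.CenterCrop(178),
                         transforms.Resize(64),
                         transforms.ToTensor()])
train = datasets.CelebA("data", split="train", target_type="attr",
                        transform=tf, download=True)
valid = datasets.CelebA("data", split="valid", target_type="attr",
                        transform=tf, download=True)
print(len(train), len(valid))  # official partition: 162,770 / 19,867 images
```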