Factorized Diffusion Autoencoder for Unsupervised Disentangled Representation Learning

Authors: Ancong Wu, Wei-Shi Zheng

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Shapes3d, MPI3D and Cars3d show that our method achieves advanced performance and can generate visually interpretable concept-specific masks. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; (2) Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China; (3) Guangdong Key Laboratory of Information Security Technology, Sun Yat-sen University, Guangzhou, China |
| Pseudocode | No | The paper describes the model architecture and training process in text but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code and supplementary materials are available at https://github.com/wuancong/FDAE. |
| Open Datasets | Yes | We evaluated unsupervised disentanglement representation learning on Shapes3d (Kim and Mnih 2018), MPI3D (Gondal et al. 2019), Cars3d (Reed et al. 2015) and attribute prediction on the complex real-world dataset Market1501 (Zheng et al. 2015). |
| Dataset Splits | No | The paper describes hyperparameter selection based on self-MIG but does not explicitly state training/validation/test splits or their proportions. |
| Hardware Specification | Yes | The training process takes 21 hours on 1 NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions software components such as RAdam, U-Net, EDM, K-means, and PCA, but does not provide version numbers for any of them. |
| Experiment Setup | Yes | Input image x was resized to 64×64. Dimensionalities of the content codes (d_c), mask codes (d_m) and content masks (d_F) were all set to 80... In our loss function, we set w_CD = 2.5×10⁻⁵ for the content decorrelation loss L_CD in Eq. (10) and w_ME = 1.0×10⁻⁴ for the mask entropy loss L_ME in Eq. (11). For optimization, we used RAdam (Liu et al. 2020) with learning rate 1.0×10⁻⁴ for 100,000 iterations and a batch size of 32. |
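
For orientation, the snippet below is a minimal Python sketch of the training configuration quoted in the Experiment Setup row. The constant names, the `build_optimizer` helper, and the assumption that the two regularizers are added to the diffusion objective as a weighted sum are illustrative assumptions, not the authors' released code (linked above); the exact loss terms are defined in Eqs. (10) and (11) of the paper.

```python
# Hypothetical sketch of the reported FDAE training configuration.
# Only the numeric values are taken from the paper; all names are assumptions.
import torch
from torch.optim import RAdam

CONFIG = {
    "image_size": 64,        # input images resized to 64x64
    "d_content": 80,         # dimensionality of content codes (d_c)
    "d_mask": 80,            # dimensionality of mask codes (d_m)
    "n_content_masks": 80,   # number of content masks (d_F)
    "w_cd": 2.5e-5,          # weight of content decorrelation loss (Eq. 10)
    "w_me": 1.0e-4,          # weight of mask entropy loss (Eq. 11)
    "lr": 1.0e-4,            # RAdam learning rate
    "iterations": 100_000,   # training iterations
    "batch_size": 32,
}

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """RAdam optimizer with the reported learning rate."""
    return RAdam(model.parameters(), lr=CONFIG["lr"])

def total_loss(l_diffusion: torch.Tensor,
               l_cd: torch.Tensor,
               l_me: torch.Tensor) -> torch.Tensor:
    """Assumed combination: diffusion objective plus weighted regularizers."""
    return l_diffusion + CONFIG["w_cd"] * l_cd + CONFIG["w_me"] * l_me
```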