Factorized Diffusion Autoencoder for Unsupervised Disentangled Representation Learning
Authors: Ancong Wu, Wei-Shi Zheng
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Shapes3d, MPI3D and Cars3d show that our method achieves advanced performance and can generate visually interpretable concept-specific masks. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China; Guangdong Key Laboratory of Information Security Technology, Sun Yat-sen University, Guangzhou, China |
| Pseudocode | No | The paper describes the model architecture and training process in textual format, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code and supplementary materials are available at https://github.com/wuancong/FDAE. |
| Open Datasets | Yes | We evaluated unsupervised disentanglement representation learning on Shapes3d (Kim and Mnih 2018), MPI3D (Gondal et al. 2019), Cars3d (Reed et al. 2015) and attribute prediction on complex real-world dataset Market1501 (Zheng et al. 2015). |
| Dataset Splits | No | The paper describes hyperparameter selection based on self-MIG but does not explicitly state a distinct validation set or specific training/validation/test split proportions. |
| Hardware Specification | Yes | The training process takes 21 hours on 1 NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions software components like RAdam, U-Net, EDM, K-means, and PCA, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Input image x was resized to 64×64. Dimensionalities of the content codes (d_c), mask codes (d_m) and content masks (d_F) were all set to 80... In our loss function, we set w_CD = 2.5×10⁻⁵ for content decorrelation loss L_CD in Eq. (10) and set w_ME = 1.0×10⁻⁴ for mask entropy loss L_ME in Eq. (11). For optimization, we used RAdam (Liu et al. 2020) with learning rate 1.0×10⁻⁴ for 100,000 iterations and the batch size was set to 32. (A configuration sketch follows the table.) |
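For readers who want to mirror the reported setup, below is a minimal PyTorch sketch of the hyperparameters quoted in the Experiment Setup row. It assumes `torch.optim.RAdam` as the RAdam implementation; `FDAEStub` and `total_loss` are hypothetical placeholders for illustration, not the authors' code (which is available at the GitHub link above).

```python
# A minimal sketch of the reported training configuration, not the
# authors' implementation. FDAEStub and total_loss are illustrative
# assumptions; only the hyperparameter values come from the paper.
import torch
import torch.nn as nn

IMAGE_SIZE = 64            # inputs resized to 64x64
D_C = D_M = D_F = 80       # content codes, mask codes, content masks
W_CD = 2.5e-5              # content decorrelation loss weight (Eq. 10)
W_ME = 1.0e-4              # mask entropy loss weight (Eq. 11)
LEARNING_RATE = 1.0e-4
BATCH_SIZE = 32
NUM_ITERATIONS = 100_000


class FDAEStub(nn.Module):
    """Stand-in for the factorized diffusion autoencoder; the real model
    is a U-Net-based diffusion autoencoder (see the paper and repo)."""

    def __init__(self, d_c: int = D_C, d_m: int = D_M):
        super().__init__()
        # Single conv layer so the stub has trainable parameters.
        self.encoder = nn.Conv2d(3, d_c + d_m, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)


def total_loss(recon: torch.Tensor, l_cd: torch.Tensor,
               l_me: torch.Tensor) -> torch.Tensor:
    # Weighted sum assumed from the stated loss weights; the exact
    # combination is defined by the paper's Eqs. (10) and (11).
    return recon + W_CD * l_cd + W_ME * l_me


model = FDAEStub()
optimizer = torch.optim.RAdam(model.parameters(), lr=LEARNING_RATE)
```

Under this configuration, training would iterate 100,000 steps at batch size 32, which per the Hardware Specification row took the authors about 21 hours on one NVIDIA RTX 3090.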