Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models
Authors: Zijian Zhang, Zhou Zhao, Zhijie Lin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness, efficiency and flexibility of PDAE. Our implementation is available at https://github.com/ckczzj/PDAE. To compare PDAE with Diff-AE [36], we follow their experiments with the same settings. Moreover, we also show that PDAE enables some added features. For fair comparison, we use the baseline DPMs provided by the official Diff-AE implementation as our pre-trained models (also as our baselines), which have the same network architectures (hyperparameters) as their Diff-AE models. For brevity, we use notation such as "FFHQ128-130M-z512-64M" to name our models, meaning that we take a baseline DPM pre-trained with 130M images and leverage it for PDAE training with 64M images, on the 128×128 FFHQ dataset [21], with a 512-d semantic latent code z. (A small parser illustrating this naming convention is sketched after the table.) |
| Researcher Affiliation | Collaboration | Zijian Zhang1 Zhou Zhao1 Zhijie Lin2 1Department of Computer Science and Technology, Zhejiang University 2Sea AI Lab |
| Pseudocode | No | The paper states "We put detailed algorithm procedures in Appendix ??", but the pseudocode or algorithm blocks are not included in the provided text. |
| Open Source Code | Yes | Our implementation is available at https://github.com/ckczzj/PDAE. |
| Open Datasets | Yes | Specifically, we train an unconditional DPM and a noisy classifier on MNIST [28]... on the 128×128 FFHQ dataset [21]... We use "FFHQ128-130M-z512-64M" to encode and reconstruct all 30k images of CelebA-HQ [20]... train a model of "ImageNet64-77M-y-38M"... train a model of "CelebA64-72M-z512-38M" on CelebA [20]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, and test sets. |
| Hardware Specification | Yes | For training time, we train both models with the same network architectures (hyperparameters) on a 128×128 image dataset using 4 Nvidia A100-SXM4 GPUs for distributed training and set the batch size to 128 (32 for each GPU) to calculate their training throughput (imgs/sec./A100). (A minimal distributed data-parallel sketch of this setup follows the table.) |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., DPMs, DDPMs, DDIMs, VAEs, GANs, U-Net, Group Normalization), but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | For training time, we train both models with the same network architectures (hyperparameters) on a 128×128 image dataset using 4 Nvidia A100-SXM4 GPUs for distributed training and set the batch size to 128 (32 for each GPU)... Empirically we set γ = 0.1... For the encoder Eφ, unlike Diff-AE, which uses the encoder part of a U-Net [40], we find that simply stacked convolution layers and a linear layer are enough to learn a meaningful z from x0. For the gradient estimator Gψ, we use a U-Net similar to the function approximator ϵθ of the pre-trained DPM. (A hypothetical PyTorch sketch of such an encoder follows the table.) |
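The model naming convention quoted in the Research Type row can be made concrete with a small helper. The sketch below is illustrative only and not part of the official PDAE code; the field layout is inferred from the quoted description, and the handling of the `y` variant (class-conditional models such as "ImageNet64-77M-y-38M") is an assumption.

```python
import re

def parse_model_name(name: str) -> dict:
    """Parse the paper's naming convention, e.g. "FFHQ128-130M-z512-64M":
    baseline DPM pre-trained with 130M images of 128x128 FFHQ, PDAE trained
    with 64M images and a 512-d latent code z. Illustrative helper only."""
    dataset_res, dpm_images, latent, pdae_images = name.split("-")
    dataset, resolution = re.match(r"([A-Za-z]+)(\d+)$", dataset_res).groups()
    info = {
        "dataset": dataset,                  # e.g. "FFHQ"
        "resolution": int(resolution),       # e.g. 128
        "dpm_pretrain_images": dpm_images,   # e.g. "130M"
        "pdae_train_images": pdae_images,    # e.g. "64M"
    }
    if latent.startswith("z"):
        info["z_dim"] = int(latent[1:])      # e.g. 512
    else:
        info["condition"] = latent           # assumed: "y" for class labels
    return info

print(parse_model_name("FFHQ128-130M-z512-64M"))
# {'dataset': 'FFHQ', 'resolution': 128, 'dpm_pretrain_images': '130M',
#  'pdae_train_images': '64M', 'z_dim': 512}
```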
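The hardware description (4 Nvidia A100 GPUs, global batch size 128, i.e. 32 images per GPU) maps onto a standard PyTorch distributed data-parallel setup. The following is a minimal sketch under that assumption; `model` and `train_dataset` are placeholders, and the actual PDAE training script may be organized differently.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def setup_distributed(model, train_dataset, global_batch_size=128):
    """Launch with e.g. `torchrun --nproc_per_node=4 train.py` so that one
    process is spawned per GPU; each process then sees 32 images per step."""
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # 128 total / 4 GPUs = 32 images per GPU, matching the quoted setup.
    per_gpu_batch = global_batch_size // dist.get_world_size()
    sampler = DistributedSampler(train_dataset, shuffle=True)
    loader = DataLoader(train_dataset, batch_size=per_gpu_batch,
                        sampler=sampler, num_workers=4, pin_memory=True)

    ddp_model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    return ddp_model, loader
```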
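The Experiment Setup row states that the encoder Eφ is just stacked convolution layers plus a linear layer producing a 512-d latent code z from x0. Below is a hypothetical PyTorch sketch of such an encoder; the depth, channel widths, GroupNorm/SiLU choices, and pooling are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):
    """Stacked convolution layers followed by a linear layer, mapping an
    image x0 to a 512-d semantic latent code z, as described in the paper.
    Depth, channel widths, and normalization/activation choices here are
    assumptions for illustration, not the authors' exact configuration."""

    def __init__(self, in_channels=3, z_dim=512, base_channels=64, num_blocks=4):
        super().__init__()
        blocks, ch = [], in_channels
        for i in range(num_blocks):
            out_ch = base_channels * 2 ** i          # 64, 128, 256, 512
            blocks += [
                nn.Conv2d(ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.GroupNorm(32, out_ch),
                nn.SiLU(),
            ]
            ch = out_ch
        self.conv = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)          # collapse spatial dims
        self.fc = nn.Linear(ch, z_dim)               # final linear layer -> z

    def forward(self, x0):
        h = self.pool(self.conv(x0)).flatten(1)
        return self.fc(h)

# A batch of 128x128 images (e.g. FFHQ) -> 512-d latent codes z.
z = SemanticEncoder()(torch.randn(4, 3, 128, 128))
print(z.shape)  # torch.Size([4, 512])
```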