DiffEnc: Variational Diffusion with a Learned Encoder
Authors: Beatrix Miranda Ginn Nielsen, Anders Christensen, Andrea Dittadi, Ole Winther
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on MNIST, CIFAR-10 and ImageNet32 with two different parameterizations of the encoder and find that, with a trainable encoder, DiffEnc improves total likelihood on CIFAR-10 and improves the latent loss on all datasets without damaging the diffusion loss. |
| Researcher Affiliation | Academia | Beatrix M. G. Nielsen¹, Anders Christensen¹,², Andrea Dittadi²,⁴, Ole Winther¹,³,⁵ — ¹Technical University of Denmark, ²Helmholtz AI, Munich, ³University of Copenhagen, ⁴Max Planck Institute for Intelligent Systems, ⁵Copenhagen University Hospital |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It includes mathematical derivations and equations. |
| Open Source Code | Yes | Code can be found on GitHub: https://github.com/bemigini/DiffEnc |
| Open Datasets | Yes | MNIST: The MNIST dataset (LeCun et al., 1998) as fetched by the tensorflow datasets package. 60,000 images were used for training and 10,000 images for test. License: Unknown. CIFAR-10: The CIFAR-10 dataset as fetched from the tensorflow datasets package. Originally collected by Krizhevsky et al. (2009). 50,000 images were used for training and 10,000 images for test. License: Unknown. ImageNet 32×32: The official downsampled version of ImageNet (Chrabaszcz et al., 2017) from the ImageNet website: https://image-net.org/download-images.php |
| Dataset Splits | No | For MNIST and CIFAR-10, the paper explicitly provides training and test set sizes (60,000 training and 10,000 test for MNIST; 50,000 training and 10,000 test for CIFAR-10). However, it does not mention a separate validation split for these datasets. For ImageNet32, it only states that it's the “official downsampled version” without specifying splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running its experiments; it refers to the hardware used only in general terms. |
| Software Dependencies | No | The paper mentions using “tensorflow datasets package” for MNIST and CIFAR-10. However, it does not specify version numbers for Python, TensorFlow, PyTorch, or any other critical software libraries. The reproducibility statement mentions “correct versioning” in the GitHub Readme, but the paper itself does not provide these details. |
| Experiment Setup | Yes | We used a linear log SNR noise schedule: λ_t = λ_max − (λ_max − λ_min)·t. For the large models (VDMv-32, DiffEnc-32-4 and DiffEnc-32-8), we fixed the endpoints, λ_max and λ_min, to the ones Kingma et al. (2021) found were optimal. For the small models (VDMv-8, DiffEnc-8-2 and DiffEnc-8-nt), we also experimented with learning the SNR endpoints. We trained all our models with either 3 or 5 seeds depending on the computational cost of the experiments. For models on MNIST and CIFAR-10 we used a batch size of 128 and no gradient clipping. For models on ImageNet32 we used a batch size of 256 and no gradient clipping. |
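The linear log-SNR schedule quoted in the setup row can be sketched as below. The endpoint values used here are illustrative assumptions (chosen to match the log-SNR range reported as optimal in Kingma et al. (2021) for CIFAR-10); the function names are hypothetical, not from the paper's code.

```python
import math

def log_snr(t, lmbda_max, lmbda_min):
    """Linear log-SNR noise schedule: lambda_t = lambda_max - (lambda_max - lambda_min) * t.
    Interpolates from lambda_max at t=0 down to lambda_min at t=1."""
    return lmbda_max - (lmbda_max - lmbda_min) * t

def snr(t, lmbda_max, lmbda_min):
    """Signal-to-noise ratio implied by the log-SNR schedule."""
    return math.exp(log_snr(t, lmbda_max, lmbda_min))

# Illustrative (assumed) endpoints; the paper fixes these to the optima
# from Kingma et al. (2021) for its large models.
LMBDA_MAX, LMBDA_MIN = 13.3, -5.0

# The schedule is monotonically decreasing in t, so noise increases with t.
assert log_snr(0.0, LMBDA_MAX, LMBDA_MIN) == LMBDA_MAX
assert abs(log_snr(1.0, LMBDA_MAX, LMBDA_MIN) - LMBDA_MIN) < 1e-9
```

With a learned schedule (the small-model variant mentioned above), λ_max and λ_min would simply become trainable parameters while the linear interpolation stays fixed.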