DiffEnc: Variational Diffusion with a Learned Encoder

Authors: Beatrix Miranda Ginn Nielsen, Anders Christensen, Andrea Dittadi, Ole Winther

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on MNIST, CIFAR-10, and ImageNet32 with two different parameterizations of the encoder and find that, with a trainable encoder, DiffEnc improves total likelihood on CIFAR-10 and improves the latent loss on all datasets without damaging the diffusion loss.
Researcher Affiliation | Academia | Beatrix M. G. Nielsen (1), Anders Christensen (1,2), Andrea Dittadi (2,4), Ole Winther (1,3,5). Affiliations: (1) Technical University of Denmark, (2) Helmholtz AI, Munich, (3) University of Copenhagen, (4) Max Planck Institute for Intelligent Systems, (5) Copenhagen University Hospital.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; it includes mathematical derivations and equations.
Open Source Code | Yes | Code is available on GitHub: https://github.com/bemigini/DiffEnc
Open Datasets | Yes | MNIST: the MNIST dataset (LeCun et al., 1998) as fetched by the tensorflow datasets package; 60,000 images were used for training and 10,000 for test. License: unknown. CIFAR-10: the CIFAR-10 dataset as fetched from the tensorflow datasets package, originally collected by Krizhevsky et al. (2009); 50,000 images were used for training and 10,000 for test. License: unknown. ImageNet 32x32: the official downsampled version of ImageNet (Chrabaszcz et al., 2017) from the ImageNet website: https://image-net.org/download-images.php. (A loading sketch is given after the table.)
Dataset Splits | No | For MNIST and CIFAR-10, the paper explicitly provides training and test set sizes (60,000 training / 10,000 test for MNIST; 50,000 training / 10,000 test for CIFAR-10), but it does not mention a separate validation split for these datasets. For ImageNet32, it only states that it is the "official downsampled version" without specifying splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed machine specifications used for running its experiments; it refers to compute resources only in general terms.
Software Dependencies | No | The paper mentions using the tensorflow datasets package for MNIST and CIFAR-10, but it does not specify version numbers for Python, TensorFlow, PyTorch, or any other critical software libraries. The reproducibility statement points to "correct versioning" in the GitHub README, but the paper itself does not provide these details.
Experiment Setup | Yes | We used a linear log-SNR noise schedule: λ_t = λ_max − (λ_max − λ_min)·t (see the sketch below). For the large models (VDMv-32, DiffEnc-32-4, and DiffEnc-32-8), we fixed the endpoints, λ_max and λ_min, to the ones Kingma et al. (2021) found were optimal. For the small models (VDMv-8, DiffEnc-8-2, and DiffEnc-8-nt), we also experimented with learning the SNR endpoints. We trained all our models with either 3 or 5 seeds, depending on the computational cost of the experiments. For models on MNIST and CIFAR-10 we used a batch size of 128 and no gradient clipping; for models on ImageNet32 we used a batch size of 256 and no gradient clipping.
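
To make the quoted schedule concrete, here is a minimal sketch of the linear log-SNR schedule. The function name is ours, and the endpoint values are illustrative placeholders, not the fixed values from Kingma et al. (2021).

```python
import numpy as np

def linear_log_snr(t, lambda_max=10.0, lambda_min=-5.0):
    """Linear log-SNR schedule: lambda_t = lambda_max - (lambda_max - lambda_min) * t.

    t lies in [0, 1]; lambda_max and lambda_min are the (here illustrative) endpoints.
    """
    t = np.asarray(t, dtype=np.float64)
    return lambda_max - (lambda_max - lambda_min) * t

# The log-SNR decreases linearly from lambda_max at t=0 to lambda_min at t=1.
ts = np.linspace(0.0, 1.0, 5)
print(linear_log_snr(ts))  # [10.    6.25  2.5  -1.25 -5.  ]
```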
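As referenced in the Open Datasets row, the following is a minimal sketch of fetching MNIST and CIFAR-10 with the tensorflow_datasets package. The split names are the standard tfds ones; the paper does not show its exact loading code.

```python
import tensorflow_datasets as tfds

# Standard tfds splits match the sizes quoted above:
# MNIST: 60,000 train / 10,000 test; CIFAR-10: 50,000 train / 10,000 test.
mnist_train = tfds.load("mnist", split="train")
mnist_test = tfds.load("mnist", split="test")
cifar_train = tfds.load("cifar10", split="train")
cifar_test = tfds.load("cifar10", split="test")
```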