InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models

Authors: Yingheng Wang, Yair Schiff, Aaron Gokaslan, Weishen Pan, Fei Wang, Christopher De Sa, Volodymyr Kuleshov

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate InfoDiffusion on a suite of benchmark datasets and find that it learns latent representations that are competitive with state-of-the-art generative and contrastive methods, while retaining the high sample quality of diffusion models.
Researcher Affiliation | Academia | (1) Department of Computer Science, Cornell University, Ithaca, NY, USA; (2) Department of Computer Science, Cornell Tech, New York City, NY, USA; (3) Department of Population Health Sciences, Weill Cornell Medicine, New York City, NY, USA.
Pseudocode | No | The paper includes figures illustrating network architectures but does not contain any formal pseudocode or algorithm blocks labeled as such.
Open Source Code | No | The paper does not provide an unambiguous statement about releasing the source code for their method or a direct link to a code repository.
Open Datasets | Yes | We measure performance on the following datasets: Fashion MNIST (Xiao et al., 2017), CIFAR10 (Krizhevsky et al., 2009), FFHQ (Karras et al., 2019), CelebA (Liu et al., 2015), and 3DShapes (Burgess & Kim, 2018).
Dataset Splits | Yes | We split the data into 80% training and 20% test, fit the classifier on the training data, and evaluate on the test set. We repeat this 5-fold and report mean metrics ± one standard deviation. (See the probe-classifier sketch after this table.)
Hardware Specification | Yes | Table 7. Hyperparameters for InfoDiffusion and baseline training. ... GPU: TITANXP, RTX2080TI, TITANRTX, RTX4090
Software Dependencies | No | The paper lists 'pytorch (Paszke et al., 2019)' and 'scikit-learn (Pedregosa et al., 2011)' with citations, but does not provide specific version numbers (e.g., PyTorch 1.9) for reproducibility.
Experiment Setup | Yes | In Table 7, we detail the hyperparameters used in training our InfoDiffusion and baseline models, across datasets. We also note that for all of these experiments we use the ADAM optimizer with learning rate 1e-4 and train for 50 epochs. (See the training-loop sketch after this table.)
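
The Dataset Splits row above describes a concrete evaluation protocol: fit a classifier on 80% of the latent representations, test on the held-out 20%, and repeat over 5 folds, reporting mean ± standard deviation. The sketch below is a minimal illustration of that protocol using scikit-learn (which the paper cites); the logistic-regression probe and the assumption that latents and labels are already extracted as NumPy arrays are ours, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold


def evaluate_latents(latents: np.ndarray, labels: np.ndarray, n_folds: int = 5):
    """Probe the latent codes with a classifier over `n_folds` folds.

    Each fold holds out 20% of the data for testing (an 80/20 split), matching
    the protocol quoted above. Returns (mean accuracy, standard deviation).
    """
    scores = []
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(latents, labels):
        # LogisticRegression is an illustrative probe; the quoted passage does
        # not specify which classifier the authors used.
        clf = LogisticRegression(max_iter=1000)
        clf.fit(latents[train_idx], labels[train_idx])
        preds = clf.predict(latents[test_idx])
        scores.append(accuracy_score(labels[test_idx], preds))
    return float(np.mean(scores)), float(np.std(scores))
```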
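
The Experiment Setup row reports only the optimizer (Adam), the learning rate (1e-4), and the epoch count (50); the remaining hyperparameters live in Table 7 of the paper and are not reproduced here. The following is a minimal, self-contained PyTorch training-loop sketch wired to those three reported settings; the `TinyDenoiser` module, the synthetic data, the batch size, and the simple denoising loss are stand-in assumptions, not the InfoDiffusion objective.

```python
import torch
import torch.nn as nn

# Stand-in network; the actual InfoDiffusion architecture follows Table 7 of the paper.
class TinyDenoiser(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x):
        return self.net(x)

model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # reported learning rate
num_epochs = 50                                            # reported epoch count

# Synthetic data standing in for the real dataset pipeline.
data = torch.randn(256, 32)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(data), batch_size=64, shuffle=True
)

for epoch in range(num_epochs):
    for (x,) in loader:
        noise = torch.randn_like(x)
        # Simple noise-prediction loss, illustrative only; it is not the
        # paper's mutual-information-regularized diffusion objective.
        loss = ((model(x + noise) - noise) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```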