Diffusion Based Representation Learning

Authors: Sarthak Mittal, Korbinian Abstreiter, Stefan Bauer, Bernhard Schölkopf, Arash Mehrjou

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed approach on downstream tasks using the learned representations directly as well as using it as a pre-training step for semi-supervised image classification, thereby improving state-of-the-art approaches for the latter. Figure 4 shows that DRL and VDRL outperform autoencoder-styled baselines as well as the restricted contrastive learning baseline.
Researcher Affiliation | Academia | 1Mila, 2Université de Montréal, 3ETH Zürich, 4Helmholtz AI, 5Technical University of Munich, 6Max Planck Institute for Intelligent Systems. Correspondence to: Sarthak Mittal <sarthmit@gmail.com>, Arash Mehrjou <arash@distantvantagepoint.com>.
Pseudocode | No | The paper describes the methods and objectives verbally and through equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | Results of proposed DRL models trained on MNIST and CIFAR-10... We directly evaluate the representations learned by different algorithms on downstream classification tasks for CIFAR-10, CIFAR-100, and Mini-ImageNet datasets. Krizhevsky, A., Nair, V., and Hinton, G. CIFAR-10 (Canadian Institute for Advanced Research). URL http://www.cs.toronto.edu/~kriz/cifar.html.
Dataset Splits | Yes | We measure the accuracy of an SVM provided by sklearn (Pedregosa et al., 2011) with default hyperparameters trained on the representations of 100 (resp. 1000) training samples and their class labels. We perform additional experiments where the encoder system is as before and kept frozen, but the MLP can only access a fraction of the training set for the downstream supervised classification task. We ablate over three different numbers of labels provided to the MLP: 1000, 5000, and 10000. (A minimal sketch of this probe protocol appears after the table.)
Hardware Specification | Yes | Note that all experiments were conducted on a single RTX8000 GPU, taking up to 30 hours of wall-clock time, which only amounts to 15% of the iterations proposed in (Song et al., 2021b).
Software Dependencies | No | The paper mentions 'an SVM provided by sklearn (Pedregosa et al., 2011)' but does not specify version numbers for scikit-learn or any other software dependencies.
Experiment Setup | Yes | For all experiments, we use the same function σ(t), t ∈ [0, 1] as in Song et al. (2021b), which is σ(t) = σ_min (σ_max/σ_min)^t, where σ_min = 0.01 and σ_max = 50. Further, we use a 2d latent space for all qualitative experiments (Section 3.3) and a 128-dimensional latent space for the downstream tasks (Section 3.1) and semi-supervised image classification (Section 3.2). We also set λ(t) = σ^2(t)... For both datasets, we use a regularization weight of 10^-5 when applying L1-regularization, and a weight of 10^-7 when using a probabilistic encoder regularized with KL-Divergence. In each experiment, the model is trained for 80k iterations.
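For concreteness, the noise schedule and loss weighting quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code; the perturbation step and variable names are assumptions based on the VE-SDE setup of Song et al. (2021b) that the row references.

```python
import numpy as np

# Schedule quoted above: sigma(t) = sigma_min * (sigma_max / sigma_min)^t, t in [0, 1],
# with sigma_min = 0.01 and sigma_max = 50.
SIGMA_MIN, SIGMA_MAX = 0.01, 50.0

def sigma(t):
    """Geometric noise schedule from the quoted experiment setup."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def weight(t):
    """Loss weighting lambda(t) = sigma^2(t), as stated in the setup."""
    return sigma(t) ** 2

# Illustration (assumed VE-style perturbation): sample t uniformly and add
# Gaussian noise of standard deviation sigma(t) to clean data x0.
rng = np.random.default_rng(0)
t = rng.uniform(size=(4, 1))
x0 = rng.standard_normal((4, 128))   # 128-d placeholders, matching the quoted latent size
xt = x0 + sigma(t) * rng.standard_normal(x0.shape)
print(sigma(t).ravel(), weight(t).ravel())
```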
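The SVM probe described in the Dataset Splits row can likewise be approximated with scikit-learn. This is a minimal sketch, assuming `sklearn.svm.SVC` with default hyperparameters stands in for "an SVM provided by sklearn"; `Z_train`, `y_train`, `Z_test`, and `y_test` are hypothetical arrays of frozen encoder features and labels.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def svm_probe_accuracy(train_feats, train_labels, test_feats, test_labels,
                       n_labeled=100, seed=0):
    """Fit a default-hyperparameter sklearn SVM on a small labeled subset of
    frozen encoder features and return test accuracy (100 vs. 1000 labels above)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(train_feats), size=n_labeled, replace=False)
    clf = SVC()  # default hyperparameters, as described in the row above
    clf.fit(train_feats[idx], train_labels[idx])
    return accuracy_score(test_labels, clf.predict(test_feats))

# Hypothetical usage with precomputed 128-d representations:
# acc_100 = svm_probe_accuracy(Z_train, y_train, Z_test, y_test, n_labeled=100)
# acc_1000 = svm_probe_accuracy(Z_train, y_train, Z_test, y_test, n_labeled=1000)
```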