OCD: Learning to Overfit with Conditional Diffusion Models

Authors: Shahar Lutati, Lior Wolf

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate the wide applicability of the method for image classification, 3D reconstruction, tabular data, speech separation, and natural language processing. In all cases, the results obtained by our method improve upon the baseline model to which our method is applied.
Researcher Affiliation | Academia | Blavatnik School of Computer Science, Tel Aviv University. Correspondence to: Shahar Lutati <shahar761@gmail.com>, Lior Wolf <wolf@cs.tau.ac.il>.
Pseudocode | Yes | Algorithm 1 Training Algorithm. Input: S training set, θ base network parameters, L the loss of the primary task, T diffusion steps. Output: ϵΩ diffusion network (incl. Ei, Ea, Eo). (See the training-loop sketch after the table.)
Open Source Code | Yes | Our code is attached as supplementary material.
Open Datasets | Yes | Results on the MNIST dataset (LeCun & Cortes, 2010) are obtained with the LeNet5 architecture (LeCun et al., 1998). CIFAR10 images (Krizhevsky et al., 2009) are classified using GoogLeNet (Szegedy et al., 2015). Experiments were also conducted on the Tiny ImageNet dataset (Le & Yang, 2015). We run on two of the benchmarks listed: California Housing (Kelley Pace & Barry, 1997) (CA) and Microsoft LETOR 4.0 (MI) (Qin & Liu, 2013). The same backbone and Hungarian-method loss are used in our experiments, which run on the Libri5Mix dataset without augmentations, measuring the SI-SDRi score.
Dataset Splits | No | The paper does not explicitly state specific training/test/validation dataset splits with proportions or sample counts. It refers to training and test sets but not a distinct validation split.
Hardware Specification | Yes | Tab. 7 lists the measured runtime with and without OCD for both training and inference, on the low-end Nvidia RTX2060 GPU on which all experiments were run.
Software Dependencies | No | The paper mentions using the Adam optimizer and a U-Net architecture, but does not provide specific software names with version numbers for reproducibility (e.g., PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | In all experiments, the UNet ϵΩ has 128 channels and five downsampling layers. The positional encoding PE has dimensions of 128x128. The Adam optimizer (Kingma & Ba, 2014) is used, with a learning rate of 10^-4. A linear noise schedule is used based on Song et al. (2020), and the number of diffusion steps is 10. All experiments are repeated three times to report the standard deviation (SD) of the success metrics. (See the configuration sketch after the table.)
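
To make the Algorithm 1 row above concrete, here is a minimal PyTorch-style sketch of one way its inputs (training set S, base parameters θ, task loss L, T diffusion steps) and output (the conditional diffusion network ϵΩ) could fit together: a copy of the base network is briefly overfitted on each sample, and ϵΩ is trained to denoise the resulting per-sample weight residual conditioned on an encoding of the sample. Helper names such as fine_tune_on_sample and weight_residual, and the eps_net/encoder call signatures, are hypothetical placeholders, not the authors' implementation.

```python
import copy
import torch
import torch.nn.functional as F

def fine_tune_on_sample(base_model, x, y, loss_fn, steps=10, lr=1e-3):
    """Overfit a copy of the base model on a single sample (x, y)."""
    model = copy.deepcopy(base_model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

def weight_residual(base_model, overfit_model, layer_name):
    """Difference between the overfitted and base weights of one layer."""
    w0 = dict(base_model.named_parameters())[layer_name].detach()
    w1 = dict(overfit_model.named_parameters())[layer_name].detach()
    return (w1 - w0).flatten()

def train_ocd_diffusion(eps_net, encoder, base_model, dataset, loss_fn,
                        layer_name, T=10, epochs=1, lr=1e-4):
    """Hypothetical loop mirroring Algorithm 1's interface: train eps_net
    to predict the noise added to per-sample weight residuals,
    conditioned on an encoding of the sample."""
    betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    opt = torch.optim.Adam(list(eps_net.parameters()) +
                           list(encoder.parameters()), lr=lr)
    for _ in range(epochs):
        for x, y in dataset:
            target = weight_residual(
                base_model,
                fine_tune_on_sample(base_model, x, y, loss_fn),
                layer_name)
            cond = encoder(x)                       # conditioning signal
            t = torch.randint(0, T, (1,))           # random diffusion step
            noise = torch.randn_like(target)
            noisy = (alphas_bar[t].sqrt() * target
                     + (1 - alphas_bar[t]).sqrt() * noise)
            opt.zero_grad()
            F.mse_loss(eps_net(noisy, cond, t), noise).backward()
            opt.step()
    return eps_net
```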
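
The experiment-setup row can likewise be collected into a single configuration object. The sketch below only records the hyperparameters quoted above (128-channel UNet with five downsampling layers, 128x128 positional encoding, Adam with learning rate 10^-4, linear noise schedule, 10 diffusion steps, three repetitions); the field names are illustrative and not taken from the released code.

```python
from dataclasses import dataclass

@dataclass
class OCDExperimentConfig:
    # Values quoted from the experiment-setup excerpt above; field names
    # are illustrative, not the authors' configuration schema.
    unet_channels: int = 128                 # width of the ϵΩ UNet
    unet_downsampling_layers: int = 5
    positional_encoding_shape: tuple = (128, 128)
    optimizer: str = "adam"                  # Kingma & Ba (2014)
    learning_rate: float = 1e-4
    noise_schedule: str = "linear"           # following Song et al. (2020)
    diffusion_steps: int = 10
    num_repetitions: int = 3                 # runs used to report the SD

config = OCDExperimentConfig()
print(config)
```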