Classification Diffusion Models: Revitalizing Density Ratio Estimation

Authors: Shahar Yadin, Noam Elata, Tomer Michaeli

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our method is the first DRE-based technique that can successfully generate images beyond the MNIST dataset. Furthermore, it can output the likelihood of any input in a single forward pass, achieving state-of-the-art negative log likelihood (NLL) among methods with this property... Our experiments shed light on the reasons why DRE methods have failed on complex high-dimensional data to date, and why CDM inherently avoids these challenges."
Researcher Affiliation | Academia | "Shahar Yadin, Noam Elata, Tomer Michaeli. Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology. {shahar.yadin@campus,noamelata@campus,tomer.m@ee}.technion.ac.il"
Pseudocode | Yes | "Algorithm 1 CDM Training", "Algorithm 2 DDPM Sampling Using CDM" (a hedged sampling sketch follows the table)
Open Source Code | Yes | "Code is available on the project's webpage."
Open Datasets | Yes | "We train several CDMs on two common datasets. For CIFAR-10 [26] we train both a class-conditional model and an unconditional model. We also train a similar model for CelebA [31], using face images of size 64×64." (a hedged data-loading sketch follows the table)
Dataset Splits | No | The paper mentions using common datasets like CIFAR-10 and CelebA and evaluating on the test set, but it does not provide specific details on the train/validation/test splits (e.g., percentages, sample counts, or explicit splitting methodology).
Hardware Specification | Yes | "Training the model on CelebA 64×64 takes 108 hours on a server of 4 NVIDIA RTX A6000 48GB GPUs... Training the model on CIFAR-10 takes 35 hours on a server of 4 NVIDIA RTX A6000 48GB GPUs."
Software Dependencies | No | The paper mentions using a 'PyTorch diffusion repository' but does not specify the version of PyTorch or any other software dependencies with their version numbers.
Experiment Setup | Yes | "We trained the model for 500k iterations with a learning rate of 1×10^-4. We started with a linear warmup of 5k iterations and reduced the learning rate by a factor of 10 after every 200k iterations. The typical value of the CE loss after convergence was 3.8 while the MSE loss was 0.0134, so we chose to give the CE loss a weight of 0.001 to ensure the values of both losses have the same order of magnitude. In addition, we used EMA with a factor of 0.9999, as done in the baseline model." (a hedged training-configuration sketch follows the table)
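
The Pseudocode row above cites "Algorithm 2 DDPM Sampling Using CDM". Below is a minimal sketch of standard DDPM ancestral sampling in PyTorch, assuming a hypothetical `cdm_denoiser(x_t, t)` callable that returns the noise prediction which, per the paper, is derived from the CDM classifier's outputs (that derivation is not reproduced here); the linear beta schedule and variable names are assumptions, not details taken from the paper.

```python
import torch

@torch.no_grad()
def ddpm_sample(cdm_denoiser, shape, T=1000, device="cuda"):
    """Standard DDPM ancestral sampling.

    `cdm_denoiser(x_t, t)` is assumed to return the predicted noise eps
    (in the paper this prediction comes from the CDM classifier; here it
    is treated as a black box).
    """
    # Linear beta schedule (an assumption; the paper follows its DDPM baseline).
    betas = torch.linspace(1e-4, 0.02, T, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = cdm_denoiser(x, torch.full((shape[0],), t, device=device))
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)  # add noise except at the last step
        else:
            x = mean
    return x
```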
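
The Open Datasets row quotes training on CIFAR-10 and on CelebA at 64×64. A minimal torchvision loading sketch follows; the CelebA center-crop-then-resize preprocessing and the horizontal-flip augmentation are assumed common choices, not details reported in the paper.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# CIFAR-10: used at its native 32x32 resolution.
cifar_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # assumed augmentation, common for diffusion baselines
    transforms.ToTensor(),
])
cifar_train = datasets.CIFAR10("data/", train=True, transform=cifar_tf, download=True)

# CelebA: cropped and resized to 64x64 (the crop size of 140 is an assumed, common choice).
celeba_tf = transforms.Compose([
    transforms.CenterCrop(140),
    transforms.Resize(64),
    transforms.ToTensor(),
])
celeba_train = datasets.CelebA("data/", split="train", transform=celeba_tf, download=True)

train_loader = DataLoader(cifar_train, batch_size=128, shuffle=True, num_workers=4)
```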
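
The Experiment Setup row reports 500k iterations, a learning rate of 1×10^-4 with a 5k-step linear warmup and a tenfold reduction every 200k steps, a CE-loss weight of 0.001 alongside the MSE loss, and EMA with factor 0.9999. The sketch below mirrors only those reported values; the Adam optimizer, the `model(x)` interface, and the exact EMA update form are assumptions.

```python
import copy
import torch

def lr_lambda(step, warmup=5_000, decay_every=200_000):
    """Linear warmup for 5k steps, then divide the LR by 10 every 200k steps."""
    if step < warmup:
        return step / warmup
    return 0.1 ** (step // decay_every)

def train(model, loader, total_steps=500_000, ce_weight=1e-3, ema_decay=0.9999, device="cuda"):
    model.to(device)
    ema_model = copy.deepcopy(model)                          # EMA copy of the weights
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)       # optimizer choice is an assumption
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)

    step, data = 0, iter(loader)
    while step < total_steps:
        try:
            x, _ = next(data)
        except StopIteration:
            data = iter(loader)
            x, _ = next(data)
        x = x.to(device)

        # Hypothetical model interface: returns the noise prediction (MSE term)
        # and the timestep-classification logits (CE term).
        eps_pred, logits, eps_true, t = model(x)
        mse = torch.nn.functional.mse_loss(eps_pred, eps_true)
        ce = torch.nn.functional.cross_entropy(logits, t)
        loss = mse + ce_weight * ce                           # CE weighted by 0.001, as reported

        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()

        # EMA update with factor 0.9999.
        with torch.no_grad():
            for p_ema, p in zip(ema_model.parameters(), model.parameters()):
                p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)

        step += 1
    return ema_model
```

With the reported values, the weighted CE term (0.001 × 3.8 ≈ 0.004) is on the same order as the reported MSE term (0.0134), consistent with the stated rationale for the 0.001 weight.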