Classification Diffusion Models: Revitalizing Density Ratio Estimation
Authors: Shahar Yadin, Noam Elata, Tomer Michaeli
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is the first DRE-based technique that can successfully generate images beyond the MNIST dataset. Furthermore, it can output the likelihood of any input in a single forward pass, achieving state-of-the-art negative log likelihood (NLL) among methods with this property... Our experiments shed light on the reasons why DRE methods have failed on complex high-dimensional data to date, and why CDM inherently avoids these challenges. |
| Researcher Affiliation | Academia | Shahar Yadin, Noam Elata, Tomer Michaeli, Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology, {shahar.yadin@campus,noamelata@campus,tomer.m@ee}.technion.ac.il |
| Pseudocode | Yes | "Algorithm 1 CDM Training", "Algorithm 2 DDPM Sampling Using CDM" |
| Open Source Code | Yes | Code is available on the project's webpage. |
| Open Datasets | Yes | We train several CDMs on two common datasets. For CIFAR-10 [26] we train both a class conditional model and an unconditional model. We also train a similar model for CelebA [31], using face images of size 64×64. |
| Dataset Splits | No | The paper mentions using common datasets like CIFAR-10 and CelebA and evaluating on the test set, but it does not provide specific details on the train/validation/test splits (e.g., percentages, sample counts, or explicit splitting methodology) within the paper. |
| Hardware Specification | Yes | Training the model on CelebA 64×64 takes 108 hours on a server of 4 NVIDIA RTX A6000 48GB GPUs... Training the model on CIFAR-10 takes 35 hours on a server of 4 NVIDIA RTX A6000 48GB GPUs. |
| Software Dependencies | No | The paper mentions using a 'pytorch diffusion repository' but does not specify the version of PyTorch or any other software dependencies with their version numbers. |
| Experiment Setup | Yes | We trained the model for 500k iterations with a learning rate of 1×10⁻⁴. We started with a linear warmup of 5k iterations and reduced the learning rate by a factor of 10 after every 200k iterations. The typical value of the CE loss after convergence was 3.8 while the MSE loss was 0.0134, so we chose to give the CE loss a weight of 0.001 to ensure the values of both losses have the same order of magnitude. In addition, we used EMA with a factor of 0.9999, as done in the baseline model. |
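
The quoted setup translates naturally into a standard PyTorch training loop. The sketch below is a minimal illustration under that assumption: the dummy network, random batches, and placeholder loss terms are not from the paper (the actual CDM architecture and losses are in the authors' released code); only the hyperparameters (1×10⁻⁴ learning rate, 5k-step linear warmup, a 10x LR drop every 200k iterations, a 0.001 CE weight, and 0.9999 EMA) follow the reported values.

```python
# Minimal, illustrative PyTorch sketch of the reported optimization setup.
# The tiny dummy network and random batches are placeholders, not the CDM model.
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR
from torch.optim.swa_utils import AveragedModel

model = nn.Linear(8, 8)          # stand-in for the CDM network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

WARMUP_ITERS = 5_000             # linear warmup
DECAY_EVERY = 200_000            # LR reduced by a factor of 10 after every 200k iterations
TOTAL_ITERS = 500_000
CE_WEIGHT = 1e-3                 # scales the ~3.8 CE loss toward the ~0.0134 MSE loss

def lr_lambda(step: int) -> float:
    # Linear warmup for the first 5k steps, then step decay by 10x every 200k steps.
    warmup = min(1.0, (step + 1) / WARMUP_ITERS)
    decay = 0.1 ** (step // DECAY_EVERY)
    return warmup * decay

scheduler = LambdaLR(optimizer, lr_lambda)

# EMA of the weights with factor 0.9999, as in the baseline model.
ema_model = AveragedModel(
    model, avg_fn=lambda avg, new, n: 0.9999 * avg + (1.0 - 0.9999) * new
)

for step in range(TOTAL_ITERS):
    x = torch.randn(16, 8)                       # placeholder batch
    pred = model(x)
    mse_loss = nn.functional.mse_loss(pred, x)   # stands in for the diffusion (denoising) MSE term
    ce_loss = torch.zeros(())                    # stands in for the classification CE term
    loss = mse_loss + CE_WEIGHT * ce_loss        # CE term down-weighted to match the MSE scale

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    ema_model.update_parameters(model)
```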