Diffusion Models With Learned Adaptive Noise
Authors: Subham Sahoo, Aaron Gokaslan, Christopher M. De Sa, Volodymyr Kuleshov
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, MULAN sets a new state-of-the-art in density estimation on CIFAR-10 and ImageNet and reduces the number of training steps by 50%. |
| Researcher Affiliation | Academia | Subham Sekhar Sahoo, Cornell Tech, NYC, USA (ssahoo@cs.cornell.edu); Aaron Gokaslan, Cornell Tech, NYC, USA (akg87@cs.cornell.edu); Chris De Sa, Cornell University, Ithaca, USA (cdesa@cs.cornell.edu); Volodymyr Kuleshov, Cornell Tech, NYC, USA (kuleshov@cornell.edu) |
| Pseudocode | No | The paper describes algorithms through mathematical formulas and text but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code¹, along with a blog post and video tutorial on the project page: https://s-sahoo.com/MuLAN (¹https://github.com/s-sahoo/MuLAN) |
| Open Datasets | Yes | This section reports experiments on the CIFAR-10 [25] and ImageNet-32 [58] datasets. |
| Dataset Splits | No | The paper mentions '50,000 training images and 10,000 test images' for CIFAR-10 and '1,281,167 training samples and 50,000 test samples' for ImageNet-32, but does not explicitly specify a validation split or how it was derived. |
| Hardware Specification | Yes | For the ImageNet experiments, we used a single GPU node with 8 A100s. For the CIFAR-10 experiments, the models were trained on 4 GPUs spanning several GPU types, including V100s, A5000s, A6000s, and 3090s, with float32 precision. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, `scipy.integrate.solve_ivp` for the RK45 ODE solver, and `JAX` for Jacobian-vector-product computation, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For all our experiments, we use the Adam [21] optimizer with learning rate 2 × 10⁻⁴, exponential decay rates of β₁ = 0.9, β₂ = 0.99, and decoupled weight decay [29] coefficient of 0.01. We also maintain an exponential moving average (EMA) of model parameters with an EMA rate of 0.9999 for evaluation. For other hyperparameters, we use fixed start and end times which satisfy γmin = −13.3, γmax = 5.0... |
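The Software Dependencies row notes that likelihood evaluation used `scipy.integrate.solve_ivp` with the RK45 solver (no version given). As a minimal illustration of that call pattern only, the sketch below solves a toy ODE dy/dt = −y rather than the paper's probability-flow ODE:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy ODE dy/dt = -y with y(0) = 1, integrated with the RK45 method
# the paper cites. This illustrates the solve_ivp interface only; it is
# not the authors' likelihood ODE.
def f(t, y):
    return -y

sol = solve_ivp(f, t_span=(0.0, 1.0), y0=[1.0], method="RK45",
                rtol=1e-8, atol=1e-8)
# sol.y[0, -1] approximates the analytic solution exp(-1).
```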
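The Experiment Setup row fully specifies the optimizer: AdamW-style updates (Adam with decoupled weight decay) plus a parameter EMA for evaluation. A NumPy sketch of one such update with the reported hyperparameters, purely illustrative and not the authors' implementation:

```python
import numpy as np

# Hyperparameters as reported in the paper.
LR, BETA1, BETA2, WD, EPS, EMA_RATE = 2e-4, 0.9, 0.99, 0.01, 1e-8, 0.9999

def adamw_step(params, grads, m, v, t):
    """One Adam step with decoupled weight decay (Loshchilov & Hutter)."""
    m = BETA1 * m + (1 - BETA1) * grads        # first-moment estimate
    v = BETA2 * v + (1 - BETA2) * grads ** 2   # second-moment estimate
    m_hat = m / (1 - BETA1 ** t)               # bias correction
    v_hat = v / (1 - BETA2 ** t)
    # Weight decay is applied directly to the parameters, not the gradient.
    params = params - LR * (m_hat / (np.sqrt(v_hat) + EPS) + WD * params)
    return params, m, v

def ema_update(ema_params, params):
    """EMA of parameters (rate 0.9999), used for evaluation."""
    return EMA_RATE * ema_params + (1 - EMA_RATE) * params

# Tiny usage example with dummy parameters and gradients.
params = np.ones(4)
m, v = np.zeros_like(params), np.zeros_like(params)
ema = params.copy()
grads = np.full(4, 0.5)
params, m, v = adamw_step(params, grads, m, v, t=1)
ema = ema_update(ema, params)
```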