Diffusion Models With Learned Adaptive Noise

Authors: Subham Sahoo, Aaron Gokaslan, Christopher M. De Sa, Volodymyr Kuleshov

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Empirically, MULAN sets a new state-of-the-art in density estimation on CIFAR-10 and ImageNet and reduces the number of training steps by 50%." |
| Researcher Affiliation | Academia | Subham Sekhar Sahoo, Cornell Tech, NYC, USA (ssahoo@cs.cornell.edu); Aaron Gokaslan, Cornell Tech, NYC, USA (akg87@cs.cornell.edu); Chris De Sa, Cornell University, Ithaca, USA (cdesa@cs.cornell.edu); Volodymyr Kuleshov, Cornell Tech, NYC, USA (kuleshov@cornell.edu) |
| Pseudocode | No | The paper describes algorithms through mathematical formulas and text but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | "We provide the code, along with a blog post and video tutorial on the project page: https://s-sahoo.com/MuLAN" (code at https://github.com/s-sahoo/MuLAN) |
| Open Datasets | Yes | "This section reports experiments on the CIFAR-10 [25] and ImageNet-32 [58] datasets." |
| Dataset Splits | No | The paper mentions "50,000 training images and 10,000 test images" for CIFAR-10 and "1,281,167 training samples and 50,000 test samples" for ImageNet-32, but does not explicitly specify a validation split or how it was derived. |
| Hardware Specification | Yes | "For the ImageNet experiments, we used a single GPU node with 8 A100s. For the CIFAR-10 experiments, the models were trained on 4 GPUs spanning several GPU types, such as V100s, A5000s, A6000s, and 3090s, with float32 precision." |
| Software Dependencies | No | The paper mentions using the Adam optimizer, `scipy.integrate.solve_ivp` for the RK45 ODE solver, and JAX for Jacobian-vector-product computation, but does not provide specific version numbers for these software dependencies (see the likelihood sketch after the table). |
| Experiment Setup | Yes | "For all our experiments, we use the Adam [21] optimizer with learning rate 2 × 10⁻⁴, exponential decay rates β1 = 0.9, β2 = 0.99, and a decoupled weight decay [29] coefficient of 0.01. We also maintain an exponential moving average (EMA) of model parameters with an EMA rate of 0.9999 for evaluation. For other hyperparameters, we use fixed start and end times which satisfy γmin = −13.3, γmax = 5.0..." (see the configuration sketch after the table) |