Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Information Theoretic Learning for Diffusion Models with Warm Start

Authors: Yirong Shen, Lu Gan, Cong Ling

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, our models achieve competitive NLL on CIFAR-10 and state-of-the-art results on ImageNet across multiple resolutions, all without data augmentation, and the framework extends naturally to discrete data.
Researcher Affiliation | Academia | Yirong Shen (Imperial College London), Lu Gan (Brunel University of London), Cong Ling (Imperial College London)
Pseudocode | Yes | Algorithm 1: Training ... Algorithm 2: Likelihood Evaluation
Open Source Code | No | Not only we do show all equations and train on standard datasets, we will open source the code.
Open Datasets | Yes | We evaluate on CIFAR-10, anti-aliased ImageNet-32 dataset, ImageNet-64 and -128... Among the two known versions of ImageNet32, we adopt the newer, anti-aliased version [49], which facilitates likelihood training and remains publicly available.
Dataset Splits | Yes | CIFAR-10 contains 50,000 training and 10,000 test images. The ImageNet variant includes 1,281,149 training and 49,999 test images.
Hardware Specification | Yes | For the ImageNet-64 and -128 experiments, we used a single GPU node with 8 A800s or 8 H20-NVLink. For the CIFAR-10 and ImageNet-32 experiments, the models were trained and evaluated on 4 GPUs spanning several GPUs types like V100, L20s, A40s, and 3090s with float32 precision.
Software Dependencies | No | We follow the same default training settings as [40]. For all our experiments, we use the Adam optimizer with learning rate 2 × 10⁻⁴, exponential decay rates of β1 = 0.9, β2 = 0.99 and decoupled weight decay coefficient of 0.01. We also maintain an exponential moving average (EMA) of model parameters with an EMA rate of 0.9999 for evaluation... float32 precision. The text mentions an optimizer and precision, but no specific software versions for libraries like PyTorch or Python itself.
Experiment Setup | Yes | For all our experiments, we use the Adam optimizer with learning rate 2 × 10⁻⁴, exponential decay rates of β1 = 0.9, β2 = 0.99 and decoupled weight decay coefficient of 0.01. We also maintain an exponential moving average (EMA) of model parameters with an EMA rate of 0.9999 for evaluation... We pretrain the model for 0.3 million iterations using a batch size of 128... Then we finetune the model for 1K iterations using a batch size of 256 and accumulate the gradient for every 4 batches.
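The Experiment Setup row quotes enough hyperparameters to sketch one optimization step. The following is a minimal pure-Python sketch, not the authors' code. It reads "Adam with decoupled weight decay" as AdamW in the PyTorch style (weights shrunk by lr × 0.01 before the Adam update), assumes an ε of 1e-8 that the report does not state, uses plain float lists in place of model tensors, and introduces hypothetical helper names (`adamw_step`, `ema_update`, `accumulate`). The report also does not say whether the fine-tuning batch size of 256 is the per-step micro-batch or the size after accumulating 4 batches; the sketch simply averages gradients over 4 micro-batches before a single update.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Values quoted in the Experiment Setup row.
    lr: float = 2e-4
    beta1: float = 0.9
    beta2: float = 0.99
    weight_decay: float = 0.01
    ema_rate: float = 0.9999
    eps: float = 1e-8  # assumption: not stated in the report


@dataclass
class AdamWState:
    m: list  # first-moment estimates, one per parameter
    v: list  # second-moment estimates, one per parameter
    step: int = 0


def adamw_step(params, grads, state, cfg):
    """One Adam update with decoupled weight decay, applied in place to params."""
    state.step += 1
    bc1 = 1 - cfg.beta1 ** state.step  # bias correction for m
    bc2 = 1 - cfg.beta2 ** state.step  # bias correction for v
    for i, g in enumerate(grads):
        # Decoupled weight decay: shrink the weight directly, not via the gradient.
        p = params[i] * (1 - cfg.lr * cfg.weight_decay)
        state.m[i] = cfg.beta1 * state.m[i] + (1 - cfg.beta1) * g
        state.v[i] = cfg.beta2 * state.v[i] + (1 - cfg.beta2) * g * g
        m_hat = state.m[i] / bc1
        v_hat = state.v[i] / bc2
        params[i] = p - cfg.lr * m_hat / (math.sqrt(v_hat) + cfg.eps)


def ema_update(ema_params, params, rate):
    """EMA of weights kept for evaluation: ema <- rate * ema + (1 - rate) * param."""
    for i, p in enumerate(params):
        ema_params[i] = rate * ema_params[i] + (1 - rate) * p


def accumulate(micro_grads):
    """Average per-parameter gradients over micro-batches before one optimizer step."""
    n = len(micro_grads)
    return [sum(gs) / n for gs in zip(*micro_grads)]


# Usage with two toy parameters and hypothetical gradients from 4 micro-batches.
cfg = TrainConfig()
params = [1.0, -0.5]
ema = list(params)
state = AdamWState(m=[0.0, 0.0], v=[0.0, 0.0])
micro = [[0.1, -0.2], [0.3, 0.0], [0.2, -0.1], [0.0, -0.1]]
grads = accumulate(micro)            # one update per 4 accumulated batches
adamw_step(params, grads, state, cfg)
ema_update(ema, params, cfg.ema_rate)
```

Evaluation would read from `ema` while optimization continues on `params`; with a rate of 0.9999 the EMA copy moves very slowly, so after this single step `ema` has changed by only about 2 × 10⁻⁸ while `params` has moved by about 2 × 10⁻⁴.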