Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs

Authors: Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results empirically achieve the state-of-the-art likelihood on image datasets (2.56 on CIFAR-10, 3.43/3.69 on ImageNet-32) without variational dequantization or data augmentation. We conduct ablation studies to demonstrate the effectiveness of separate parts.
Researcher Affiliation | Collaboration | (1) Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University; (2) Pazhou Lab (Huangpu), Guangzhou, China.
Pseudocode | Yes | Algorithm 1: Adaptive importance sampling (single iteration). (A generic sketch of such a scheme appears after this table.)
Open Source Code | No | The paper states 'We implement our methods based on the open-source codebase of Kingma et al. (2021) implemented with JAX (Bradbury et al., 2018)', but does not explicitly provide a link to, or a statement that, the authors' own code for this paper is open-source or publicly available.
Open Datasets | Yes | We train our i-DODE on CIFAR-10 (Krizhevsky et al., 2009) and ImageNet-32 (Deng et al., 2009), which are two popular benchmarks for generative modeling and density estimation.
Dataset Splits | No | The paper mentions training and evaluating on a 'test set' and 'training set', e.g., 'We compute the loss on the test set by the SDE likelihood bound in Kingma et al. (2021)', but does not explicitly specify the dataset splits (e.g., percentages, sample counts, or a citation to a predefined split) used for training, validation, and testing.
Hardware Specification | Yes | All our training processes are conducted on 8 GPU cards of NVIDIA A40, except for ImageNet-32 (old version). For ImageNet-32 (old version), the training processes are conducted on 8 GPU cards of NVIDIA A100 (40GB).
Software Dependencies | No | The paper mentions using 'JAX (Bradbury et al., 2018)' but does not provide specific version numbers for JAX or other key software components used in their experiments.
Experiment Setup | Yes | For all our experiments, we use the Adam (Kingma & Ba, 2014) optimizer with learning rate 2e-4, exponential decay rates beta1 = 0.9 and beta2 = 0.99, and a decoupled weight decay (Loshchilov & Hutter, 2019) coefficient of 0.01. We also maintain an exponential moving average (EMA) of model parameters with an EMA rate of 0.9999 for evaluation. We use a batch size of 128 for both training stages and both datasets.
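
The training configuration quoted in the Experiment Setup row can be summarized in code. The sketch below is not the authors' implementation (which builds on the JAX codebase of Kingma et al., 2021); it is a minimal illustration of the stated hyperparameters using optax, and the names `init_state` and `train_step` are purely illustrative.

```python
# Minimal sketch of the stated training configuration (not the authors' code).
import jax
import jax.numpy as jnp
import optax

# Hyperparameters quoted in the Experiment Setup row.
LEARNING_RATE = 2e-4
BETA1, BETA2 = 0.9, 0.99
WEIGHT_DECAY = 0.01      # decoupled weight decay (AdamW-style)
EMA_RATE = 0.9999
BATCH_SIZE = 128

# Adam with decoupled weight decay corresponds to optax.adamw,
# not plain Adam plus an L2 penalty in the loss.
optimizer = optax.adamw(
    learning_rate=LEARNING_RATE,
    b1=BETA1,
    b2=BETA2,
    weight_decay=WEIGHT_DECAY,
)

def init_state(params):
    """Optimizer state plus an EMA copy of the parameters kept for evaluation."""
    return optimizer.init(params), jax.tree_util.tree_map(jnp.array, params)

def train_step(params, opt_state, ema_params, grads):
    """One update: apply AdamW, then refresh the parameter EMA."""
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    ema_params = jax.tree_util.tree_map(
        lambda e, p: EMA_RATE * e + (1.0 - EMA_RATE) * p, ema_params, params)
    return params, opt_state, ema_params
```

Evaluation would use `ema_params` rather than the raw `params`, matching the quoted statement that the EMA of model parameters is maintained for evaluation.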
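
The Pseudocode row refers to the paper's Algorithm 1, which is not reproduced in this summary. For orientation only, the sketch below shows a generic adaptive importance-sampling scheme over diffusion timesteps (in the spirit of Nichol & Dhariwal, 2021), not the paper's actual algorithm; the bin count, the smoothing rate, and the `loss_at_time` placeholder are assumptions made for illustration.

```python
# Generic single-iteration sketch of adaptive importance sampling over timesteps.
# NOT the paper's Algorithm 1; all constants and names here are illustrative.
import numpy as np

NUM_BINS = 100                         # discretize t in (0, 1] into bins
loss_sq_history = np.ones(NUM_BINS)    # running estimate of E[loss^2] per bin
SMOOTHING = 0.99                       # EMA rate for the running estimate

def sample_t(rng):
    """Sample a timestep with probability proportional to sqrt(E[loss^2])."""
    probs = np.sqrt(loss_sq_history)
    probs /= probs.sum()
    k = rng.choice(NUM_BINS, p=probs)
    t = (k + rng.uniform()) / NUM_BINS      # uniform within the chosen bin
    weight = 1.0 / (NUM_BINS * probs[k])    # keeps the loss estimator unbiased
    return t, k, weight

def update_history(k, raw_loss):
    """Refresh the per-bin loss statistic with the newly observed (unweighted) loss."""
    loss_sq_history[k] = SMOOTHING * loss_sq_history[k] + (1 - SMOOTHING) * raw_loss**2

# One iteration, with loss_at_time as a stand-in for the per-sample objective:
#   t, k, w = sample_t(np.random.default_rng(0))
#   raw = loss_at_time(t)
#   loss = w * raw            # importance-weighted training loss
#   update_history(k, raw)    # adapt the proposal for later iterations
```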