Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs
Authors: Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results empirically achieve the state-of-the-art likelihood on image datasets (2.56 on CIFAR-10, 3.43/3.69 on ImageNet-32) without variational dequantization or data augmentation. We conduct ablation studies to demonstrate the effectiveness of separate parts. |
| Researcher Affiliation | Collaboration | 1Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University 2Pazhou Lab (Huangpu), Guangzhou, China. |
| Pseudocode | Yes | Algorithm 1 Adaptive importance sampling (single iteration). (A generic illustrative sketch of adaptive importance sampling follows the table.) |
| Open Source Code | No | The paper states 'We implement our methods based on the open-source codebase of Kingma et al. (2021) implemented with JAX Bradbury et al. (2018)', but does not explicitly provide a link or statement that their own code for this paper is open-source or publicly available. |
| Open Datasets | Yes | We train our i-DODE on CIFAR-10 (Krizhevsky et al., 2009) and ImageNet-32 (Deng et al., 2009), which are two popular benchmarks for generative modeling and density estimation. |
| Dataset Splits | No | The paper refers to the 'training set' and 'test set' (e.g., 'We compute the loss on the test set by the SDE likelihood bound in Kingma et al. (2021)'), but does not provide specific details about the dataset splits (percentages, sample counts, or a citation to a predefined split) used for training, validation, and testing. |
| Hardware Specification | Yes | All our training processes are conducted on 8 GPU cards of NVIDIA A40, except for ImageNet-32 (old version). For ImageNet-32 (old version), the training processes are conducted on 8 GPU cards of NVIDIA A100 (40GB). |
| Software Dependencies | No | The paper mentions using 'JAX Bradbury et al. (2018)' but does not provide specific version numbers for JAX or other key software components used in their experiments. |
| Experiment Setup | Yes | For all our experiments, we use the Adam (Kingma & Ba, 2014) optimizer with learning rate 2e-4, exponential decay rates of beta1 = 0.9, beta2 = 0.99, and a decoupled weight decay (Loshchilov & Hutter, 2019) coefficient of 0.01. We also maintain an exponential moving average (EMA) of model parameters with an EMA rate of 0.9999 for evaluation. We use a batch size of 128 for both training stages and both datasets. (A hedged configuration sketch follows the table.) |
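
Since the paper's Algorithm 1 (adaptive importance sampling of the diffusion time, single iteration) is only named in the table above, here is a minimal, generic sketch of a histogram-based adaptive importance sampler with inverse-probability reweighting. This is not the authors' Algorithm 1; the class name `AdaptiveTimeSampler` and all parameters (`n_bins`, `decay`, `loss_ema`) are hypothetical and chosen only to illustrate the technique.

```python
# Generic sketch (not the paper's Algorithm 1): sample diffusion times t in [0, 1)
# from a loss-proportional histogram proposal and reweight by inverse probability.
import numpy as np

class AdaptiveTimeSampler:
    def __init__(self, n_bins=100, decay=0.99):
        self.n_bins = n_bins
        self.decay = decay
        self.loss_ema = np.ones(n_bins)  # running per-bin loss estimate

    def sample(self, batch_size, rng):
        # Proposal proportional to the running per-bin loss estimate.
        probs = self.loss_ema / self.loss_ema.sum()
        bins = rng.choice(self.n_bins, size=batch_size, p=probs)
        # Uniform jitter inside each bin gives a continuous t in [0, 1).
        t = (bins + rng.random(batch_size)) / self.n_bins
        # Importance weights undo the non-uniform proposal: uniform density / proposal density.
        weights = (1.0 / self.n_bins) / probs[bins]
        return t, weights, bins

    def update(self, bins, losses):
        # Exponential-moving-average update of the per-bin loss statistics.
        for b, loss in zip(bins, losses):
            self.loss_ema[b] = self.decay * self.loss_ema[b] + (1 - self.decay) * loss
```

A single iteration would sample `t` and `weights`, compute the per-sample loss at those times, multiply by `weights` to keep the objective unbiased, and then call `update` with the observed losses.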
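The reported optimizer, EMA, and batch-size settings can also be expressed as a short configuration sketch in JAX/optax (the paper builds on the open-source VDM codebase in JAX, but its actual implementation may differ). The names `update_ema`, `EMA_RATE`, and `BATCH_SIZE` are illustrative, not taken from the paper's code.

```python
# Hedged sketch of the reported training configuration, assuming optax.
import jax
import optax

# AdamW as reported: learning rate 2e-4, beta1 = 0.9, beta2 = 0.99,
# decoupled weight decay coefficient 0.01.
optimizer = optax.adamw(learning_rate=2e-4, b1=0.9, b2=0.99, weight_decay=0.01)

EMA_RATE = 0.9999   # reported EMA rate of model parameters, used for evaluation
BATCH_SIZE = 128    # reported batch size for both training stages and both datasets

def update_ema(ema_params, params, rate=EMA_RATE):
    """Exponential moving average of model parameters (kept alongside the raw parameters)."""
    return jax.tree_util.tree_map(
        lambda e, p: rate * e + (1.0 - rate) * p, ema_params, params)
```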