Efficient Denoising Diffusion via Probabilistic Masking
Authors: Weizhong Zhang, Zhiwei Zhang, Renjie Pi, Zhongming Jin, Yuan Gao, Jieping Ye, Kani Chen
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate the superiority of our proposed EDDPM over the state-of-the-art sampling acceleration methods across various domains. EDDPM can generate high-quality samples with only 20% of the steps for time series imputation and achieve 4.89 FID with 5 steps for CIFAR-10. |
| Researcher Affiliation | Collaboration | ¹Fudan University, ²Hong Kong University of Science and Technology, ³Alibaba Group, ⁴Wuhan University. |
| Pseudocode | Yes | Algorithm 1 Efficient Denoising Diffusion via Probabilistic Masking (EDDPM) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or a link to a code repository for the methodology. |
| Open Datasets | Yes | Datasets. We use the CIFAR-10 dataset (Krizhevsky et al., 2009) (50k images of resolution 32×32) for image synthesis, and Healthcare (Silva et al., 2012) and Air-quality (Tashiro et al., 2021) for the time series imputation experiments. |
| Dataset Splits | No | The paper mentions generating test data for imputation (e.g., 'randomly choose 10/50/90% of observed values as ground-truth on the test data for imputation') but does not specify the overall training, validation, and test dataset splits needed for reproducibility (see the masking sketch after the table). |
| Hardware Specification | Yes | All the experiments are implemented with PyTorch 1.7.0 on a virtual workstation with 8 Nvidia GeForce RTX 2080Ti GPUs (11 GB memory each). |
| Software Dependencies | No | The paper only specifies 'PyTorch 1.7.0' as a software dependency with a version number. It does not list the other key components or a self-contained environment specification with versions. |
| Experiment Setup | Yes | For time series imputation: we set the batch size as 16 and the number of epochs as 200. We used the Adam (Kingma & Ba, 2014) optimizer with learning rate 0.001, decayed to 0.0001 and 0.00001 at 75% and 90% of the total epochs, respectively. For the diffusion model, we follow the CSDI (Tashiro et al., 2021) architecture and set the number of residual layers to 4, residual channels to 64, and attention heads to 8. The denoising step T is set to 50 as our baseline. For image synthesis: following (Nichol & Dhariwal, 2021), we use the U-Net model architecture, train for 500K iterations with a batch size of 128, use a learning rate of 0.0001 with the Adam (Kingma & Ba, 2014) optimizer, and use an exponential moving average (EMA) with a rate of 0.9999. The denoising step T is set to 1000 and the linear forward noise schedule is used as our baseline. (Configuration sketches illustrating these settings follow the table.) |
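The Dataset Splits row quotes the paper's procedure of hiding 10/50/90% of the observed values and treating them as imputation targets. Below is a minimal sketch of that masking step, assuming NumPy boolean masks; the array shape, seed, and the function name `make_imputation_targets` are illustrative and not taken from the paper's pipeline.

```python
import numpy as np

def make_imputation_targets(observed_mask, missing_ratio, seed=0):
    """Hide `missing_ratio` of the observed entries; the model must impute them."""
    rng = np.random.default_rng(seed)
    obs_idx = np.flatnonzero(observed_mask)              # flat indices of observed entries
    n_hidden = int(round(missing_ratio * obs_idx.size))
    hidden = rng.choice(obs_idx, size=n_hidden, replace=False)

    cond_mask = observed_mask.copy()
    cond_mask.flat[hidden] = False                       # entries the model may condition on
    target_mask = observed_mask & ~cond_mask             # entries evaluated as ground truth
    return cond_mask, target_mask

# Illustrative usage over the three ratios quoted in the paper.
observed = np.random.rand(48, 35) < 0.8                  # synthetic observation pattern
for ratio in (0.1, 0.5, 0.9):
    cond_mask, target_mask = make_imputation_targets(observed, ratio)
```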
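For the time-series imputation setup in the Experiment Setup row (batch size 16, 200 epochs, Adam at 0.001 decayed to 0.0001 and 0.00001 at 75% and 90% of the epochs), a minimal PyTorch sketch of the optimizer and learning-rate schedule could look like the following. The placeholder `model` stands in for the CSDI-style denoiser and is not the authors' code.

```python
import torch

epochs = 200
model = torch.nn.Linear(8, 8)  # placeholder for the CSDI-style denoiser

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Decay 0.001 -> 0.0001 -> 0.00001 at 75% and 90% of the total epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[int(0.75 * epochs), int(0.9 * epochs)],
    gamma=0.1,
)

for epoch in range(epochs):
    # ... training loop over batches of size 16 would go here ...
    scheduler.step()
```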
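For the image-synthesis setup (Adam at 0.0001, 500K iterations, batch size 128, EMA rate 0.9999), the EMA of the weights is the only non-standard ingredient; here is a hedged sketch, with a tiny convolution standing in for the U-Net and `update_ema` as an assumed helper name.

```python
import copy
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the U-Net denoiser
ema_model = copy.deepcopy(model)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
ema_rate = 0.9999

@torch.no_grad()
def update_ema(ema, online, rate):
    """Exponential moving average of parameters: ema <- rate * ema + (1 - rate) * online."""
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(rate).add_(p, alpha=1 - rate)

# Inside the 500K-iteration training loop (batch size 128) one would call, per step:
#   loss.backward(); optimizer.step(); update_ema(ema_model, model, ema_rate)
```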