Efficient Denoising Diffusion via Probabilistic Masking

Authors: Weizhong Zhang, Zhiwei Zhang, Renjie Pi, Zhongming Jin, Yuan Gao, Jieping Ye, Kani Chen

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate the superiority of our proposed EDDPM over the state-of-the-art sampling acceleration methods across various domains. EDDPM can generate high-quality samples with only 20% of the steps for time series imputation and achieve 4.89 FID with 5 steps for CIFAR-10.
Researcher Affiliation | Collaboration | (1) Fudan University, (2) Hong Kong University of Science and Technology, (3) Alibaba Group, (4) Wuhan University.
Pseudocode | Yes | Algorithm 1: Efficient Denoising Diffusion via Probabilistic Masking (EDDPM)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology.
Open Datasets | Yes | Datasets. We use the CIFAR-10 dataset (Krizhevsky et al., 2009) (50k images of resolution 32×32) for image synthesis, and Healthcare (Silva et al., 2012) and Air-quality (Tashiro et al., 2021) for the time series imputation experiments.
Dataset Splits | No | The paper mentions generating test data for imputation (e.g., "randomly choose 10/50/90% of observed values as ground-truth on the test data for imputation") but does not specify the overall training, validation, and test splits needed for reproducibility. (A sketch of this held-out masking procedure follows the table.)
Hardware Specification | Yes | All experiments are implemented in PyTorch 1.7.0 on a virtual workstation with eight 11 GB NVIDIA GeForce RTX 2080 Ti GPUs.
Software Dependencies | No | The paper only specifies 'PyTorch 1.7.0' as a software dependency with a version number. It does not list multiple key components or a self-contained solver with versions.
Experiment Setup | Yes | For the time series imputation experiments, the batch size is 16 and the number of epochs is 200, using the Adam (Kingma & Ba, 2014) optimizer with learning rate 0.001 decayed to 0.0001 and 0.00001 at 75% and 90% of the total epochs, respectively; following the CSDI (Tashiro et al., 2021) architecture, the diffusion model uses 4 residual layers, 64 residual channels, and 8 attention heads, and the denoising step count T is set to 50 as the baseline. For image synthesis, following Nichol & Dhariwal (2021), a U-Net architecture is trained for 500K iterations with a batch size of 128, a learning rate of 0.0001 with the Adam optimizer, and an exponential moving average (EMA) with a rate of 0.9999; T is set to 1000 and a linear forward noise schedule is used as the baseline. (A sketch of these training settings follows the table.)
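
The Dataset Splits entry quotes the paper's test-time protocol of randomly holding out 10/50/90% of observed values as imputation ground truth. The snippet below is a minimal sketch of that kind of hold-out masking, not the authors' code; the helper name make_imputation_target, the (L, K) tensor layout, and the 0/1 float mask convention are assumptions made for illustration.

import torch

def make_imputation_target(observed_mask, missing_ratio=0.1, seed=0):
    # observed_mask: (L, K) 0/1 float tensor, 1 where a value was actually observed.
    # Randomly marks `missing_ratio` of the observed entries as held-out ground truth
    # and returns (conditioning mask, target mask). Illustrative only.
    gen = torch.Generator().manual_seed(seed)
    observed_idx = observed_mask.flatten().nonzero(as_tuple=False).squeeze(-1)
    n_target = int(missing_ratio * observed_idx.numel())
    perm = torch.randperm(observed_idx.numel(), generator=gen)
    target_idx = observed_idx[perm[:n_target]]

    target_mask = torch.zeros_like(observed_mask).flatten()
    target_mask[target_idx] = 1.0
    target_mask = target_mask.reshape(observed_mask.shape)
    cond_mask = observed_mask - target_mask  # values the model may still condition on
    return cond_mask, target_mask

Setting missing_ratio to 0.1, 0.5, or 0.9 corresponds to the 10/50/90% settings quoted above.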
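
The Experiment Setup entry gives the optimizer, learning-rate schedule, and EMA rate, but no reference implementation is released. The following PyTorch sketch wires up those quoted settings under stated assumptions: model is a stand-in for the actual diffusion network, epochs follows the time-series setup, and update_ema is a hypothetical helper.

import copy
import torch

model = torch.nn.Linear(8, 8)   # placeholder for the diffusion network
epochs = 200                    # time-series imputation setup quoted above

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Decay 0.001 -> 0.0001 at 75% and -> 0.00001 at 90% of the total epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[int(0.75 * epochs), int(0.9 * epochs)],
    gamma=0.1,
)

# Exponential moving average of the weights (rate 0.9999, image-synthesis setup).
ema_model = copy.deepcopy(model)

def update_ema(ema_model, model, decay=0.9999):
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

In a training loop one would call scheduler.step() once per epoch and update_ema(ema_model, model) after each optimizer step; the gamma of 0.1 reproduces the 0.001 / 0.0001 / 0.00001 sequence, and the 0.9999 EMA rate comes directly from the quoted setup.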