Probabilistic Time Series Modeling with Decomposable Denoising Diffusion Model

Authors: Tijin Yan, Hengheng Gong, Yongping He, Yufeng Zhan, Yuanqing Xia

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on 8 real-world datasets show that D3M reduces RMSE and CRPS by up to 4.6% and 4.3%, respectively, compared with state-of-the-art methods on imputation tasks, and achieves results comparable to the state of the art on forecasting tasks with only 10 inference steps.
Researcher Affiliation | Academia | (1) School of Automation, Beijing Institute of Technology, Beijing, China; (2) Zhongyuan University of Technology, Zhengzhou, Henan, China.
Pseudocode | Yes | Algorithm 1: Unconditional training procedure of D3M. (A hedged training-loop sketch follows the table.)
Open Source Code | No | The paper does not provide an explicit statement or a link indicating the release of open-source code for the methodology described.
Open Datasets | Yes | For time series imputation tasks, we use the PhysioNet Challenge 2012 and Air Quality datasets for evaluation. Detailed description of these datasets can be found in Appendix B.1. [...] We use six real-world datasets for evaluation: Exchange, Solar, Electricity, Traffic, Taxi and Wikipedia. All of these datasets can be obtained from GluonTS (Alexandrov et al., 2020). (See the dataset-loading sketch after the table.)
Dataset Splits | No | The paper mentions masking proportions for testing but does not explicitly provide train/validation/test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions GluonTS but does not provide specific version numbers for it or any other software dependencies needed to replicate the experiments.
Experiment Setup | Yes | The number of inference steps is set to 10 for all experiments. We set µ = 0, Σ = I for simplicity. For D3M with the Linear type for h(t), we set a = X0, b = X0/2. We use grid search for the hyper-parameters of the EMA and linear gated attention modules. Specifically, we set the candidates of h as {8, 12, 16}, the candidates of z as {32, 64, 96}, and the candidates of v as {128, 160, 256}. The batch size and number of epochs are set to 16 and 300, respectively. In addition, we use a multi-step learning rate scheduler that decays the learning rate at 75% and 90% of all epochs. (See the scheduler snippet after the table.)
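
The Pseudocode row above points to Algorithm 1, the paper's unconditional training procedure for D3M, which is not reproduced here. For orientation only, below is a minimal sketch of a generic DDPM-style unconditional training step with the standard noise-prediction objective; the noise schedule, the `model(x_t, t)` signature, and all hyper-parameters are assumptions, and D3M's decomposition via h(t) replaces parts of this objective in the actual paper.

```python
# A minimal sketch of a generic DDPM-style unconditional training step.
# NOT the paper's Algorithm 1: D3M defines its own decomposition h(t)
# and loss; the standard noise-prediction objective below is a stand-in.
import torch

T = 1000                                  # number of diffusion steps (assumption)
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule (assumption)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, x0, optimizer):
    """One unconditional denoising-diffusion training step on a batch x0."""
    t = torch.randint(0, T, (x0.shape[0],))                # sample timesteps
    noise = torch.randn_like(x0)                           # epsilon ~ N(0, I)
    a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward diffusion
    loss = torch.nn.functional.mse_loss(model(x_t, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```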
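
The Open Datasets row quotes the paper's statement that the six forecasting datasets are available from GluonTS. A minimal sketch of loading one of them through the GluonTS dataset repository follows; the registry key `electricity_nips` is an assumption, since key names vary across GluonTS versions.

```python
# A minimal sketch of pulling a forecasting dataset from the GluonTS
# dataset repository. The key "electricity_nips" is an assumption;
# check gluonts.dataset.repository for the exact names in your version.
from gluonts.dataset.repository.datasets import get_dataset

dataset = get_dataset("electricity_nips")  # downloads and caches on first call
print(dataset.metadata.freq)               # sampling frequency of the series
print(len(list(dataset.train)))            # number of training series
```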
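
The Experiment Setup row reports a multi-step learning rate scheduler that decays at 75% and 90% of the 300 epochs. Below is a minimal PyTorch sketch using `MultiStepLR`; the model, optimizer, base learning rate, and decay factor are all assumptions, as the paper reports only the milestone positions, the batch size of 16, and the 300 epochs.

```python
# A minimal sketch of the reported schedule: the learning rate decays at
# 75% and 90% of all epochs (here, epochs 225 and 270 of 300).
import torch

epochs = 300
model = torch.nn.Linear(8, 8)  # placeholder network, not D3M's architecture
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # base LR assumed
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[int(0.75 * epochs), int(0.90 * epochs)],  # epochs 225 and 270
    gamma=0.1,  # decay factor assumed; the paper does not report it
)

for epoch in range(epochs):
    # train one epoch over batches of size 16 here
    scheduler.step()
```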