FIDE: Frequency-Inflated Conditional Diffusion Model for Extreme-Aware Time Series Generation

Authors: Asadullah Hill Galib, Pang-Ning Tan, Lifeng Luo

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on real-world and synthetic data showcase the efficacy of FIDE over baseline methods, highlighting its potential in advancing Generative AI for time series analysis, specifically in accurately modeling extreme events.
Researcher Affiliation | Academia | Asadullah Hill Galib, Pang-Ning Tan, and Lifeng Luo, Michigan State University. Emails: {galibasa, ptan, lluo}@msu.edu
Pseudocode | Yes | Algorithm 1 (Training) ... Algorithm 2 (Sampling)
Open Source Code | Yes | All the code and datasets used in this paper are available at https://github.com/galib19/FIDE.
Open Datasets | Yes | We partitioned each dataset into training, validation, and testing according to an 8:1:1 ratio. ... The datasets used are described in Appendix D:
(1) Synthetic Data (AR2): the AR(2) dataset comprises synthetic time series generated using an autoregressive model of order 2.
(2) Financial Data (Stocks): continuous-valued, aperiodic sequences of daily historical Google stock data spanning 2004 to 2019; the adjusted closing price is used in this work.
(3) Energy Data (Appliance Energy): the UCI Appliances energy prediction dataset [3] encompasses multivariate, continuous-valued measurements; the appliance energy series is used for analysis.
(4) Weather/Climate Data (Daily Minimum Temperature): this dataset [17] comprises daily minimum temperatures in Melbourne, Australia, from 1981 to 1990.
(5) Medical Data (ECG5000: Congestive Heart Failure): the original "ECG5000" dataset [9] originates from a 20-hour electrocardiogram (ECG) obtained from the Physionet database, specifically the BIDMC Congestive Heart Failure Database (chfdb), record "chf07". The processed data comprises 5,000 heartbeats randomly selected from the original recording.
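The AR(2) synthetic data mentioned above can be sketched as follows. This is a minimal illustration of an order-2 autoregressive generator; the coefficients, noise scale, and seed are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def generate_ar2(n_steps, phi1=0.5, phi2=-0.3, sigma=1.0, seed=0):
    """Generate a univariate AR(2) series: x_t = phi1*x_{t-1} + phi2*x_{t-2} + eps_t.

    phi1/phi2 are illustrative coefficients chosen to keep the process
    stationary; eps_t is i.i.d. Gaussian noise with std sigma.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps)
    for t in range(2, n_steps):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + rng.normal(0.0, sigma)
    return x

series = generate_ar2(200)
```

With these coefficients the characteristic roots lie inside the unit circle, so the simulated series remains stationary and bounded in expectation.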
Dataset Splits | Yes | We partitioned each dataset into training, validation, and testing according to an 8:1:1 ratio.
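The 8:1:1 partitioning can be sketched as a simple order-preserving split; whether the paper shuffles before splitting is not stated here, so the chronological split below is an assumption.

```python
def split_8_1_1(data):
    """Split a sequence of samples into train/val/test with an 8:1:1 ratio.

    Assumes an order-preserving (chronological) split, which is common
    for time series but not confirmed by the report.
    """
    n = len(data)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

train, val, test = split_8_1_1(list(range(100)))
# 100 samples -> 80 train, 10 validation, 10 test
```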
Hardware Specification | Yes | All experiments were conducted on an NVIDIA T4 GPU.
Software Dependencies | No | The paper mentions using the Adam optimizer and the Ray Tune framework with an ASHA scheduler, but it does not provide version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | The encoder component of our framework employs a 3-layer transformer architecture, accompanied by fully connected layers. Training used the Adam optimizer. For all methods, we perform extensive hyperparameter tuning on the length of the embedding vector, the number of hidden layers, the number of nodes, the learning rate, and the batch size. The optimal hyperparameters were determined using the Ray Tune framework with an Asynchronous Successive Halving Algorithm (ASHA) scheduler to enable early stopping.
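The Ray Tune + ASHA setup described above can be sketched as a configuration fragment like the one below. The search space, metric name, training function, and budget values are illustrative assumptions (the report names the tuned hyperparameters but not their ranges), and the exact reporting API varies across Ray versions.

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_fn(config):
    # Hypothetical training loop: build and train the model with the
    # sampled hyperparameters, reporting validation loss each epoch so
    # ASHA can stop unpromising trials early.
    for epoch in range(100):
        val_loss = ...  # placeholder; compute on the validation split
        tune.report(val_loss=val_loss)  # API name varies by Ray version

# Illustrative search space over the hyperparameters the paper tunes.
search_space = {
    "embedding_dim": tune.choice([32, 64, 128]),
    "num_hidden_layers": tune.choice([1, 2, 3]),
    "hidden_size": tune.choice([64, 128, 256]),
    "lr": tune.loguniform(1e-4, 1e-2),
    "batch_size": tune.choice([32, 64, 128]),
}

scheduler = ASHAScheduler(max_t=100, grace_period=5, reduction_factor=2)

analysis = tune.run(
    train_fn,
    config=search_space,
    metric="val_loss",
    mode="min",
    scheduler=scheduler,
    num_samples=20,
)
```

ASHA promotes only the best-performing fraction of trials to larger epoch budgets, which is what the report means by "early stopping" during hyperparameter search.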