Time Series Diffusion in the Frequency Domain

Authors: Jonathan Crabbé, Nicolas Huynh, Jan Pawel Stanczuk, Mihaela van der Schaar

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluation on real-world datasets, covering various domains like healthcare and finance, shows that frequency diffusion models better capture the training distribution than time diffusion models.
Researcher Affiliation | Academia | DAMTP, University of Cambridge. Correspondence to: Jonathan Crabbé <jc2133@cam.ac.uk>, Nicolas Huynh <nvth2@cam.ac.uk>.
Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | The code is publicly available at the following links: https://github.com/JonathanCrabbe/FourierDiffusion and https://github.com/vanderschaarlab/FourierDiffusion
Open Datasets | Yes | Data. To illustrate the breadth of time series applications, we work with 6 different datasets described in Table 1. ... ECG (Kachuee et al., 2018) ... MIMIC-III (Johnson et al., 2016) ... NASDAQ-2019 (Onyshchak, 2020) ... NASA-Charge (Saha & Goebel, 2007) ... NASA-Discharge ... US-Droughts (Minixhofer, 2021)
Dataset Splits | Yes | We also split the datasets into a training set D_train and a validation set D_val. ... We train a forecasting model by using an LSTM backbone... and we use early stopping based on a validation set, with a train/validation set ratio of 0.8.
Hardware Specification | Yes | All the models were trained and used for sampling on a single machine equipped with an 18-core Intel Core i9-10980XE CPU, an NVIDIA RTX A4000 GPU and an NVIDIA GeForce RTX 3080.
Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and 'VP-SDE' but does not specify their version numbers.
Experiment Setup | Yes | For each dataset, we parametrize the time score model s_θ and the frequency score model s̃_θ as transformer encoders with 10 attention and MLP layers, each with 12 heads and dimension d_model = 72. Both models have learnable positional encoding as well as diffusion time t encoding through random Fourier features composed with a learnable dense layer. This results in models with 3.2M parameters. We use a VP-SDE with linear noise scheduling and β_min = 0.1 and β_max = 20, as in (Song et al., 2020). The score models are trained with the denoising score-matching loss, as defined in Section 3. All the models are trained for 200 epochs with batch size 64, the AdamW optimizer and cosine learning rate scheduling (20 warmup epochs, lr_max = 10^-3). The selected model is the one achieving the lowest validation loss.
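
The rows above reference diffusion in the frequency domain, the paper's core idea. As a hedged illustration (not necessarily the authors' exact encoding), the sketch below maps a real-valued series to a real-valued frequency representation with a real FFT and stacked real/imaginary parts, so a standard diffusion model can operate on it:

```python
# Hedged sketch of diffusing in the frequency domain: map each series
# through a DFT first. Stacking real/imaginary rfft parts is one possible
# real-valued encoding, not confirmed as the paper's exact one.
import torch

def to_frequency(x: torch.Tensor) -> torch.Tensor:
    """Map a real series (batch, length) to a real-valued frequency tensor."""
    spec = torch.fft.rfft(x, norm="ortho")  # complex, length // 2 + 1 bins
    return torch.cat([spec.real, spec.imag], dim=-1)

def to_time(z: torch.Tensor, length: int) -> torch.Tensor:
    """Invert to_frequency back to the time domain."""
    n_bins = length // 2 + 1
    spec = torch.complex(z[..., :n_bins], z[..., n_bins:])
    return torch.fft.irfft(spec, n=length, norm="ortho")
```

Because the orthonormal DFT is invertible, samples drawn in the frequency domain map back to the time domain without loss.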
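For the Dataset Splits row, the quoted 0.8 train/validation ratio translates directly into code. A minimal sketch, assuming the series are stored in a single tensor; make_splits is a hypothetical helper, not from the authors' repository:

```python
import torch
from torch.utils.data import TensorDataset, random_split

def make_splits(series: torch.Tensor, train_ratio: float = 0.8):
    """Split a (num_series, length, channels) tensor into D_train and D_val."""
    dataset = TensorDataset(series)
    n_train = int(train_ratio * len(dataset))
    # random_split draws a random permutation, so D_val is held out for
    # early stopping and selecting the model with the lowest validation loss.
    return random_split(dataset, [n_train, len(dataset) - n_train])
```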
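The Experiment Setup row quotes a VP-SDE with a linear noise schedule (β_min = 0.1, β_max = 20) trained with the denoising score-matching loss. A minimal sketch, assuming PyTorch and a score model with signature score_model(x, t); the σ² loss weighting is the common choice from Song et al. (2020) and is an assumption, not confirmed by the excerpt:

```python
# Sketch of the VP-SDE perturbation kernel with a linear beta schedule
# (beta_min = 0.1, beta_max = 20) and the denoising score-matching loss.
import torch

BETA_MIN, BETA_MAX = 0.1, 20.0

def vp_marginal(t: torch.Tensor):
    """Mean coefficient and std of the VP-SDE perturbation kernel p(x_t | x_0)."""
    log_mean = -0.25 * t**2 * (BETA_MAX - BETA_MIN) - 0.5 * t * BETA_MIN
    mean_coef = torch.exp(log_mean)
    std = torch.sqrt(1.0 - torch.exp(2.0 * log_mean))
    return mean_coef, std

def dsm_loss(score_model, x0: torch.Tensor, eps_t: float = 1e-5):
    """Denoising score-matching loss with the (assumed) sigma^2 weighting."""
    t = torch.rand(x0.shape[0], device=x0.device) * (1.0 - eps_t) + eps_t
    mean_coef, std = vp_marginal(t)
    # Broadcast over the (length, channels) axes of each series.
    mean_coef, std = mean_coef.view(-1, 1, 1), std.view(-1, 1, 1)
    noise = torch.randn_like(x0)
    xt = mean_coef * x0 + std * noise
    score = score_model(xt, t)  # s_theta(x_t, t)
    return ((std * score + noise) ** 2).mean()
```

A training loop matching the quoted recipe would pair this loss with torch.optim.AdamW and a cosine learning-rate schedule (e.g. torch.optim.lr_scheduler.CosineAnnealingLR after a linear warmup) for 200 epochs at batch size 64.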