Domain Adaptation for Time Series Forecasting via Attention Sharing

Authors: Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, Yuyang Wang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various domains demonstrate that our proposed method outperforms state-of-the-art baselines on synthetic and real-world datasets, and ablation studies verify the effectiveness of our design choices.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, University of California Santa Barbara, California, USA (work done during internship at Amazon AWS AI); (2) Amazon AWS AI; (3) Rutgers University.
Pseudocode | Yes | Algorithm 1: Adversarial Training of DAF (an illustrative training-loop sketch follows the table).
Open Source Code | No | The paper mentions using the 'publicly available version on Sagemaker' for DAR and implementing models in PyTorch, but it does not provide concrete access (a link or explicit statement) to the source code for the DAF methodology itself or the authors' specific implementations.
Open Datasets | Yes | We perform experiments on four real benchmark datasets that are widely used in the forecasting literature: elec and traf from the UCI data repository (Dua & Graff, 2017), and sales (Kar, 2019) and wiki (Lai, 2017) from Kaggle.
Dataset Splits | Yes | We partition the target datasets equally into training/validation/test splits, i.e., 10/10/10 days for hourly datasets and 20/20/20 days for daily datasets (see the splitting sketch below).
Hardware Specification | No | The paper states that models were trained 'on AWS Sagemaker' but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper states, 'We implement the models using PyTorch (Paszke et al., 2019),' but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | The following hyperparameters of DAF and the baseline models are selected by grid search over the validation set (see the grid-search sketch below): the hidden dimension h ∈ {32, 64, 128, 256} for all models; the number of MLP layers l_MLP ∈ {4} for N-BEATS and l_MLP ∈ {1, 2, 3} for AttF, DAF and its variants; the number of RNN layers l_RNN ∈ {1, 3} in DAR and RDA; the kernel sizes of convolutions s ∈ {3, 13, (3, 5), (3, 17)} in AttF, DAF and its variants; the learning rate γ ∈ {0.001, 0.01, 0.1} for all models; and the trade-off coefficient λ ∈ {0.1, 1, 10} in equation (2) for DAF and RDA-ADDA.