GAFormer: Enhancing Timeseries Transformers Through Group-Aware Embeddings

Authors: Jingyun Xiao, Ran Liu, Eva L Dyer

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through evaluations on diverse time series datasets, we demonstrate that GE alone can significantly enhance the performance of several backbone models, and that the combination of spatial and temporal group embeddings allows GAFormer to surpass existing baselines.
Researcher Affiliation | Academia | Jingyun Xiao (1), Ran Liu (1), Eva L. Dyer (1,2); (1) Machine Learning Center, (2) Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332. Contact: jxiao76@gatech.edu; evadyer@gatech.edu.
Pseudocode | No | The paper describes its methodology using textual descriptions and mathematical equations, but it does not include any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Code is available at https://github.com/nerdslab/GAFormer.
Open Datasets | Yes | Both the univariate and multivariate datasets are selected from the UEA Time Series Classification benchmark (Bagnall et al., 2018). The univariate time-series datasets include Inline Skate (7 classes) (Morchen, 2006), Earthquakes (2 classes), and Adiac (37 classes) (Jalba et al., 2004), while the multivariate time-series datasets include Motor Imagery (64-channel ECoG, 2 classes) (Lal et al., 2004), Self Reg SCP2 (7-channel EEG, 2 classes) (Birbaumer et al., 2001), Face Detect (144-channel MEG, 2 classes), and Ethanol (3-channel spectrometer, 4 classes) (Large et al., 2018).
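All of the datasets named above are distributed through the UEA archive, so they can be fetched with standard tooling. Below is a minimal sketch assuming sktime's load_UCR_UEA_dataset utility and the archive identifier "SelfRegulationSCP2" (the full UEA name for the Self Reg SCP2 dataset); none of this is taken from the GAFormer repository.

```python
# Minimal sketch (not from the paper) of downloading one of the cited UEA
# datasets with sktime; the loader name and dataset identifier are sktime/UEA
# conventions, not names from the GAFormer code.
from sktime.datasets import load_UCR_UEA_dataset

# Fetch the official train and test partitions of the 7-channel EEG dataset.
X_train, y_train = load_UCR_UEA_dataset(
    name="SelfRegulationSCP2", split="train", return_X_y=True
)
X_test, y_test = load_UCR_UEA_dataset(
    name="SelfRegulationSCP2", split="test", return_X_y=True
)

print(X_train.shape, len(set(y_train)))  # (n_cases, n_channels), 2 classes
```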
Dataset Splits | Yes | For each dataset, we perform an 80/20% train/val split on the original training dataset, and select the best model on the validation set to obtain results on the testing set.
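The split protocol quoted above is straightforward to reproduce. The sketch below shows one way to do it with scikit-learn's train_test_split; the stratification, random seed, and placeholder arrays X_full / y_full are assumptions rather than settings reported in the paper.

```python
# Minimal sketch of the 80/20 train/validation protocol described above.
import numpy as np
from sklearn.model_selection import train_test_split

X_full = np.random.randn(200, 7, 1152)   # placeholder: 200 cases, 7 channels
y_full = np.random.randint(0, 2, 200)    # placeholder binary labels

X_tr, X_val, y_tr, y_val = train_test_split(
    X_full, y_full, test_size=0.2, stratify=y_full, random_state=0
)
# Select the best model on (X_val, y_val), then report results once on the
# official held-out test split.
```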
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions optimizers and schedulers (e.g., Adam optimizer, cosine annealing scheduler) but does not list specific software libraries or programming languages with their version numbers.
Experiment Setup | Yes | We train all models with a learning rate of 0.0001 using the Adam optimizer (Kingma & Ba, 2014), with a batch size of 64 for 200k steps until the model converges. Each architecture contains 4-layer transformer blocks with 4 attention heads and 32 dimensions.
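For orientation, the reported hyperparameters can be wired together roughly as follows in PyTorch. The nn.TransformerEncoder backbone, the pooling, and classifier_head are generic placeholders standing in for GAFormer, which is not reproduced here; only the optimizer, scheduler, and layer sizes come from the quoted setup.

```python
# Sketch of the reported training configuration (Adam, lr=1e-4, batch size 64,
# cosine annealing over 200k steps) around a generic 4-layer / 4-head / 32-dim
# transformer encoder. This is a stand-in backbone, not the GAFormer model.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
model = nn.TransformerEncoder(encoder_layer, num_layers=4)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200_000)
criterion = nn.CrossEntropyLoss()

# for x, y in loader:  # loader yields batches of size 64
#     logits = classifier_head(model(x).mean(dim=1))  # hypothetical pooling/head
#     loss = criterion(logits, y)
#     optimizer.zero_grad(); loss.backward(); optimizer.step(); scheduler.step()
```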