GAFormer: Enhancing Timeseries Transformers Through Group-Aware Embeddings

Authors: Jingyun Xiao, Ran Liu, Eva L Dyer

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through evaluations on diverse time series datasets, we demonstrate that GE alone can significantly enhance the performance of several backbone models, and that the combination of spatial and temporal group embeddings allows GAFormer to surpass existing baselines.
Researcher Affiliation | Academia | Jingyun Xiao (1), Ran Liu (1), Eva L. Dyer (1,2); (1) Machine Learning Center, (2) Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332. Contact: jxiao76@gatech.edu; evadyer@gatech.edu.
Pseudocode | No | The paper describes its methodology using textual descriptions and mathematical equations, but it does not include any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Code is available at https://github.com/nerdslab/GAFormer.
Open Datasets | Yes | Both the univariate and multivariate datasets are selected from the UEA Time Series Classification benchmark (Bagnall et al., 2018). The univariate time-series datasets include Inline Skate (7 classes) (Morchen, 2006), Earthquakes (2 classes), and Adiac (37 classes) (Jalba et al., 2004), while the multivariate time-series datasets include Motor Imagery (64-channel ECoG, 2 classes) (Lal et al., 2004), Self Reg SCP2 (7-channel EEG, 2 classes) (Birbaumer et al., 2001), Face Detect (144-channel MEG, 2 classes), and Ethanol (3-channel spectrometer, 4 classes) (Large et al., 2018).
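All of the datasets named above are distributed through the UEA archive, so they can be fetched with standard tooling. Below is a minimal sketch assuming sktime's load_UCR_UEA_dataset utility and the archive identifier "SelfRegulationSCP2" (the full UEA name for the Self Reg SCP2 dataset); none of this is taken from the GAFormer repository.

```python
# Minimal sketch (not from the paper) of downloading one of the cited UEA
# datasets with sktime; the loader name and dataset identifier are sktime/UEA
# conventions, not names from the GAFormer code.
from sktime.datasets import load_UCR_UEA_dataset

# Fetch the official train and test partitions of the 7-channel EEG dataset.
X_train, y_train = load_UCR_UEA_dataset(
    name="SelfRegulationSCP2", split="train", return_X_y=True
)
X_test, y_test = load_UCR_UEA_dataset(
    name="SelfRegulationSCP2", split="test", return_X_y=True
)

print(X_train.shape, len(set(y_train)))  # (n_cases, n_channels), 2 classes
```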
Dataset Splits | Yes | For each dataset, we perform an 80/20% train/val split on the original training dataset, and select the best model on the validation set to obtain results on the testing set.
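The split protocol quoted above is straightforward to reproduce. The sketch below shows one way to do it with scikit-learn's train_test_split; the stratification, random seed, and placeholder arrays X_full / y_full are assumptions rather than settings reported in the paper.

```python
# Minimal sketch of the 80/20 train/validation protocol described above.
import numpy as np
from sklearn.model_selection import train_test_split

X_full = np.random.randn(200, 7, 1152)   # placeholder: 200 cases, 7 channels
y_full = np.random.randint(0, 2, 200)    # placeholder binary labels

X_tr, X_val, y_tr, y_val = train_test_split(
    X_full, y_full, test_size=0.2, stratify=y_full, random_state=0
)
# Select the best model on (X_val, y_val), then report results once on the
# official held-out test split.
```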
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions optimizers and schedulers (e.g., Adam optimizer, cosine annealing scheduler) but does not list specific software libraries or programming languages with their version numbers.
Experiment Setup | Yes | We train all models with a learning rate of 0.0001 using the Adam optimizer (Kingma & Ba, 2014), with a batch size of 64 for 200k steps until the model converges. Each architecture contains 4-layer transformer blocks with 4 attention heads and 32 dimensions.
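For orientation, the reported hyperparameters can be wired together roughly as follows in PyTorch. The nn.TransformerEncoder backbone, the pooling, and classifier_head are generic placeholders standing in for GAFormer, which is not reproduced here; only the optimizer, scheduler, and layer sizes come from the quoted setup.

```python
# Sketch of the reported training configuration (Adam, lr=1e-4, batch size 64,
# cosine annealing over 200k steps) around a generic 4-layer / 4-head / 32-dim
# transformer encoder. This is a stand-in backbone, not the GAFormer model.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
model = nn.TransformerEncoder(encoder_layer, num_layers=4)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200_000)
criterion = nn.CrossEntropyLoss()

# for x, y in loader:  # loader yields batches of size 64
#     logits = classifier_head(model(x).mean(dim=1))  # hypothetical pooling/head
#     loss = criterion(logits, y)
#     optimizer.zero_grad(); loss.backward(); optimizer.step(); scheduler.step()
```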