GAFormer: Enhancing Timeseries Transformers Through Group-Aware Embeddings
Authors: Jingyun Xiao, Ran Liu, Eva L. Dyer
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through evaluations on diverse time series datasets, we demonstrate that GE alone can significantly enhance the performance of several backbone models, and that the combination of spatial and temporal group embeddings allows GAFormer to surpass existing baselines. |
| Researcher Affiliation | Academia | Jingyun Xiao1, Ran Liu1, Eva L. Dyer1,2 (1 Machine Learning Center; 2 Department of Biomedical Engineering), Georgia Institute of Technology, Atlanta, GA 30332. Contact: jxiao76@gatech.edu; evadyer@gatech.edu. |
| Pseudocode | No | The paper describes its methodology using textual descriptions and mathematical equations, but it does not include any explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Code is available at https://github.com/nerdslab/GAFormer. |
| Open Datasets | Yes | Both datasets are selected from the UEA Time Series Classification benchmark (Bagnall et al., 2018), where univariate time-series datasets contain Inline Skate (7 classes) (Morchen, 2006), Earthquakes (2 classes), Adiac (37 classes) (Jalba et al., 2004); while the multivariate time-series datasets contain Motor Imagery (64-channel ECoG, 2 classes) (Lal et al., 2004), Self Reg SCP2 (7-channel EEG, 2 classes) (Birbaumer et al., 2001), Face Detect (144-channel MEG, 2 classes), and Ethanol (3-channel Spectrometer, 4 classes) (Large et al., 2018). |
| Dataset Splits | Yes | For each dataset, we perform an 80/20% train/val split on the original training dataset, and select the best model on the validation set to obtain results on the testing set. A hedged loading-and-split sketch is given after the table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers and schedulers (e.g., Adam optimizer, cosine annealing scheduler) but does not list specific software libraries or programming languages with their version numbers. |
| Experiment Setup | Yes | We train all models with a learning rate of 0.0001 using the Adam optimizer (Kingma & Ba, 2014), with a batch size of 64 for 200k steps until the model converges. Each architecture contains 4-layer transformer blocks with 4 attention heads and 32 dimensions. A minimal training-setup sketch is given after the table. |
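
The Open Datasets and Dataset Splits rows together describe the data protocol. The sketch below illustrates one way to reproduce it, assuming the UEA archives are fetched with sktime's `load_UCR_UEA_dataset` and that the archive names in `DATASETS` correspond to the datasets cited in the paper; the paper does not name a loading library, and the stratified split and fixed seed are assumptions rather than reported choices.

```python
# Hypothetical loading-and-splitting sketch; the paper does not name a loader,
# so sktime and the UEA archive names below are assumptions.
from sklearn.model_selection import train_test_split
from sktime.datasets import load_UCR_UEA_dataset

# Archive names assumed to match the datasets cited in the paper.
DATASETS = ["InlineSkate", "Earthquakes", "Adiac", "MotorImagery",
            "SelfRegulationSCP2", "FaceDetection", "EthanolConcentration"]

def load_splits(name, seed=0):
    """Fetch one UEA dataset and apply the reported 80/20 train/val split."""
    X_train, y_train = load_UCR_UEA_dataset(name, split="train", return_X_y=True)
    X_test, y_test = load_UCR_UEA_dataset(name, split="test", return_X_y=True)
    # 80/20 split of the original training set; stratification and the
    # fixed seed are assumptions, not reported choices.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_train, y_train, test_size=0.2, stratify=y_train, random_state=seed)
    return (X_tr, y_tr), (X_val, y_val), (X_test, y_test)

if __name__ == "__main__":
    train, val, test = load_splits("Adiac")
    print(len(train[0]), len(val[0]), len(test[0]))
```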
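The Experiment Setup row reports the shared training configuration (Adam, learning rate 1e-4, batch size 64, 200k steps, 4 transformer layers with 4 heads and 32 dimensions). The sketch below maps that configuration onto a plain PyTorch stack; the `TinyTimeSeriesTransformer` stand-in, the mean-pooling classifier head, and the `CosineAnnealingLR` schedule length are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of the reported training configuration (Adam, lr 1e-4,
# batch size 64, 200k steps, 4 layers, 4 heads, 32-dim model).
import torch
from torch import nn

D_MODEL, N_HEADS, N_LAYERS = 32, 4, 4
BATCH_SIZE, LR, TOTAL_STEPS = 64, 1e-4, 200_000

class TinyTimeSeriesTransformer(nn.Module):
    """Illustrative stand-in for the transformer backbone described in the paper."""
    def __init__(self, in_channels, n_classes):
        super().__init__()
        self.embed = nn.Linear(in_channels, D_MODEL)        # per-timestep projection
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.head = nn.Linear(D_MODEL, n_classes)           # classifier head (assumed)

    def forward(self, x):                                   # x: (batch, time, channels)
        z = self.encoder(self.embed(x))
        return self.head(z.mean(dim=1))                     # mean-pool over time (assumed)

model = TinyTimeSeriesTransformer(in_channels=3, n_classes=4)  # e.g. Ethanol-like shapes
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=TOTAL_STEPS)
criterion = nn.CrossEntropyLoss()

# One illustrative step of the reported 200k-step training loop, on dummy data.
x = torch.randn(BATCH_SIZE, 100, 3)                 # (batch, time, channels)
y = torch.randint(0, 4, (BATCH_SIZE,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()
```

Both sketches are approximations of the reported setup rather than the authors' implementation; the released repository (https://github.com/nerdslab/GAFormer) remains the reference for the actual GAFormer backbone and group embeddings.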