TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts

Authors: Hyunwook Lee, Sungahn Ko

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on three public traffic network datasets, METR-LA, PEMS-BAY, and EXPY-TKY, demonstrate that TESTAM outperforms 13 existing methods in terms of accuracy due to its better modeling of recurring and non-recurring traffic patterns.
Researcher Affiliation | Academia | Hyunwook Lee & Sungahn Ko, Ulsan National Institute of Science and Technology, {gusdnr0916, sako}@unist.ac.kr
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The official code is available at https://github.com/HyunWookL/TESTAM
Open Datasets | Yes | We use three benchmark datasets for the experiments: METR-LA, PEMS-BAY, and EXPY-TKY. METR-LA and PEMS-BAY contain four months of speed data recorded by 207 sensors on Los Angeles highways and 325 sensors in the Bay Area, respectively (Li et al., 2018). EXPY-TKY consists of three months of speed data collected from 1843 links in Tokyo, Japan.
Dataset Splits | Yes | For METR-LA and PEMS-BAY, we use 70% of the data for training, 10% for validation, and 20% for evaluation. For EXPY-TKY, we use the first two months for training and validation and the last month for testing, as in the MegaCRN paper (Jiang et al., 2023). (A minimal split sketch follows the table.)
Hardware Specification | Yes | All experiments are conducted using an RTX 3090 GPU.
Software Dependencies | No | The paper mentions optimizers and schedulers (e.g., the Adam optimizer and a cosine annealing warmup restart scheduler) but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | After performing a greedy search for hyperparameters, we set the hidden size d = e = 32, the memory size m = 20, the number of layers l = 3, the number of heads K = 4, the hidden size for the feed-forward networks h_ff = 128, and the error quantile q = 0.7. We use the Adam optimizer with β1 = 0.9, β2 = 0.98, and ε = 10^-9, as in Vaswani et al. (2017). We vary the learning rate during training using the cosine annealing warmup restart scheduler (Loshchilov & Hutter, 2017) according to the formula below:
lr = \begin{cases} lr_{\min} + (lr_{\max} - lr_{\min})\,\frac{T_{cur}}{T_{warm}} & \text{for the first } T_{warm} \text{ steps} \\ lr_{\min} + \frac{1}{2}(lr_{\max} - lr_{\min})\left(1 + \cos\!\left(\frac{T_{cur}}{T_{freq}}\,\pi\right)\right) & \text{otherwise,} \end{cases} \tag{7}
where T_cur is the number of steps since the last restart. We use T_warm = T_freq = 4000 and lr_min = 10^-7 for all datasets, and set lr_max = 3 × 10^-3 for METR-LA and PEMS-BAY and lr_max = 3 × 10^-4 for EXPY-TKY. (A hedged code sketch of this schedule follows the table.)
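
The chronological 70/10/20 split reported for METR-LA and PEMS-BAY can be expressed in a few lines of NumPy. The following is a minimal sketch assuming the raw data is a (timesteps, sensors) array; the function and variable names are illustrative and are not taken from the official repository.

```python
import numpy as np

def chronological_split(data: np.ndarray, train_ratio: float = 0.7, val_ratio: float = 0.1):
    """Split a (timesteps, sensors) array into train/val/test chunks in time order."""
    n = len(data)
    n_train = int(n * train_ratio)        # first 70% for training
    n_val = int(n * val_ratio)            # next 10% for validation
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]         # remaining ~20% for evaluation
    return train, val, test
```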
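
The cosine annealing warmup restart schedule in Eq. (7) can be written as a function of the global training step. The sketch below assumes the reported hyperparameters (T_warm = T_freq = 4000, lr_min = 10^-7, and lr_max = 3 × 10^-3 for METR-LA/PEMS-BAY) and assumes that T_cur resets every T_freq steps after the warmup phase; the function name and the restart bookkeeping are assumptions, not taken from the TESTAM code.

```python
import math

def cosine_annealing_warmup_restart(step: int,
                                     t_warm: int = 4000,
                                     t_freq: int = 4000,
                                     lr_min: float = 1e-7,
                                     lr_max: float = 3e-3) -> float:
    """Learning rate at a given global step, following Eq. (7)."""
    if step < t_warm:
        # Linear warmup from lr_min to lr_max over the first t_warm steps.
        return lr_min + (lr_max - lr_min) * step / t_warm
    # T_cur: steps since the last restart (restart assumed every t_freq steps).
    t_cur = (step - t_warm) % t_freq
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(t_cur / t_freq * math.pi))
```

In a PyTorch training loop, this value would typically be written into the optimizer's param_groups at every step, with the optimizer built as, e.g., torch.optim.Adam(model.parameters(), betas=(0.9, 0.98), eps=1e-9) to match the reported Adam settings; whether the official implementation wires the schedule up exactly this way is an assumption.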