TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts

Authors: Hyunwook Lee, Sungahn Ko

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on three public traffic network datasets, METR-LA, PEMS-BAY, and EXPY-TKY, demonstrate that TESTAM outperforms 13 existing methods in terms of accuracy due to its better modeling of recurring and non-recurring traffic patterns.
Researcher Affiliation | Academia | Hyunwook Lee & Sungahn Ko, Ulsan National Institute of Science and Technology, {gusdnr0916, sako}@unist.ac.kr
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The official code is available at https://github.com/HyunWookL/TESTAM
Open Datasets | Yes | We use three benchmark datasets for the experiments: METR-LA, PEMS-BAY, and EXPY-TKY. METR-LA and PEMS-BAY contain four months of speed data recorded by 207 sensors on Los Angeles highways and 325 sensors in the Bay Area, respectively (Li et al., 2018). EXPY-TKY consists of three months of speed data collected from 1843 links in Tokyo, Japan.
Dataset Splits | Yes | For METR-LA and PEMS-BAY, we use 70% of the data for training, 10% for validation, and 20% for evaluation. For EXPY-TKY, we use the first two months for training and validation and the last month for testing, as in the MegaCRN paper (Jiang et al., 2023). (A minimal split sketch follows the table.)
Hardware Specification | Yes | All experiments are conducted using an RTX 3090 GPU.
Software Dependencies | No | The paper mentions optimizers and schedulers (e.g., the Adam optimizer and a cosine annealing warmup restart scheduler) but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | After performing a greedy search for hyperparameters, we set the hidden size d = e = 32, the memory size m = 20, the number of layers l = 3, the number of heads K = 4, the hidden size for the feed-forward networks h_ff = 128, and the error quantile q = 0.7. We use the Adam optimizer with β1 = 0.9, β2 = 0.98, and ε = 10^-9, as in Vaswani et al. (2017). We vary the learning rate during training using the cosine annealing warmup restart scheduler (Loshchilov & Hutter, 2017) according to the formula below:
lr = \begin{cases} lr_{\min} + (lr_{\max} - lr_{\min})\,\frac{T_{cur}}{T_{warm}} & \text{for the first } T_{warm} \text{ steps} \\ lr_{\min} + \frac{1}{2}(lr_{\max} - lr_{\min})\left(1 + \cos\!\left(\frac{T_{cur}}{T_{freq}}\,\pi\right)\right) & \text{otherwise,} \end{cases} \tag{7}
where T_cur is the number of steps since the last restart. We use T_warm = T_freq = 4000 and lr_min = 10^-7 for all datasets, and set lr_max = 3 × 10^-3 for METR-LA and PEMS-BAY and lr_max = 3 × 10^-4 for EXPY-TKY. (A hedged code sketch of this schedule follows the table.)
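
The chronological 70/10/20 split reported for METR-LA and PEMS-BAY can be expressed in a few lines of NumPy. The following is a minimal sketch assuming the raw data is a (timesteps, sensors) array; the function and variable names are illustrative and are not taken from the official repository.

```python
import numpy as np

def chronological_split(data: np.ndarray, train_ratio: float = 0.7, val_ratio: float = 0.1):
    """Split a (timesteps, sensors) array into train/val/test chunks in time order."""
    n = len(data)
    n_train = int(n * train_ratio)        # first 70% for training
    n_val = int(n * val_ratio)            # next 10% for validation
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]         # remaining ~20% for evaluation
    return train, val, test
```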
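
The cosine annealing warmup restart schedule in Eq. (7) can be written as a function of the global training step. The sketch below assumes the reported hyperparameters (T_warm = T_freq = 4000, lr_min = 10^-7, and lr_max = 3 × 10^-3 for METR-LA/PEMS-BAY) and assumes that T_cur resets every T_freq steps after the warmup phase; the function name and the restart bookkeeping are assumptions, not taken from the TESTAM code.

```python
import math

def cosine_annealing_warmup_restart(step: int,
                                     t_warm: int = 4000,
                                     t_freq: int = 4000,
                                     lr_min: float = 1e-7,
                                     lr_max: float = 3e-3) -> float:
    """Learning rate at a given global step, following Eq. (7)."""
    if step < t_warm:
        # Linear warmup from lr_min to lr_max over the first t_warm steps.
        return lr_min + (lr_max - lr_min) * step / t_warm
    # T_cur: steps since the last restart (restart assumed every t_freq steps).
    t_cur = (step - t_warm) % t_freq
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(t_cur / t_freq * math.pi))
```

In a PyTorch training loop, this value would typically be written into the optimizer's param_groups at every step, with the optimizer built as, e.g., torch.optim.Adam(model.parameters(), betas=(0.9, 0.98), eps=1e-9) to match the reported Adam settings; whether the official implementation wires the schedule up exactly this way is an assumption.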