TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts
Authors: Hyunwook Lee, Sungahn Ko
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three public traffic network datasets, METR-LA, PEMS-BAY, and EXPY-TKY, demonstrate that TESTAM outperforms 13 existing methods in terms of accuracy due to its better modeling of recurring and non-recurring traffic patterns. |
| Researcher Affiliation | Academia | Hyunwook Lee & Sungahn Ko Ulsan National Institute of Science and Technology {gusdnr0916, sako}@unist.ac.kr |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The official code can be found at https://github.com/HyunWookL/TESTAM |
| Open Datasets | Yes | We use three benchmark datasets for the experiments: METR-LA, PEMS-BAY, and EXPY-TKY. METR-LA and PEMS-BAY contain four-month speed data recorded by 207 sensors on Los Angeles highways and 325 sensors in the Bay Area, respectively (Li et al., 2018). EXPY-TKY consists of three-month speed data collected from 1843 links in Tokyo, Japan. |
| Dataset Splits | Yes | In the cases of METR-LA and PEMS-BAY, we use 70% of the data for training, 10% for validation, and 20% for evaluation. For EXPY-TKY, we utilize the first two months for training and validation and the last month for testing, as in the MegaCRN paper (Jiang et al., 2023). (A chronological-split sketch follows the table.) |
| Hardware Specification | Yes | All experiments are conducted using an RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions optimizers and schedulers (e.g., ‘Adam optimizer’, ‘cosine annealing warmup restart scheduler’) but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | After performing a greedy search for hyperparameters, we set the hidden size d = e = 32, the memory size m = 20, the number of layers l = 3, the number of heads K = 4, the hidden size for the feed-forward networks h_ff = 128, and the error quantile q = 0.7. We use the Adam optimizer with β1 = 0.9, β2 = 0.98, and ε = 10⁻⁹, as in Vaswani et al. (2017). We vary the learning rate during training using the cosine annealing warmup restart scheduler (Loshchilov & Hutter, 2017) according to Eq. (7): lr = lr_min + (lr_max − lr_min) · T_cur/T_warm for the first T_warm steps, and lr = lr_min + ½(lr_max − lr_min)(1 + cos(T_cur/T_freq · π)) otherwise, where T_cur is the number of steps since the last restart. We use T_warm = T_freq = 4000 and lr_min = 10⁻⁷ for all datasets, and set lr_max = 3×10⁻³ for METR-LA and PEMS-BAY and lr_max = 3×10⁻⁴ for EXPY-TKY. (A scheduler sketch follows the table.) |
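The dataset-split protocol quoted above is a plain chronological split. Below is a minimal sketch of the 70/10/20 split used for METR-LA and PEMS-BAY; the helper name `chronological_split` and the dummy data are illustrative assumptions, not code from the official repository.

```python
import numpy as np

def chronological_split(series, train_frac=0.7, val_frac=0.1):
    """Split a time-ordered array into train/val/test without shuffling."""
    n = len(series)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])

# Dummy speed matrix shaped (time steps, sensors); 207 sensors as in METR-LA.
data = np.random.rand(10_000, 207)
train, val, test = chronological_split(data)
print(train.shape, val.shape, test.shape)  # ~70% / 10% / 20% of the time axis
```

For EXPY-TKY the split is by calendar month (first two months for training/validation, last month for testing) rather than by fixed fractions.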
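For reference, here is a minimal sketch of the learning-rate schedule in Eq. (7): linear warmup for the first T_warm steps, then cosine annealing. The function name and the step loop are illustrative (plain Python, not the paper's or the repository's implementation); only the constants are taken from the quoted setup.

```python
import math

def testam_lr(t_cur, lr_min, lr_max, t_warm=4000, t_freq=4000):
    """Eq. (7): linear warmup for the first t_warm steps, cosine annealing after.
    t_cur is the number of steps since the last restart."""
    if t_cur < t_warm:
        # Linear warmup from lr_min to lr_max.
        return lr_min + (lr_max - lr_min) * t_cur / t_warm
    # Cosine annealing with period controlled by t_freq.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(t_cur / t_freq * math.pi))

# Reported settings: lr_min = 1e-7 for all datasets; lr_max = 3e-3 for
# METR-LA / PEMS-BAY and 3e-4 for EXPY-TKY; T_warm = T_freq = 4000.
for step in (0, 2000, 4000, 6000, 8000):
    print(step, testam_lr(step, lr_min=1e-7, lr_max=3e-3))
```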