Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting

Authors: Haotian Gao, Renhe Jiang, Zheng Dong, Jinliang Deng, Yuxin Ma, Xuan Song

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A series of quantitative and qualitative evaluations on four widely used benchmarks (PEMS03, PEMS04, PEMS07, and PEMS08) are conducted to validate the state-of-the-art performance of STD-MAE.
Researcher Affiliation | Academia | Haotian Gao (1,2), Renhe Jiang (1), Zheng Dong (2), Jinliang Deng (3), Yuxin Ma (2), Xuan Song (2); 1: The University of Tokyo, 2: Southern University of Science and Technology, 3: University of Technology Sydney.
Pseudocode | No | The paper does not include a dedicated section or figure explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Codes are available at https://github.com/Jimmy-7664/STD-MAE.
Open Datasets | Yes | To thoroughly evaluate the proposed STD-MAE model, we conduct extensive experiments on four real-world spatiotemporal benchmark datasets as listed in Table 1: PEMS03, PEMS04, PEMS07, and PEMS08 [Song et al., 2020].
Dataset Splits | Yes | Following previous work [Song et al., 2020; Li and Zhu, 2021; Fang et al., 2021; Jiang et al., 2023a; Guo et al., 2021b], we divide the four datasets into training, validation, and test sets according to a 6:2:2 ratio. (A split sketch follows the table.)
Hardware Specification | Yes | Experiments are mainly conducted on a Linux server with four NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions that experiments are performed on the BasicTS [Shao et al., 2023] platform but does not specify version numbers for general software dependencies such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | The embedding dimension D is 96. The encoder has 4 transformer layers while the decoder has 1 transformer layer. The number of attention heads in each transformer layer is set to 4. We use a patch size L of 12 to align with the forecasting input. T is equal to 1, which means we truncate and keep the last patch of H(S) and H(T). The masking ratio r is set to 0.25. Optimization is performed with the Adam optimizer using an initial learning rate of 0.001 and mean absolute error (MAE) loss. (A configuration sketch follows the table.)
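The Dataset Splits row reports a 6:2:2 ratio, which for the PEMS benchmarks is conventionally a chronological split along the time axis. The sketch below is a minimal illustration under that assumption; the function name and array shape are hypothetical, and the released BasicTS-based pipeline may slice the data differently.

```python
import numpy as np

def split_6_2_2(data: np.ndarray):
    """Chronological 6:2:2 train/val/test split along the time axis.

    Assumes `data` is shaped (num_timesteps, num_sensors, num_features),
    as in the PEMS benchmarks; this is an illustrative sketch, not the
    authors' pipeline.
    """
    num_steps = data.shape[0]
    train_end = int(num_steps * 0.6)   # first 60% for training
    val_end = int(num_steps * 0.8)     # next 20% for validation
    return data[:train_end], data[train_end:val_end], data[val_end:]

# Example usage (e.g. PEMS08 has 17,856 five-minute timesteps over 170 sensors):
# train, val, test = split_6_2_2(pems08_array)
```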
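The Experiment Setup row fixes the main hyperparameters. The configuration sketch below simply collects them in one place; the key names are illustrative and need not match those used in the released STD-MAE code.

```python
# Hyperparameters from the Experiment Setup row; key names are illustrative.
std_mae_config = {
    "embed_dim": 96,          # embedding dimension D
    "encoder_layers": 4,      # transformer layers in the pre-training encoder
    "decoder_layers": 1,      # transformer layers in the decoder
    "num_heads": 4,           # attention heads per transformer layer
    "patch_size": 12,         # patch length L, aligned with the forecasting input
    "keep_last_patches": 1,   # T = 1: keep only the last patch of H(S) and H(T)
    "mask_ratio": 0.25,       # fraction of patches masked during pre-training
    "optimizer": "Adam",
    "learning_rate": 1e-3,    # initial learning rate
    "loss": "MAE",            # mean absolute error
}
```

A masking ratio of 0.25 means that roughly one in every four patches is hidden and must be reconstructed during pre-training.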