Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting
Authors: Haotian Gao, Renhe Jiang, Zheng Dong, Jinliang Deng, Yuxin Ma, Xuan Song
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A series of quantitative and qualitative evaluations on four widely used benchmarks (PEMS03, PEMS04, PEMS07, and PEMS08) is conducted to validate the state-of-the-art performance of STD-MAE. |
| Researcher Affiliation | Academia | Haotian Gao (1,2), Renhe Jiang (1), Zheng Dong (2), Jinliang Deng (3), Yuxin Ma (2), Xuan Song (2); 1: The University of Tokyo, 2: Southern University of Science and Technology, 3: University of Technology Sydney |
| Pseudocode | No | The paper does not include a dedicated section or figure explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Codes are available at https://github.com/Jimmy-7664/STD-MAE. |
| Open Datasets | Yes | To thoroughly evaluate the proposed STD-MAE model, we conduct extensive experiments on four real-world spatiotemporal benchmark datasets as listed in Table 1: PEMS03, PEMS04, PEMS07, and PEMS08 [Song et al., 2020]. |
| Dataset Splits | Yes | Following previous work [Song et al., 2020; Li and Zhu, 2021; Fang et al., 2021; Jiang et al., 2023a; Guo et al., 2021b], we divide the four datasets into training, validation, and test sets according to a 6:2:2 ratio. (A chronological split sketch is given after the table.) |
| Hardware Specification | Yes | Experiments are mainly conducted on a Linux server with four NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper states that experiments are performed on the BasicTS [Shao et al., 2023] platform but does not specify version numbers for general software dependencies such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | The embedding dimension D is 96. The encoder has 4 transformer layers while the decoder has 1 transformer layer. The number of attention heads in each transformer layer is set to 4. We use a patch size L of 12 to align with the forecasting input. T is equal to 1, which means we truncate and keep the last patch of H(S) and H(T). The masking ratio r is set to 0.25. Optimization is performed with the Adam optimizer using an initial learning rate of 0.001 and mean absolute error (MAE) loss. (A configuration sketch is given after the table.) |
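
The dataset-splits row describes a 6:2:2 partition of each PEMS benchmark. The sketch below is a minimal illustration of such a chronological split, assuming the data has been loaded as a `(time_steps, num_nodes, num_features)` NumPy array; the loading path and array layout are assumptions, not details taken from the paper.

```python
import numpy as np

def split_6_2_2(data: np.ndarray):
    """Chronological 6:2:2 split along the time axis (axis 0).

    `data` is assumed to be a (time_steps, num_nodes, num_features) tensor,
    e.g. a PEMS03/04/07/08 traffic-flow array.
    """
    num_steps = data.shape[0]
    train_end = int(num_steps * 0.6)   # first 60% for training
    val_end = int(num_steps * 0.8)     # next 20% for validation
    train = data[:train_end]
    val = data[train_end:val_end]
    test = data[val_end:]              # final 20% for testing
    return train, val, test
```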
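The experiment-setup row lists the quoted hyperparameters (D = 96, 4 encoder / 1 decoder transformer layers, 4 attention heads, patch size 12, masking ratio 0.25, Adam with an initial learning rate of 0.001, MAE loss). The sketch below wires these values into a minimal PyTorch stand-in; the module layout and the patch-masking snippet are assumptions for illustration, not the authors' STD-MAE implementation.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper; everything else here is a stand-in.
D, HEADS, PATCH_SIZE, MASK_RATIO = 96, 4, 12, 0.25

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=HEADS, batch_first=True),
    num_layers=4,   # 4 transformer layers in the encoder
)
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=HEADS, batch_first=True),
    num_layers=1,   # 1 transformer layer in the decoder
)

params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)   # initial learning rate 0.001
criterion = nn.L1Loss()                         # mean absolute error (MAE) loss

# Randomly mask 25% of the patches along one axis (spatial or temporal),
# mirroring the decoupled masking ratio r = 0.25 reported in the paper.
num_patches = 100                               # illustrative patch count
mask = torch.rand(num_patches) < MASK_RATIO     # True = masked patch
```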