TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

Authors: Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, Jun Zhou

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to evaluate the performance and efficiency of TimeMixer, covering long-term and short-term forecasting, including 18 real-world benchmarks and 15 baselines.
Researcher Affiliation | Collaboration | 1) Ant Group, Hangzhou, China; 2) Tsinghua University, Beijing, China. {weiming.wsy,lintao.mlt,peter.sxm,james.z,jun.zhoujun}@antgroup.com, {wuhx23,htg21,luohk19}@mails.tsinghua.edu.cn
Pseudocode | No | The paper describes the architecture and operations of TimeMixer's components (PDM, FMM) using mathematical formulations (e.g., Equations 1-3) and descriptive text, but it does not include any explicitly labeled pseudocode blocks or algorithms. (A hedged PyTorch-style sketch of the multiscale mixing idea is provided after the table.)
Open Source Code | Yes | The source code is provided in the supplementary materials and is publicly available on GitHub (https://github.com/kwuking/TimeMixer) for reproducibility.
Open Datasets | Yes | For long-term forecasting, we experiment on 8 well-established benchmarks: the ETT datasets (including 4 subsets: ETTh1, ETTh2, ETTm1, ETTm2), Weather, Solar-Energy, Electricity, and Traffic, following (Zhou et al., 2021; Wu et al., 2021; Liu et al., 2022a). For short-term forecasting, we adopt PeMS (Chen et al., 2001), which contains four public traffic network datasets (PEMS03, PEMS04, PEMS07, PEMS08), and the M4 dataset, which involves 100,000 different time series collected at different frequencies.
Dataset Splits | Yes | Table 6: Dataset detailed descriptions. The dataset size is organized as (Train, Validation, Test). (See the chronological-split sketch after the table.)
Hardware Specification | Yes | All the experiments are implemented in PyTorch (Paszke et al., 2019) and conducted on a single NVIDIA A100 80GB GPU.
Software Dependencies | No | The paper states that experiments are "implemented in PyTorch (Paszke et al., 2019)" and that the "ADAM optimizer (Kingma & Ba, 2015)" was used. However, it does not provide specific version numbers for PyTorch or any other relevant libraries or packages.
Experiment Setup | Yes | We set the initial learning rate to 10⁻² or 10⁻³ and used the ADAM optimizer (Kingma & Ba, 2015) with L2 loss for model optimization. The batch size was set between 8 and 128. By default, TimeMixer contains 2 Past Decomposable Mixing (PDM) blocks. We choose the number of scales M according to the length of the time series to balance performance and efficiency: to handle longer series in long-term forecasting, we set M to 3; for short-term forecasting with limited series length, we set M to 1. Detailed model configuration information is presented in Table 7. (See the training-setup sketch after the table.)
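
Since the paper provides no pseudocode, the following is a minimal PyTorch-style sketch of the core idea behind TimeMixer's Past Decomposable Mixing: inputs are average-pooled into multiple scales, each scale is split into seasonal and trend parts, seasons are mixed bottom-up (fine to coarse) and trends top-down (coarse to fine). The module names, moving-average kernel size, and linear mixing layers are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MovingAvgDecomp(nn.Module):
    """Season-trend decomposition via a moving average (kernel size is an assumption)."""
    def __init__(self, kernel_size: int = 25):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=1)

    def forward(self, x):
        # x: [batch, length, channels]; pad both ends so the trend keeps the input length
        front = x[:, :1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
        back = x[:, -1:, :].repeat(1, self.kernel_size // 2, 1)
        trend = self.avg(torch.cat([front, x, back], dim=1).permute(0, 2, 1)).permute(0, 2, 1)
        return x - trend, trend  # (seasonal, trend)

def multiscale_series(x, num_scales=3, window=2):
    """Build the multiscale inputs {x_0, ..., x_M} by average-pooling along time."""
    pool = nn.AvgPool1d(kernel_size=window, stride=window)
    scales = [x]
    for _ in range(num_scales):
        scales.append(pool(scales[-1].permute(0, 2, 1)).permute(0, 2, 1))
    return scales

class PastDecomposableMixing(nn.Module):
    """Sketch of PDM: seasonal parts mix fine-to-coarse, trend parts mix coarse-to-fine."""
    def __init__(self, lengths):
        super().__init__()
        self.decomp = MovingAvgDecomp()
        # linear maps over the time dimension between adjacent scales (illustrative choice)
        self.season_down = nn.ModuleList(
            [nn.Linear(lengths[i], lengths[i + 1]) for i in range(len(lengths) - 1)])
        self.trend_up = nn.ModuleList(
            [nn.Linear(lengths[i + 1], lengths[i]) for i in range(len(lengths) - 1)])

    def forward(self, xs):
        seasons, trends = map(list, zip(*[self.decomp(x) for x in xs]))
        for i in range(len(xs) - 1):            # bottom-up seasonal mixing
            seasons[i + 1] = seasons[i + 1] + self.season_down[i](
                seasons[i].permute(0, 2, 1)).permute(0, 2, 1)
        for i in reversed(range(len(xs) - 1)):  # top-down trend mixing
            trends[i] = trends[i] + self.trend_up[i](
                trends[i + 1].permute(0, 2, 1)).permute(0, 2, 1)
        return [s + t for s, t in zip(seasons, trends)]

# Example: look-back length 96, M = 3 scales with downsampling window 2
xs = multiscale_series(torch.randn(8, 96, 7))
mixed = PastDecomposableMixing(lengths=[96, 48, 24, 12])(xs)
```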
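
For the dataset splits, the authoritative (Train, Validation, Test) sizes are those listed in Table 6 of the paper. The sketch below only illustrates the usual chronological (non-shuffled) splitting used in this benchmark line of work; the 7:1:2 ratio is an assumption for illustration.

```python
import numpy as np

def chronological_split(series: np.ndarray, train_ratio=0.7, val_ratio=0.1):
    """Split a multivariate series [length, channels] into contiguous
    train/validation/test segments in time order (no shuffling).
    The 7:1:2 ratio is illustrative; use the per-dataset sizes from Table 6."""
    n = len(series)
    n_train = int(n * train_ratio)
    n_val = int(n * val_ratio)
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test
```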
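
The experiment-setup row translates directly into a standard PyTorch training configuration. The sketch below mirrors only the stated hyperparameters (Adam optimizer, L2/MSE loss, initial learning rate of 10⁻² or 10⁻³, batch size between 8 and 128); `model`, `train_dataset`, and the loop structure are placeholders rather than the authors' training script.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_dataset, lr=1e-3, batch_size=32, epochs=10, device="cuda"):
    """Minimal training loop matching the reported setup: Adam + MSE (L2) loss.
    The paper uses lr of 1e-2 or 1e-3 and batch sizes from 8 to 128."""
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.MSELoss()  # "L2 loss" in the paper
    for _ in range(epochs):
        for batch_x, batch_y in loader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(batch_x), batch_y)
            loss.backward()
            optimizer.step()
    return model
```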