LeRet: Language-Empowered Retentive Network for Time Series Forecasting
Authors: Qihe Huang, Zhengyang Zhou, Kuo Yang, Gengyu Lin, Zhongchao Yi, Yang Wang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations demonstrate the effectiveness of our LeRet, especially revealing its superiority on few-shot and zero-shot forecasting tasks. Code is available at https://github.com/hqh0728/LeRet. Supporting section headings: 4 Experiments; 4.1 Datasets and Experimental Setups; 4.2 Main Results; 4.3 Ablation Study. |
| Researcher Affiliation | Academia | (1) University of Science and Technology of China (USTC), Hefei, China; (2) Suzhou Institute for Advanced Research, USTC, Suzhou, China; (3) State Key Laboratory of Resources and Environmental Information System |
| Pseudocode | No | The paper describes the method using flow diagrams (Figure 2, Figure 3) and descriptive text, but no formal pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code is available at https://github.com/hqh0728/LeRet. |
| Open Datasets | Yes | We evaluate the performance of long-term forecasting on Weather, Traffic, Solar, Electricity and four ETT datasets (i.e., ETTh1, ETTh2, ETTm1, and ETTm2), which have been extensively adopted for benchmarking long-term forecasting models. For short-term forecasting, we adopt PeMS, which contains four public traffic network datasets (PEMS03, PEMS04, PEMS07, PEMS08). |
| Dataset Splits | No | The input time series length L is set to 336 for all baselines, and we use four different prediction horizons T ∈ {96, 192, 336, 720}. For short-term forecasting, we adopt PeMS, which contains four public traffic network datasets (PEMS03, PEMS04, PEMS07, PEMS08). All models follow the same experimental setup with input length L = 96 and prediction length T = 12. In few-shot learning, only 10% of the training data timesteps are utilized. The paper describes input and prediction lengths and partial training-data usage for few-shot learning, but does not explicitly state the standard train/validation/test splits (e.g., percentages or counts) for the main experiments. |
| Hardware Specification | No | No specific hardware details (e.g., GPU model, CPU model, memory) used for experiments are mentioned in the paper. |
| Software Dependencies | No | Since we choose LLaMA as the LLM, which is a decoder-only architecture, under this causal encoding each token can only perceive itself and the tokens before it. The paper mentions LLaMA as the LLM but does not specify a version number. No other software dependencies with version numbers are provided. |
| Experiment Setup | Yes | The input time series length L is set to 336 for all baselines, and we use four different prediction horizons T ∈ {96, 192, 336, 720}. For short-term forecasting, we adopt PeMS, which contains four public traffic network datasets (PEMS03, PEMS04, PEMS07, PEMS08). All models follow the same experimental setup with input length L = 96 and prediction length T = 12. We partition the input into non-overlapping patches of length P, resulting in a total of N = ⌊L/P⌋ + 1 input patches X_patch ∈ ℝ^(N×P). These patches are embedded as X_pe ∈ ℝ^(N×d_p) using a simple linear layer: X_pe = Linear(Reshape(X_input)). The model employs h = d_m/d retention heads in each layer, where d is the head dimension. Multi-scale retention (MSR) assigns a different γ to each head and adds a swish gate to increase non-linearity (see the code sketch after this table). |
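
The patch embedding, per-head decay rates γ, causal retention mask, and swish gate summarized in the Experiment Setup row can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' released implementation: the names `PatchEmbedding`, `multiscale_decay`, and `retention_head`, the last-value padding, and the decay schedule γ_h = 1 − 2^(−5−h) (borrowed from the RetNet convention) are assumptions not taken from the paper.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Split a length-L series into non-overlapping patches of length P and
    embed each patch with a single linear layer, as described in the paper."""

    def __init__(self, patch_len: int, d_model: int):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, L). Pad by repeating the last value (assumed strategy)
        # so the length is divisible by P, then unfold into (batch, N, P)
        # patches and project to (batch, N, d_model).
        B, L = x.shape
        pad = (-L) % self.patch_len
        if pad:
            x = torch.cat([x, x[:, -1:].expand(B, pad)], dim=1)
        patches = x.unfold(1, self.patch_len, self.patch_len)
        return self.proj(patches)


def multiscale_decay(num_heads: int) -> torch.Tensor:
    # One decay rate gamma per retention head; this schedule follows the
    # RetNet convention and is an assumption here, not quoted from the paper.
    return 1.0 - 2.0 ** (-5.0 - torch.arange(num_heads, dtype=torch.float32))


def retention_head(q, k, v, gamma):
    # Parallel-form retention for one head with a causal, exponentially
    # decaying mask: D[n, m] = gamma^(n - m) for n >= m, else 0.
    N = q.shape[1]
    idx = torch.arange(N, dtype=torch.float32)
    diff = idx[:, None] - idx[None, :]
    mask = torch.where(diff >= 0, gamma ** diff, torch.zeros_like(diff))
    return ((q @ k.transpose(-1, -2)) * mask) @ v
```

Because each γ lies in (0, 1), the mask is both causal and exponentially decaying, so distant patches contribute less; the swish (SiLU) gate mentioned in the paper would then be applied to the concatenated head outputs (e.g. via `torch.nn.functional.silu`) to add non-linearity.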