Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-Scale Finetuning for Encoder-based Time Series Foundation Models
Authors: Zhongzheng Qiao, Chenghao Liu, Yiming Zhang, Ming Jin, Quang Pham, Qingsong Wen, Ponnuthurai Suganthan, Xudong Jiang, Savitha Ramasamy
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three different backbones (MOIRAI, MOMENT, and UNITS) demonstrate that TSFMs finetuned with MSFT not only outperform naive and typical parameter-efficient finetuning methods but also surpass state-of-the-art deep learning methods. |
| Researcher Affiliation | Collaboration | ¹Nanyang Technological University. ²Institute for Infocomm Research, A*STAR. ³CNRS@CREATE. ⁴Salesforce AI Research. ⁵Griffith University. ⁶Squirrel Ai Learning. ⁷Qatar University. |
| Pseudocode | Yes | For clarity, we provide PyTorch-like pseudocode of MSFT in Algorithm 1 and Algorithm 2, illustrating the overall training pipeline and the MSFT attention block described in Section 4. |
| Open Source Code | Yes | Code is available at https://github.com/zqiao11/MSFT. |
| Open Datasets | Yes | For long sequence forecasting (LSF), we conduct experiments on six well-established datasets, including the ETT datasets (ETTh1, ETTh2, ETTm1, ETTm2) [51], Weather [45], and Electricity [45]. We note that these datasets are not included in the pretraining datasets of the TSFMs we evaluated. The key properties of these LSF datasets are detailed in Table 4. Following Moirai [43], we use 5 out-of-distribution datasets for probabilistic forecasting: Electricity [38], Solar-Power [16], Jena Weather, Istanbul Traffic, and Turkey Power. Detailed descriptions of these datasets are provided in Table 5. |
| Dataset Splits | Yes | We create the training, validation, and test datasets by cropping time series windows with fixed sequence lengths. Given the context length C and prediction length H, samples are segmented using a sliding window of size C + H. The train-val-test split follows the default LSF setup. Data are normalized for LSF but not for probabilistic forecasting (PF). ... The test set comprises the final time steps, segmented into multiple non-overlapping evaluation windows. The length of the prediction window and the number of rolling evaluations are tailored for each dataset based on its frequency (see Table 5 for details). |
| Hardware Specification | Yes | Our experiments are conducted on a server equipped with an AMD EPYC 7763 CPU (64 cores, 128 threads) and four NVIDIA A40 GPUs, each with 40 GB of memory. ... The experiments in this section are exclusively conducted on another server equipped with 12 vCPUs of an Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10 GHz and a single RTX 3080 GPU with 20 GB of memory. |
| Software Dependencies | No | We use the AdamW optimizer with weight decay = 0.1, β1 = 0.9, and β2 = 0.98 for optimization. ... For Moirai and Moment, we directly adopt the PEFT library [22] for both LoRA and AdaLoRA. ... For Moirai, the evaluation is based on the GluonTS library [1]. |
| Experiment Setup | Yes | We use the AdamW optimizer with weight decay = 0.1, β1 = 0.9, and β2 = 0.98 for optimization. Specifically, unlike pretraining, which uses a learning rate of 1e-3, we find that finetuning requires a much smaller learning rate. Based on validation performance, we select a learning rate of either 5e-6 or 5e-7 for finetuning our models. The batch size is set to 512 by default for experiments using MOIRAI-Small, and reduced to 256 on MOIRAI-Base if GPU memory reaches its limit. We adopt a constant learning rate schedule, and early stopping is employed to monitor training. The context lengths are taken directly from the original Moirai models, where they were tuned over [1000, 2000, 3000, 4000, 5000]. The patch sizes are also taken from their provided values, which are selected based on data frequency. For Moment and UNITS, we directly follow their original finetuning configurations, with the learning rate selected from 5e-5, 5e-6, or 5e-7. |
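The Dataset Splits row describes cutting each series into samples with a sliding window of size C + H and evaluating on non-overlapping windows taken from the end of the series. A minimal pure-Python sketch of that segmentation follows. It is not taken from the MSFT repository; the function names, the `stride` parameter (defaulting to 1), and the choice to let each test context reach back into earlier ground truth are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# A sample is a (context, target) pair of plain float lists.
Window = Tuple[List[float], List[float]]


def sliding_windows(series: Sequence[float], context_len: int,
                    horizon: int, stride: int = 1) -> List[Window]:
    """Cut (context, target) pairs using a window of size context_len + horizon.

    `stride` is a hypothetical parameter; the source only says "sliding window".
    """
    size = context_len + horizon
    windows: List[Window] = []
    for start in range(0, len(series) - size + 1, stride):
        chunk = list(series[start:start + size])
        windows.append((chunk[:context_len], chunk[context_len:]))
    return windows


def rolling_test_windows(series: Sequence[float], context_len: int,
                         horizon: int, num_windows: int) -> List[Window]:
    """Split the final num_windows * horizon steps into non-overlapping targets.

    Each target's context is the context_len steps immediately before it, so
    later contexts may include ground truth from earlier test targets
    (an assumption, mirroring common rolling evaluation).
    """
    test_start = len(series) - num_windows * horizon
    if test_start < context_len:
        raise ValueError("series too short for the requested test windows")
    windows: List[Window] = []
    for i in range(num_windows):
        t = test_start + i * horizon
        windows.append((list(series[t - context_len:t]),
                        list(series[t:t + horizon])))
    return windows
```

For example, with a 10-step series, C = 3, and H = 2, `sliding_windows` yields 6 samples, while `rolling_test_windows(series, 3, 2, 2)` produces targets covering steps 6-7 and 8-9.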
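The Experiment Setup row specifies AdamW with weight decay 0.1, β1 = 0.9, β2 = 0.98, and a constant finetuning learning rate such as 5e-6. One AdamW update written out by hand makes those settings concrete. The sketch below is a plain-Python illustration of the standard decoupled-weight-decay update, not the authors' training code. The value `eps = 1e-8` is an assumption because the source does not give it. In PyTorch, the same hyperparameters would be passed as `torch.optim.AdamW(params, lr=5e-6, betas=(0.9, 0.98), weight_decay=0.1)`.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AdamWState:
    m: List[float]  # first-moment (mean) estimates, one per parameter
    v: List[float]  # second-moment (uncentered variance) estimates
    t: int = 0      # step counter used for bias correction


def adamw_step(params: List[float], grads: List[float], state: AdamWState,
               lr: float = 5e-6, betas: Tuple[float, float] = (0.9, 0.98),
               weight_decay: float = 0.1, eps: float = 1e-8) -> List[float]:
    """Return updated parameters after one AdamW step (eps is an assumed value)."""
    b1, b2 = betas
    state.t += 1
    updated: List[float] = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Decoupled weight decay: shrink the weight directly, not via the gradient.
        p = p * (1.0 - lr * weight_decay)
        # Exponential moving averages of the gradient and squared gradient.
        state.m[i] = b1 * state.m[i] + (1.0 - b1) * g
        state.v[i] = b2 * state.v[i] + (1.0 - b2) * g * g
        # Bias-corrected moment estimates.
        m_hat = state.m[i] / (1.0 - b1 ** state.t)
        v_hat = state.v[i] / (1.0 - b2 ** state.t)
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated
```

Because the schedule is constant, `lr` stays the same across calls; only the step counter `t` used for bias correction changes.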