Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Unified Transferability Metrics for Time Series Foundation Models

Authors: Weiyang Zhang, Xinyang Chen, Xiucheng Li, Kehai Chen, Weili Guan, Liqiang Nie

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through comprehensive benchmarking across 5 distinct downstream tasks, our method demonstrates superior capability in identifying optimal pre-trained models from heterogeneous model pools for transfer learning. Compared to the state-of-the-art method ETran, our approach improves the weighted Kendall's τ_w across 5 downstream tasks by 35%.
Researcher Affiliation	Academia	(1) School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen); (2) School of Information Science and Technology, Harbin Institute of Technology (Shenzhen)
Pseudocode	Yes	Algorithm 1 Power Iteration. Input: matrix H, initial vector v_0, number of iterations k (here set to 10). Output: dominant eigenvalue λ_max, corresponding eigenvector v_k. Initialize v_0 randomly; for i = 1 to k do: v_{i+1} = H v_i; normalize v_{i+1} to unit length; end for. Estimate the dominant eigenvalue λ_max = (v_k^T H v_k) / (v_k^T v_k).
Open Source Code	Yes	The code is available at https://github.com/TEMPLATE.
Open Datasets	Yes	Specifically, we verify all methods on 9 multivariate datasets from the UEA classification archive [29]... For long-term forecasting, we use seven widely recognized long-term time series forecasting datasets [31]... For short-term forecasting, we adopt the M4 dataset [49]... We compare 5 widely used anomaly detection benchmarks: SMD [50], MSL [51], SMAP [51], SWaT [52], and PSM [53]
Dataset Splits	Yes	We provide detailed descriptions of the datasets in Table 8. For all 5 downstream tasks, we follow the experimental setup of [34]. ... The dataset size is organized in (Train, Validation, Test). ETTm1, ETTm2: 7, {96, 192, 336, 720}, (34465, 11521, 11521), Electricity (15 mins)
Hardware Specification	Yes	The fine-tuning experiments of the pre-trained models were conducted on an NVIDIA H20 GPU with 96GB of memory. ... All the results of pre-trained model transferability evaluation metrics were obtained on an AMD EPYC 7513 32-Core CPU.
Software Dependencies	No	The paper does not explicitly state specific version numbers for software dependencies or libraries used in their experiments. It mentions using 'fine-tuned the pre-trained models through hyperparameter grid search' and 'follow the experimental setup of [34]' but does not detail the software stack with versions.
Experiment Setup	Yes	To compute the transfer performance values, we carefully fine-tuned the pre-trained models through hyperparameter grid search. As [54] highlighted, learning rate and weight decay are the two most critical parameters. Therefore, we performed grid search over learning rates and weight decay values (6 learning rates ranging from 10^-3 to 10^-5, and 3 weight decay values from 10^-3 to 10^-5) to select the optimal hyperparameters.
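
The power iteration procedure quoted in the Pseudocode row can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `power_iteration`, the helper names, the list-of-rows matrix format, and the `seed` parameter are all introduced here for the example. Only the default of 10 iterations and the Rayleigh-quotient estimate of the eigenvalue come from the quoted text.

```python
import math
import random


def _matvec(H, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in H]


def _dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def power_iteration(H, k=10, seed=0):
    """Estimate the dominant eigenvalue and eigenvector of H.

    H    : square matrix as a list of rows (format assumed for this sketch)
    k    : number of iterations (10 in the quoted pseudocode)
    seed : seed for the random starting vector (an assumption, added so
           that repeated runs give the same result)
    """
    rng = random.Random(seed)
    # Initialize the starting vector randomly.
    v = _normalize([rng.uniform(-1.0, 1.0) for _ in range(len(H))])
    for _ in range(k):
        # Multiply by H, then rescale to unit length.
        v = _normalize(_matvec(H, v))
    # Rayleigh quotient: (v^T H v) / (v^T v).
    lam = _dot(v, _matvec(H, v)) / _dot(v, v)
    return lam, v


if __name__ == "__main__":
    # Symmetric 2x2 example whose eigenvalues are 3 and 1.
    lam, v = power_iteration([[2.0, 1.0], [1.0, 2.0]], k=50)
    print(lam, v)
```

How quickly the estimate converges depends on the gap between the two largest eigenvalue magnitudes, so a fixed small iteration count such as 10 is a trade-off between cost and accuracy rather than a guarantee of convergence.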