Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting

Authors: Hui Chen, Viet Luong, Lopamudra Mukherjee, Vikas Singh

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we cover our experimental findings in detail. We divide our experimental protocol into two phases: evaluating the quality of forecasting both for long-term and short-term and an ablation study to evaluate the efficacy of our proposed model, SimpleTM.
Researcher Affiliation	Academia	Hui Chen¹, Viet Luong¹, Lopamudra Mukherjee², Vikas Singh¹; ¹University of Wisconsin-Madison, ²University of Wisconsin-Whitewater
Pseudocode	No	The paper describes the methodology in narrative text and does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at GitHub: https://github.com/vsingh-group/SimpleTM.
Open Datasets	Yes	We evaluate our model on 8 widely recognized benchmarks: the ETT datasets (ETTh1, ETTh2, ETTm1, and ETTm2)... as well as the Weather, Solar-Energy, Electricity, and Traffic datasets... We adopt the PEMS dataset Chen et al. (2001) with four public traffic subsets (PEMS03, PEMS04, PEMS07, and PEMS08).
Dataset Splits	Yes	We mainly follow the experimental configurations in Wu et al. (2023), including the same data processing and splitting protocol... Details of the dataset are provided in Table 4. Table 4: Dataset statistics. The dimension indicates the number of channels/variates, and the dataset size is organized in (training, validation, testing).
Hardware Specification	Yes	All experiments were conducted using PyTorch Paszke et al. (2019) on a single NVIDIA A100 40GB GPU.
Software Dependencies	No	All experiments were conducted using PyTorch Paszke et al. (2019)... The paper mentions PyTorch but does not specify a version number for the software used in the experiments.
Experiment Setup	Yes	Table 5 summarizes the hyperparameters and training settings used in our experiments. Our hyperparameter selection followed a systematic approach, combining grid search with domain-specific considerations. The number of layers was fixed at 1, and the input length L was set to 96 for all datasets and baselines... For training parameters, we performed a grid search over learning rates within a logarithmic scale from 10⁻³ to 2 × 10⁻². Batch sizes and training epochs were systematically evaluated within the ranges {16, 24, 256} and {10, 20}, respectively.
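The search space quoted in the Experiment Setup row can be enumerated as below. This is a minimal sketch, not the authors' tuning code: the number of learning-rate points on the log scale is not reported, so `NUM_LR_POINTS = 5` is a hypothetical choice, and the dictionary keys (`learning_rate`, `batch_size`, `epochs`, `num_layers`, `input_len`) are illustrative names rather than flags from the SimpleTM repository. The training and validation loop used to pick the best configuration is deliberately left out.

```python
import itertools
import math

# Ranges stated in the Experiment Setup row.
LR_MIN, LR_MAX = 1e-3, 2e-2
BATCH_SIZES = [16, 24, 256]
EPOCHS = [10, 20]
# Settings reported as fixed for all datasets (hypothetical key names).
FIXED = {"num_layers": 1, "input_len": 96}

# Assumption: the number of grid points on the log scale is not reported;
# 5 is a hypothetical value chosen for illustration.
NUM_LR_POINTS = 5


def log_spaced(lo: float, hi: float, n: int) -> list[float]:
    """Return n values evenly spaced in log space from lo to hi, inclusive."""
    if n == 1:
        return [lo]
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(n)]


def build_grid(num_lr_points: int = NUM_LR_POINTS) -> list[dict]:
    """Enumerate every (learning rate, batch size, epochs) combination."""
    lrs = log_spaced(LR_MIN, LR_MAX, num_lr_points)
    grid = []
    for lr, bs, ep in itertools.product(lrs, BATCH_SIZES, EPOCHS):
        config = dict(FIXED)  # copy the fixed settings into each config
        config.update(learning_rate=lr, batch_size=bs, epochs=ep)
        grid.append(config)
    return grid


if __name__ == "__main__":
    grid = build_grid()
    print(len(grid), "configurations")
    print(grid[0])
```

With five learning-rate points the grid has 5 × 3 × 2 configurations. Spacing the rates geometrically rather than linearly keeps the search from concentrating near the upper end of a range that spans more than an order of magnitude.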