Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting

Authors: Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that WAVE attention that incorporates the ARMA structure consistently improves the performance of various AR attentions on TSF tasks, achieving state-of-the-art results.
Researcher Affiliation | Collaboration | 1Georgia Institute of Technology 2AWS. Correspondence to: Jiecheng Lu <EMAIL>, Shihao Yang <EMAIL>.
Pseudocode | No | The paper describes methods in prose and mathematical notation but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code implementation is available at the following link.
Open Datasets | Yes | Our main MTSF experiments are conducted on 12 widely-used real-world time series datasets. These datasets are summarized as follows: Weather dataset (Wu et al., 2021) ... Solar dataset (Lai et al., 2018) ... Electricity dataset (Wu et al., 2021) ... ETT dataset (Zhou et al., 2021) ... Traffic dataset (Wu et al., 2021) ... PEMS dataset (Li et al., 2017).
Dataset Splits | Yes | We use the same train-validation-test splitting ratio as in previous studies by Zeng et al. (2023); Nie et al. (2022); Liu et al. (2024b). We also follow the same dataset standardization methods used in these studies.
Hardware Specification | Yes | All training tasks in this paper can be conducted on a single Nvidia RTX 4090 GPU.
Software Dependencies | No | The paper mentions software components such as the 'AdamW optimizer', 'LayerNorm', and 'RMSNorm' but does not provide specific version numbers for these or any other key software dependencies.
Experiment Setup | Yes | For the hyper-parameter settings of the pure AR/WAVE Transformer, we use m = 3 Transformer layers, 8 heads, and set the hidden dimension d based on the number of series C, using the empirical formula d = 16 * C. We use 4d as the hidden dimension for the feedforward MLP in the Transformer layer. A dropout rate of 0.1 is applied to both the AR term and the MA term. We initialize the weights of all linear layers and embedding layers using the GPT-2 weight initialization method: a normal distribution with a standard deviation of 0.02. For the output projection layers in the attention and MLP, we additionally scale the standard deviation by a factor of 1/√m. The batch size is set to 32. During training, pure AR/WAVE Transformers are trained with the next-step prediction objective and MSE loss. We use the AdamW optimizer with betas = (0.9, 0.95) and weight decay = 0.1. We evaluate the validation and test losses at the end of each epoch, with early-stopping patience set to 12 epochs and a maximum of 100 training epochs. We apply a linear warm-up for the learning rate, increasing it from 0.00006 to 0.0006 over the first 5 epochs, and gradually decrease it in the subsequent epochs.
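The quoted setup bundles several numeric conventions (the empirical d = 16 * C rule, the GPT-2-style init with 1/√m scaling for output projections, and the warm-up learning-rate schedule). A minimal Python sketch of those formulas is shown below; it is an illustration, not the authors' code, and the shape of the post-warm-up decay (linear back toward the starting rate) is an assumption, since the text only says the rate is "gradually decreasing".

```python
import math

def hidden_dim(num_series: int) -> int:
    """Empirical hidden-dimension rule d = 16 * C from the setup above."""
    return 16 * num_series

def init_std(is_output_proj: bool, m: int = 3, base_std: float = 0.02) -> float:
    """GPT-2-style init: N(0, 0.02) for linear/embedding weights;
    output projections in attention/MLP are scaled by 1/sqrt(m)."""
    return base_std / math.sqrt(m) if is_output_proj else base_std

def learning_rate(epoch: int, warmup: int = 5, max_epochs: int = 100,
                  lr_start: float = 0.00006, lr_peak: float = 0.0006) -> float:
    """Linear warm-up from lr_start to lr_peak over the first `warmup` epochs
    (epoch is 0-indexed), then a gradual decrease; the linear decay used here
    is an assumed placeholder for the unspecified schedule."""
    if epoch < warmup:
        return lr_start + (lr_peak - lr_start) * epoch / (warmup - 1)
    frac = (epoch - warmup + 1) / (max_epochs - warmup)
    return lr_peak - (lr_peak - lr_start) * min(frac, 1.0)
```

For example, a dataset with C = 21 series (as in Weather) would give d = 336 under this rule, with the feedforward MLP sized at 4d = 1344.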