Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sequence Modeling with Spectral Mean Flows

Authors: Jinwoo Kim, Max Beier, Petar Bevanda, Nayun Kim, Seunghoon Hong

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We test spectral mean flows on a range of time-series datasets, demonstrating competitive results. 5 Experiments: We demonstrate spectral mean flows on two synthetic setups and generative modeling on a range of time-series datasets. The main results are in Table 1. Spectral mean flow achieves the best metric in 18 out of 24 cases, showing that it is competitive with the state-of-the-art Diffusion-TS [101].
Researcher Affiliation	Academia	Jinwoo Kim KAIST Max Beier TU Munich Petar Bevanda TU Munich Nayun Kim KAIST Seunghoon Hong KAIST
Pseudocode	No	The paper describes the neural parameterization and learning process, detailing the architecture of MLPs and feature extractors. However, it does not present these steps in a structured pseudocode block or a clearly labeled algorithm.
Open Source Code	Yes	Code is available at https://github.com/jw9730/spectral-mean-flow.
Open Datasets	Yes	We demonstrate spectral mean flows on two synthetic setups and generative modeling on a range of time-series datasets. Regular time series: For the first experiment, we follow [100, 101] and use four real-world datasets (Stocks, ETTh, Energy, fMRI) and two simulated datasets (Sines, MuJoCo) of length-24 time series. Long time series: For additional demonstrations of modeling longer time series than in Table 1, we use FRED-MD and NN5 Daily from the Monash repository [32].
Dataset Splits	Yes	For long time series... Following [71], we normalize each trajectory to adhere to a zero-centered normal distribution, and use 80% of the data for training and 20% for testing. Irregular time series: For further demonstrations of generality, we run experiments on irregularly sampled time series. We obtain 3 irregularly sampled datasets from Stocks by randomly dropping 30%, 50%, and 70% of the observations, following [45, 72].
Hardware Specification	Yes	Each experiment is done with a single NVIDIA RTX A6000 GPU with 48GB and Intel Xeon Gold 6330 CPU @ 2.00GHz.
Software Dependencies	No	The paper mentions software like PyTorch [78], opt_einsum [17], flow_matching [61], Adam optimizer [22], AdamW optimizer [63], and Muon optimizer [46]. However, it does not provide specific version numbers for any of these software components, which is required for a reproducible description.
Experiment Setup	Yes	For training, we use Adam optimizer [22] with hyperparameters (β1, β2) = (0.9, 0.999). The model is trained for 20k iterations with learning rate 1e-3 and batch size 10,000, with 1k steps of linear learning rate warmup, and gradient norm clipping at 1.0. Detailed hyperparameters are in Table 6. The ranges considered for MLP(·, ·, t): R^d × R^{d_h} → C^{L d_h} are batch size ∈ {128, 256}, layers L ∈ {10, 14, 16}, and hidden dimension d_h = d'_h ∈ {64, 96}. Other choices mostly follow [101]: For training, we use Adam optimizer [22] with (β1, β2) = (0.9, 0.96). The model is trained for the same iterations per dataset as in [101] with learning rate 8e-4 and gradient clipping at 1.0, with 500 steps of linear warmup and then decaying by 0.5 on plateau with patience 2,000. For this setup,... we train our models using AdamW optimizer [63] with (β1, β2) = (0.5, 0.9) and weight decay 1e-6 for 100k iterations... we use batch size 512, learning rate 1e-4, and gradient clipping at 1.0.
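
To make the first configuration quoted under Experiment Setup concrete (learning rate 1e-3, 1k steps of linear warmup, 20k iterations, gradient norm clipping at 1.0), here is a minimal pure-Python sketch. It is not taken from the authors' code. The function names `lr_at` and `clip_by_global_norm` are hypothetical, and the exact warmup shape is an assumption: the quoted text says only that warmup is linear, so the sketch ramps from BASE_LR / WARMUP_STEPS at the first step up to BASE_LR.

```python
import math

# Values quoted in the Experiment Setup row (first configuration).
BASE_LR = 1e-3
WARMUP_STEPS = 1_000
TOTAL_STEPS = 20_000
MAX_GRAD_NORM = 1.0


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed training step.

    Assumption: warmup rises linearly from BASE_LR / WARMUP_STEPS to
    BASE_LR, then stays constant. The source does not state the starting
    value or any decay after warmup for this configuration.
    """
    if not 0 <= step < TOTAL_STEPS:
        raise ValueError("step outside the training run")
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]
```

In a real training loop these two pieces would usually come from the deep learning framework's scheduler and clipping utilities; the sketch only spells out the arithmetic those settings imply.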