Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE

Authors: Jiawen Wei, Jiang Lan, Pengbo Wei, Ziwen Ye, Teng Song, Chen Chen, Guangrui Ma

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on eight real-world datasets demonstrate the superiority of TARFVAE over the existing state-of-the-art deterministic and generative baselines. Our code is available at https://github.com/Gavine77/TARFVAE.
Researcher Affiliation	Industry	Jiawen Wei Meituan Beijing, China EMAIL Jiang Meituan Beijing, China EMAIL Wei Meituan Beijing, China EMAIL Ye Meituan Beijing, China EMAIL Song Meituan Beijing, China EMAIL Chen Meituan Beijing, China EMAIL Ma Meituan Beijing, China EMAIL
Pseudocode	No	The paper describes the methodology using mathematical formulations (e.g., Section 3.1 Variational Auto Encoder, Section 3.2 Normalizing Flow) and architectural diagrams (Figure 1, Figure 2), but does not contain a distinct pseudocode block or algorithm section.
Open Source Code	Yes	Our code is available at https://github.com/Gavine77/TARFVAE.
Open Datasets	Yes	To comprehensively evaluate the performance of our proposed TARFVAE, we conduct extensive experiments on 8 widely-used real-world datasets: four ETT subsets (ETTh1, ETTh2, ETTm1, ETTm2), Electricity, Exchange, Weather[6, 7], and Solar-Energy[41].
Dataset Splits	No	When comparing with deterministic models, the long-term forecasting benchmarks follow the common setting[5 7, 17], with the lookback window length L set to 96 and the prediction horizon H to {96, 192, 336, 720} for all datasets. For comparison with mr-Diff and its benchmarked baselines, we adopt the same configurations: H is 168 for Electricity and ETTh1, 192 for ETTm1, and 672 for Weather, while L is chosen from {96, 192, 336, 720, 1440}. Mean Squared Error (MSE) and Mean Absolute Error (MAE) are adopted as evaluation metrics. Since we can sample different sizes of results once our model is trained, we calculate MSE and MAE for the median of sampled results. We also compute the Continuous Ranked Probability Score (CRPS)[44] based on sampled results as a probabilistic forecasting metric. All our experiments are implemented using PyTorch[45] on a single Nvidia-H20 GPU with 141 GB memory, except for the inference efficiency comparison experiment which is conducted on an Nvidia-A6000 GPU with 48 GB memory to align with the experimental settings of the compared generative baselines. Our training process is guided by the loss function (28) and employs the ADAM optimizer, and the best model is selected based on the MSE of the median of 50 generated samples on the validation set.
Hardware Specification	Yes	All our experiments are implemented using PyTorch[45] on a single Nvidia-H20 GPU with 141 GB memory, except for the inference efficiency comparison experiment which is conducted on an Nvidia-A6000 GPU with 48 GB memory to align with the experimental settings of the compared generative baselines.
Software Dependencies	No	All our experiments are implemented using PyTorch[45] on a single Nvidia-H20 GPU with 141 GB memory. While PyTorch is mentioned, a specific version number is not provided, nor are other software dependencies with their versions.
Experiment Setup	No	Our training process is guided by the loss function (28) and employs the ADAM optimizer, and the best model is selected based on the MSE of the median of 50 generated samples on the validation set. While the loss function and optimizer are mentioned, specific hyperparameters like learning rate, batch size, or number of epochs are not provided in the main text.
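
Illustrative sketch (not from the paper or its repository): the Dataset Splits row quotes an evaluation protocol in which MSE and MAE are computed on the median of sampled forecasts and CRPS is computed from the samples themselves. The Python below shows one plausible way to do this using only the standard library. The function names, the toy data, and the choice of the energy-form CRPS estimator, E|X - y| - 0.5 E|X - X'|, are assumptions made for illustration; the quoted text does not say which CRPS estimator the authors use.

```python
from statistics import median


def median_forecast(samples):
    """Pointwise median across sampled trajectories.

    samples: list of trajectories, each a list of floats of equal length.
    (Hypothetical helper; the name is not taken from the paper's code.)
    """
    return [median(step_values) for step_values in zip(*samples)]


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def empirical_crps(samples, target):
    """Sample-based CRPS, averaged over time steps.

    Assumption: uses the energy form E|X - y| - 0.5 * E|X - X'| per step,
    with both expectations replaced by averages over the samples.
    """
    n = len(samples)
    total = 0.0
    for step_values, y in zip(zip(*samples), target):
        term_obs = sum(abs(x - y) for x in step_values) / n
        term_spread = sum(
            abs(a - b) for a in step_values for b in step_values
        ) / (2 * n * n)
        total += term_obs - term_spread
    return total / len(target)


# Toy data: 3 sampled trajectories over a 2-step horizon (made up).
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
target = [2.0, 5.0]

point = median_forecast(samples)
print("median forecast:", point)
print("MSE:", mse(point, target))
print("MAE:", mae(point, target))
print("CRPS:", empirical_crps(samples, target))
```

On this toy input the median forecast is [2.0, 4.0], MSE and MAE are both 0.5, and the CRPS is approximately 0.5 (up to floating-point rounding). Under the same assumptions, the model-selection rule quoted in the Experiment Setup row would correspond to calling mse(median_forecast(samples), target) with 50 samples per validation window and keeping the checkpoint with the lowest value.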