Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Time Series Generation Under Data Scarcity: A Unified Generative Modeling Approach

Authors: Tal Gonen, Itai Pemper, Ilan Naiman, Nimrod Berman, Omri Azencot

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, we conduct the first large-scale study evaluating leading generative models in data-scarce settings, revealing a substantial performance gap between full-data and data-scarce regimes. Our third main contribution is the empirical demonstration of significant improvements over existing methods. We believe that our model, extensive empirical evaluation, and the proposed benchmark, can serve as a foundation for further research and drive progress in time series generation under data-constrained environments.
Researcher Affiliation	Academia	Tal Gonen, Itai Pemper, Ilan Naiman, Nimrod Berman, Omri Azencot. Faculty of Computer and Information Science, Ben-Gurion University of the Negev.
Pseudocode	Yes	We detail the procedure for each stage of our two-step framework and provide the complete pseudocode for both the pre-training and few-shot phases in Algorithm 1.
Open Source Code	Yes	Code is available at https://github.com/azencot-group/ImagenFew.
Open Datasets	Yes	To evaluate their performance, we collect a suite of 12 real-world and synthetic datasets covering a wide spectrum of domains, temporal dynamics, and channel dimensionalities: MuJoCo [66], ETTm1, ETTm2, ETTh2 [75], Sine, Weather [69], ILI [12], Saugeen River Flow [47], ECG200 [53], SelfRegulationSCP1 [2], Air Quality [72], and StarLightCurves [56]. We provide detailed statistics and preprocessing steps in App. A.2.
Dataset Splits	Yes	To simulate data scarcity, we subsample each dataset using two complementary strategies: percentage-based sampling, retaining 5%, 10%, or 15% of the sequences; and fixed-count sampling, limiting the training set to {#10, #25, #50} sequences.
Hardware Specification	Yes	During pre-training, we utilize the full corpus over 1,000 epochs with a learning rate of 10^-4, conducted in a distributed setup across two NVIDIA RTX 4090 GPUs, requiring roughly 4 hours of training time.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies. It mentions using 'AdamW' as an optimizer and builds on the 'EDM [36]' and 'ImagenTime [50]' frameworks but lacks version details for Python, PyTorch, or other libraries.
Experiment Setup	Yes	All hyperparameters specific to the pre-training phase are listed in Table 8, while the core architectural parameters, shared between both pre-training and fine-tuning, are summarized in Table 9. All hyperparameters specific to the fine-tuning phase are listed in Table 10.
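
The two subsampling strategies quoted in the Dataset Splits row (percentage-based and fixed-count) can be illustrated with a short sketch. This is not the authors' code. The function names, the `seed` parameter, the use of uniform random sampling without replacement, and the rounding rule are all assumptions made for illustration; the paper's repository may select sequences differently.

```python
import random


def subsample_percentage(sequences, fraction, seed=0):
    """Keep a random fraction (e.g. 0.05 for 5%) of the sequences.

    Hypothetical helper: always keeps at least one sequence.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(sequences) * fraction))
    rng = random.Random(seed)  # local generator so results are repeatable
    return rng.sample(sequences, k)


def subsample_fixed_count(sequences, count, seed=0):
    """Keep `count` random sequences, or all of them if fewer exist.

    Hypothetical helper mirroring the {#10, #25, #50} settings.
    """
    k = min(count, len(sequences))
    rng = random.Random(seed)
    return rng.sample(sequences, k)


if __name__ == "__main__":
    # Toy stand-in for a dataset: 1,000 sequences of length 24.
    data = [[float(i)] * 24 for i in range(1000)]
    for fraction in (0.05, 0.10, 0.15):
        print(fraction, len(subsample_percentage(data, fraction)))
    for count in (10, 25, 50):
        print(count, len(subsample_fixed_count(data, count)))
```

Fixing the seed makes each scarce training subset repeatable across runs, which matters when comparing methods on the same few sequences.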