Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Synthetic Series-Symbol Data Generation for Time Series Foundation Models

Authors: Wenxuan Wang, Kai Wu, Yujian Li, Dan Wang, Xiaoyu Zhang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We explore multiple representation measures and conduct experimental verification on a variety of downstream task datasets to answer the following key questions: RQ1: Can the unrestrictedly generated S2 dataset comprehensively cover diverse representation types of time series data? RQ2: Can SymTime pre-trained on the S2 dataset achieve competitive results across five major TSA tasks (forecasting, classification, imputation and anomaly detection)? RQ3: Can SymTime learn fundamental representations of time series data on the synthetic S2 dataset to alleviate the data scarcity in TSA? RQ4: Are the multiple pre-training objectives in SymTime effective, and can symbol expressions enhance TSA task performance? RQ5: How to demonstrate that SymTime learns semantic information of symbols? ... Results. Radviz visualization confirms that S2 closely matches the Monash dataset across key statistics (stationarity, predictability, frequency, complexity, seasonality, trend), validating its use for pretraining. ... Main Results. Figure 5 (left) compares SymTime with models of the same type, while Figure 6 presents SymTime's performance against additional models across different tasks. These results demonstrate that SymTime, pre-trained on the S2 dataset, successfully learns fundamental representations of time series data and achieves competitive results when fine-tuned on downstream tasks.
Researcher Affiliation	Academia	Wenxuan Wang School of Telecommunications Engineering Xidian University EMAIL Kai Wu School of Artificial Intelligence Xidian University EMAIL Yujian Betterest Li School of Artificial Intelligence Xidian University EMAIL Dan Wang School of Telecommunications Engineering Xidian University EMAIL Xiaoyu Zhang School of Cyber Engineering Xidian University EMAIL
Pseudocode	No	The paper describes methods and processes in narrative text and figures but does not present any explicitly labeled pseudocode or algorithm blocks with structured steps in a code-like format.
Open Source Code	Yes	The code is available at https://github.com/wwhenxuan/SymTime. The code for S2 data generation is available at https://github.com/wwhenxuan/S2Generator.
Open Datasets	Yes	We use Radviz [70] to visualize high-dimensional statistical features of 256-length time series segments from our synthetic S2 and the Monash datasets [71]. From Monash (covering weather, traffic, electricity, tourism, medicine, and energy) we sample 200K segments per domain. We evaluate SymTime on five TSA tasks: long-term forecasting, short-term forecasting, classification, imputation and anomaly detection, using the TimesNet benchmark [72]. We use mean squared error (MSE) and mean absolute error (MAE) as the metrics for long-term forecasting and imputation tasks; overall weighted average (OWA) for short-term forecasting, which is unique metrics for M4 benchmark [73]; accuracy for classification; precision, recall and F1 score for anomaly detection. Detailed descriptions of datasets ... We adopt 8 real-world benchmark datasets for long-term forecasting, including ETTm1, ETTm2, ETTh1, ETTh2 [80], Weather [101], ECL [102], Traffic [103] and Exchange [104]. ... We adopt M4 benchmark [73] for short-term forecasting ... we test SymTime's discriminative ability on 10 UEA multivariate time series classification datasets [105] ... We conduct experiments on 5 widely used anomaly detection datasets: SMD and SMAP [106], MSL [107], SWaT [108], PSM [109]
Dataset Splits	Yes	To evaluate the effectiveness of pre-training, we fine-tune the models on five major TSA tasks (with 0B denoting direct fine-tuning without pre-training). The experimental results are summarized in Tables 1, Table 2 and Figure 7. ... For long-term forecasting experiments, we employ input series of lookback lengths 96 and 512, with forecast horizons of 96, 192, 336, and 720. ... Table 15: Dataset descriptions. The dataset size is organized in (Train, Validation, Test). ... M4 benchmark [73] for short-term forecasting, which contains the yearly, quarterly and monthly collected univariate marketing data. Then, we use symmetric mean absolute error (SMAPE), mean absolute scaled error (MASE) and overall weighted average (OWA) to measure the forecasting performance, which are calculated as detailed in Appendix C.3. ... We similarly add masks in patch units (gray sections). It can be observed that the time series encoder in SymTime also performs well in zero-shot reconstruction on real-world data [27, 123].
Hardware Specification	Yes	We conduct pre-training using data parallelism on a hardware setup consisting of 8 NVIDIA RTX A6000 GPUs with 48GB of memory each.
Software Dependencies	No	During the pre-training of SymTime, we employ AdamW [125, 126] as the optimizer with the default hyperparameter configuration for (β1, β2) as (0.9, 0.999). Then, we utilize the OneCycle policy to dynamically adjust the learning rate. For all downstream task fine-tuning experiments, we employ the Adam optimizer [125, 126] with hyperparameters (β1, β2) set to (0.9, 0.999).
Experiment Setup	Yes	Model Hyper-parameter. The parameter configurations for the time series encoder and symbol encoder in SymTime are shown in Table 16. During model pre-training, we primarily set three hyperparameters: (1) the masking ratio of time series patches, (2) the masking ratio for natural language symbols, and (3) the proportion factor α used to balance pseudo-targets in momentum distillation. Based on the masked time series modeling pre-training experimental configuration of PatchTST [54] and SimMTM [27], we set the masking ratio for time series to 40%. Following the experimental configuration of BERT in masked language modeling [19, 55], we set the masking ratio for symbolic data to 15%. Based on the experimental configuration of momentum distillation in ALBEF [60, 124, 59], we set α to 0.6. Training Configurations. During the pre-training of SymTime, we employ AdamW [125, 126] as the optimizer with the default hyperparameter configuration for (β1, β2) as (0.9, 0.999). Then, we utilize the OneCycle policy to dynamically adjust the learning rate. We set the warmup epochs to 10, during which the learning rate gradually grows up to an initial value of 5 × 10^-5, and then adjust it dynamically using a cosine annealing schedule, with the minimum learning rate set at 1 × 10^-7. We conduct pre-training using data parallelism on a hardware setup consisting of 8 NVIDIA RTX A6000 GPUs with 48GB of memory each. We set the batch size to 128 and trained for a total of 85 epochs. For the five major tasks in TSA, we conduct downstream task fine-tuning experiments using the configurations in Table 17. For all downstream task fine-tuning experiments, we employ the Adam optimizer [125, 126] with hyperparameters (β1, β2) set to (0.9, 0.999). The LR in the table represents the initial learning rate and we utilize the dynamic learning rate adjustment strategy from TimesNet [72].
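
Illustration of the pre-training learning-rate schedule quoted in the Experiment Setup row: a minimal Python sketch, not the authors' implementation. The quoted values are the 10 warmup epochs, the 5 × 10^-5 peak, the 1 × 10^-7 minimum, and the 85 total epochs. The function name `lr_at_epoch`, the linear warmup shape, per-epoch (rather than per-step) updates, and reaching the minimum exactly at the final epoch are assumptions made here for illustration.

```python
import math

# Values quoted in the Experiment Setup row.
PEAK_LR = 5e-5        # learning rate reached at the end of warmup
MIN_LR = 1e-7         # floor of the cosine annealing phase
WARMUP_EPOCHS = 10
TOTAL_EPOCHS = 85


def lr_at_epoch(epoch: int) -> float:
    """Return the learning rate for a zero-based epoch index.

    Hypothetical helper: linear warmup followed by cosine annealing.
    The warmup shape and the per-epoch granularity are assumptions.
    """
    if epoch < WARMUP_EPOCHS:
        # Assumed linear ramp that reaches PEAK_LR on the last warmup epoch.
        return PEAK_LR * (epoch + 1) / WARMUP_EPOCHS
    # Fraction of the annealing phase completed, in [0, 1].
    decay_epochs = max(1, TOTAL_EPOCHS - WARMUP_EPOCHS - 1)
    progress = (epoch - WARMUP_EPOCHS) / decay_epochs
    # Cosine annealing from PEAK_LR down to MIN_LR.
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 9, 10, 47, 84):
        print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.3e}")
```

In a PyTorch training loop, a function like this could be wrapped in `torch.optim.lr_scheduler.LambdaLR` by dividing its output by the optimizer's base learning rate; whether the released code does so is not stated in the quoted text.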