Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

VerbalTS: Generating Time Series from Texts

Authors: Shuqi Gu, Chuyue Li, Baoyu Jing, Kan Ren

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on two synthetic and four real-world datasets demonstrate that VERBALTS outperforms existing methods in both generation quality and semantic alignment with textual conditions. The project page is at https://seqml.github.io/VerbalTS/.
Researcher Affiliation	Academia	(1) School of Information Science and Technology, ShanghaiTech University, Shanghai, China; (2) University of Illinois at Urbana-Champaign, Illinois, United States.
Pseudocode	Yes	Algorithm 1: Pseudocode for the CTTP Model. Input: A batch of time series and text pairs $(X \in \mathbb{R}^{B \times K \times L},\ C \in \mathbb{N}^{B \times M})$. Output: Total cross-entropy loss $\mathcal{L}_{\text{cross}}$.
Open Source Code	Yes	We have released all the reproducible code and benchmarking datasets at https://seqml.github.io/VerbalTS/.
Open Datasets	Yes	Experiments on two synthetic and four real-world datasets demonstrate that VERBALTS outperforms existing methods in both generation quality and semantic alignment with textual conditions. The project page is at https://seqml.github.io/VerbalTS/.
Dataset Splits	Yes	We randomly split the samples into training set, validation set, and test set in a ratio of 6:1:1. Finally, we get 24000 training samples, 2400 validation samples, and 2400 test samples.
Hardware Specification	No	The authors also gratefully acknowledge further assistance provided by Shanghai Frontiers Science Center of Human-centered Artificial Intelligence, MoE Key Lab of Intelligent Perception and Human-Machine Collaboration, and HPC Platform of ShanghaiTech University.
Software Dependencies	No	we used the tsfresh (Nils Braun, 2024) library to extract 6 time series features, serving as attributes for baseline input. Then, text annotations are generated from extracted features through prompt templates. Details are given in Sec. A.2.1.
Experiment Setup	Yes	For all experiments, we set the number of diffusion steps as T = 50, embedding size for attributes and time series as 64. For training, we use Adam optimizer to train the model, the initial learning rate is set to be 1e-4 with MultiStepLR scheduler for all datasets, the batch size is set to be 512 for Synth-M, Synth-U, Weather, ETTm1 and Traffic, 16 for BlindWays. For the hyperparameters of the multi-focal modeling, (R, S) = (3, 3) for all datasets. All our experiments were conducted three times running with different random seeds.
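
For readers who want to reuse the reported settings, the sketch below collects the values quoted in the Experiment Setup and Dataset Splits rows into a small Python configuration. It is an illustration and not the authors' code. The names `RunConfig`, `make_runs`, and `split_indices` are hypothetical, as are the seed values 0, 1, 2. The MultiStepLR milestones and decay factor are not quoted in the table, so they are left out rather than guessed.

```python
import random
from dataclasses import dataclass

# Batch sizes as quoted in the Experiment Setup row.
BATCH_SIZES = {
    "Synth-M": 512,
    "Synth-U": 512,
    "Weather": 512,
    "ETTm1": 512,
    "Traffic": 512,
    "BlindWays": 16,
}


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    dataset: str
    seed: int
    diffusion_steps: int = 50          # T = 50
    embedding_size: int = 64           # attributes and time series
    optimizer: str = "Adam"
    initial_lr: float = 1e-4
    lr_scheduler: str = "MultiStepLR"  # milestones/decay factor not quoted
    multi_focal: tuple = (3, 3)        # (R, S)

    @property
    def batch_size(self) -> int:
        return BATCH_SIZES[self.dataset]


def make_runs(dataset, seeds=(0, 1, 2)):
    """Three runs with different seeds; the seed values are assumptions."""
    return [RunConfig(dataset=dataset, seed=s) for s in seeds]


def split_indices(n, ratio=(6, 1, 1), seed=0):
    """Randomly split range(n) into train/validation/test by an integer ratio.

    Any remainder from integer division goes to the test split.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratio)
    n_train = n * ratio[0] // total
    n_val = n * ratio[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Calling `make_runs("Weather")` yields three configurations that differ only in their seed, and `split_indices(n)` returns three disjoint index lists that can be used to slice a dataset of `n` samples.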