Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval

Authors: Jialin Chen, Ziyu Zhao, Gaukhar Nurbek, Aosong Feng, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, Rex Ying

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across multiple domains highlight its dual utility, as both an effective encoder for downstream applications and a general-purpose retriever to enhance time-series models... Extensive experiments on both public benchmarks and our curated multimodal dataset validate the effectiveness of TRACE, demonstrating superior retrieval accuracy.
Researcher Affiliation	Academia	Jialin Chen1, Ziyu Zhao2, Gaukhar Nurbek3, Aosong Feng1, Ali Maatouk1, Leandros Tassiulas1, Yifeng Gao3, Rex Ying1 1Yale University, 2McGill University, 3University of Texas Rio Grande Valley EMAIL, EMAIL; EMAIL
Pseudocode	No	The paper describes methods in Section 3 'Proposed Method' using descriptive paragraphs and mathematical formulations, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code.
Open Source Code	Yes	Codes are available at https://github.com/Graph-and-Geometric-Learning/TRACE-Multimodal-TSEncoder.
Open Datasets	Yes	To support real-world multimodal time series applications, we construct a new dataset in the weather domain... The event reports are sourced from the NOAA Events Database [51], while the associated time series data are retrieved from the NOAA Global Historical Climatology Network (GHCN) [52]. To evaluate performance in the univariate setting, we further incorporate the three largest subsets Health, Energy, and Environment from Time-MMD [5], a multimodal benchmark designed for time series forecasting...
Dataset Splits	Yes	Our curated weather dataset contains a total of 74,337 time series instances. We allocate 9,561 of these exclusively for the forecasting task... Table 8: Dataset size for each task (values listed as Train / Test / Val / Total). Newly Curated Weather Dataset: Forecasting (H=7): 6,690 / 957 / 1,914 / 9,561; Pretraining & Classification: 45,339 / 6,484 / 12,953 / 64,776. Public Dataset from Time-MMD [5]: Health (H=12): 929 / 266 / 129 / 1,324; Energy (H=12): 992 / 284 / 138 / 1,414; Environment (H=48): 7,628 / 2,173 / 1,064 / 10,865.
Hardware Specification	Yes	All experiments are conducted over five runs with different random seeds on NVIDIA A100 40GB GPUs.
Software Dependencies	No	All models are implemented in PyTorch and trained on NVIDIA A100 40GB GPUs. For most time series models, we adopt the implementation from TSLib [58]. The paper mentions software names but does not provide specific version numbers for PyTorch or TSLib, which are required for reproducibility.
Experiment Setup	Yes	The default TRACE consists of a 6-layer Transformer encoder with a hidden dimension of 384 and 6 attention heads. We use the AdamW [55] optimizer with a linear warmup followed by a cosine decay schedule. Pre-training is conducted with a mask ratio of 0.3, and runs for up to 400 epochs. We take 32 in-batch negative samples at each level in the alignment stage and run for up to 300 epochs. The sequence length is fixed at 96 for both prediction horizons of 7 and 24. We use mean squared error (MSE) as the loss function for forecasting tasks, and accuracy for classification. Forecasting models are trained for 10 epochs, while classification models are trained for up to 150 epochs with early stopping.
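
The Experiment Setup entry mentions a linear warmup followed by a cosine decay schedule without giving its parameters. The sketch below shows one common way such a schedule can be computed. It is an illustration only, not the authors' implementation: the function name lr_at_step and every numeric value used with it (base learning rate, warmup length, total steps, minimum learning rate) are hypothetical, since this record does not state them.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a 0-indexed training step.

    Hypothetical helper: ramps linearly up to base_lr over warmup_steps,
    then follows a half-cosine from base_lr down to min_lr.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps; progress is clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Assumed values for illustration only; none come from the paper.
    for s in (0, 9, 10, 55, 100):
        print(s, lr_at_step(s, total_steps=100, warmup_steps=10, base_lr=1e-3))
```

With these assumed values, the rate rises to the base rate by the end of warmup, passes through half the base rate midway through the decay phase, and approaches the minimum rate at the final step.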