Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

TimeWak: Temporal Chained-Hashing Watermark for Time Series Data

Authors: Zhi Wen Soi, Chaoyi Zhu, Fouad Abiad, Aditya Shankar, Jeroen Galjaard, Huijuan Wang, Lydia Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We extensively evaluate TimeWak on its impact on synthetic data quality, watermark detectability, and robustness under various post-editing attacks, against five datasets and baselines of different temporal lengths. Our results show that TimeWak achieves improvements of 61.96% in context-FID score, and 8.44% in correlational scores against the strongest state-of-the-art baseline, while remaining consistently detectable. Our code is available at https://github.com/soizhiwen/TimeWak.
Researcher Affiliation	Academia	(1) University of Neuchâtel; (2) Delft University of Technology
Pseudocode	No	The paper includes a diagram (Figure 1: Overview of TimeWak) that outlines steps but does not present formally structured pseudocode or algorithm blocks. The algorithm details are described in paragraph text within Section 3.2, "TimeWak algorithm".
Open Source Code	Yes	Our code is available at https://github.com/soizhiwen/TimeWak. We keep the code for the proposed TimeWak and baselines on the following open sourced repository anonymously: https://anonymous.4open.science/r/TimeWak.
Open Datasets	Yes	We use five time series datasets to evaluate TimeWak's impact on generation quality, watermark detection accuracy, and robustness towards post-editing operations. These are: Stocks [33], ETTh [39], MuJoCo [27], Energy [3], and fMRI [22]. Additional dataset details are in Appendix D.1. Table 4: Details of datasets used in experiments. Stocks 3,773 6 https://finance.yahoo.com/quote/GOOG; ETTh 17,420 7 https://github.com/zhouhaoyi/ETDataset; MuJoCo 10,000 14 https://github.com/deepmind/dm_control; Energy 19,711 28 https://archive.ics.uci.edu/ml/datasets; fMRI 10,000 50 https://www.fmrib.ox.ac.uk/datasets
Dataset Splits	Yes	Dataset splits are 80% for training and 20% for testing.
Hardware Specification	Yes	All code implementations are done in PyTorch (version 2.3.1) using a single NVIDIA GeForce RTX 2080 Graphics Card coupled with an Intel(R) Xeon(R) Platinum 8562Y+ CPU for all experiments.
Software Dependencies	Yes	All code implementations are done in PyTorch (version 2.3.1) using a single NVIDIA GeForce RTX 2080 Graphics Card coupled with an Intel(R) Xeon(R) Platinum 8562Y+ CPU for all experiments.
Experiment Setup	Yes	We train the time series diffusion model following the Diffusion-TS settings [34], and generate 10,000 watermarked synthetic samples using TimeWak for each sampling run. E.11.1 Intervals: Interval, also referred to as H, is one of the key hyperparameters in our approach. Based on our experiments in Table 18 (24-length), Table 19 (64-length), and Table 20 (128-length), we found that setting H = 2 yields the best results across most datasets. E.11.2 Bits: Additionally, we present the values of bit-length L used across different experiments in Tables 21-23.
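
The 80%/20% train/test split reported under Dataset Splits could be reproduced along the following lines. This is a minimal sketch and not the authors' code: the function name `split_80_20`, the placeholder array, and the choice of an unshuffled (chronological) split are all assumptions, since the quoted evidence gives only the proportions.

```python
import numpy as np


def split_80_20(data: np.ndarray, train_frac: float = 0.8):
    """Split an array along its first axis into train and test parts.

    Assumption: the split is taken in order, without shuffling. The quoted
    evidence states only the 80/20 proportions, not how rows are assigned.
    """
    n_train = int(len(data) * train_frac)
    return data[:n_train], data[n_train:]


# Hypothetical placeholder data: 1,000 rows with 6 columns (not a real dataset).
series = np.zeros((1000, 6))
train, test = split_80_20(series)
print(train.shape, test.shape)
```

If the repository shuffles samples or splits after windowing the series, the slicing step would need to change accordingly; the linked code is the authoritative reference.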