Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models

Authors: Benjamin Walker, Lingyi Yang, Nicola Muca Cirone, Cristopher Salvi, Terry Lyons

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, SLiCEs solve the A5 state-tracking benchmark with a single layer, achieve best-in-class length generalisation on regular language tasks among parallel-in-time models, and match the performance of log neural controlled differential equations on six multivariate time-series classification datasets while cutting the average time per training step by a factor of twenty. Section 5 is titled "Experiments" and contains detailed empirical evaluation.
Researcher Affiliation	Academia	(1) Mathematical Institute, University of Oxford; (2) Department of Mathematics, Imperial College London
Pseudocode	Yes	Algorithm 1 provides a pseudo-code implementation for the forward pass of a SLiCE.
Open Source Code	Yes	Open-source implementations of SLiCEs in both PyTorch and JAX, along with code to fully reproduce all experiments from this paper. These are available at https://github.com/Benjamin-Walker/structured-linear-cdes (PyTorch) and https://github.com/Benjamin-Walker/log-neural-cdes (JAX).
Open Datasets	Yes	We use the following publicly available datasets, libraries, and baseline models: (1) A5 Benchmark [66]. License: MIT. URL: https://github.com/jopetty/word-problem (2) Formal Language Benchmark [31]. License: CC-BY-4.0. URL: https://arxiv.org/abs/2207.02098 (3) UEA Multivariate Time Series Classification Archive [4]. License: GPL-3.0. URL: https://www.timeseriesclassification.com/, https://github.com/time-series-machine-learning/tsml-repo
Dataset Splits	Yes	To assess length generalisation, we select the models that achieve at least 90% validation accuracy on sequences of length 20 and retrain them on sequences ranging from 3 to 40. Early stopping is performed using a validation set with sequence lengths from 40 to 128. ... This benchmark tests the ability of models to length generalise on state-tracking tasks, by training models on sequences from length 3 to 40 and evaluating models on sequences from length 40 to 256. ... The experiments on the UEA multivariate time series classification archive follow the approach of Walker et al. [98], using the same data splits.
Hardware Specification	Yes	The GPU memory and time per 1000 training steps were recalculated for all models on an NVIDIA H100.
Software Dependencies	No	The paper cites JAX [10] and PyTorch [75] but does not provide specific version numbers for the software dependencies used in the experiments. It only references the year of their initial publications.
Experiment Setup	Yes	Models are trained using a token-tagging loss for 100,000 steps with a batch size of 256. For all sequence lengths, a small batch of sequences of length 2 are included at each training step to aid convergence. All models use Adam [54] with weight decay as their optimiser, and linear warm-up followed by cosine annealing with a minimum learning rate of 10^-5 and a maximum learning rate of 10^-3. Additionally, all models use dropout [90] at a rate of 0.1 and a trainable embedding layer.
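
The learning-rate schedule quoted in the Experiment Setup row (linear warm-up followed by cosine annealing between 10^-5 and 10^-3 over 100,000 steps) can be illustrated with a minimal Python sketch. This is not the authors' code. The warm-up length (`WARMUP_STEPS`) and the choice to start warm-up at the minimum learning rate are assumptions made for illustration, since the row does not state them, and the function name `learning_rate` is hypothetical.

```python
import math

MAX_LR = 1e-3           # maximum learning rate, from the Experiment Setup row
MIN_LR = 1e-5           # minimum learning rate, from the Experiment Setup row
TOTAL_STEPS = 100_000   # number of training steps, from the Experiment Setup row
WARMUP_STEPS = 10_000   # ASSUMPTION: the warm-up length is not given in the row


def learning_rate(step, warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS,
                  max_lr=MAX_LR, min_lr=MIN_LR):
    """Linear warm-up to max_lr, then cosine annealing down to min_lr.

    ASSUMPTION: warm-up starts at min_lr; the source does not say where it starts.
    """
    if step < warmup_steps:
        # Linear interpolation from min_lr (step 0) to max_lr (end of warm-up).
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    # Fraction of the annealing phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Half-cosine decay from max_lr (progress 0) to min_lr (progress 1).
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Calling `learning_rate(step)` once per optimiser update gives the value to pass to Adam. With these assumed settings the rate peaks at step 10,000 and returns to the minimum at step 100,000.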