Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning with Expected Signatures: Theory and Applications

Authors: Lorenzo Lucchese, Mikko S. Pakkanen, Almut E. D. Veraart

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental The convergence results proved in this paper bridge the gap between the expected signature's empirical discrete-time estimator and its theoretical continuous-time value, allowing for a more complete probabilistic interpretation of expected signature-based ML methods. Moreover, when the data generating process is a martingale, we suggest a simple modification of the expected signature estimator with significantly lower mean squared error and empirically demonstrate how it can be effectively applied to improve predictive performance. In this section we review a few algorithms from the literature, showcasing the practical relevance of the asymptotic results of Section 2.1 and the potential improvements achieved by the martingale correction introduced in Section 2.2.
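To make the quoted estimator concrete, here is a minimal numpy sketch of the empirical expected-signature estimator at truncation level 2 (signatures of piecewise-linear paths averaged over a sample). The function names `signature_level2` and `expected_signature` are illustrative, not from the paper, and the martingale correction itself is not reproduced here:

```python
import numpy as np

def signature_level2(path):
    """Level-2 truncated signature of a piecewise-linear path.

    path: array of shape (T, d) of discrete observations.
    Returns (S1, S2) flattened: S1 in R^d, S2 in R^{d x d}.
    """
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        # Chen's identity, one linear segment at a time: the level-2 term
        # picks up (S1 so far) tensor (new increment), plus the segment's
        # own iterated integral 0.5 * dx (x) dx.
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return np.concatenate([s1, s2.ravel()])

def expected_signature(paths):
    """Monte Carlo estimator: average truncated signatures over sample paths."""
    return np.mean([signature_level2(p) for p in paths], axis=0)
```

A sanity check on the sketch: for a straight-line path the level-1 term is the total increment and the level-2 term is half its outer square, and refining the time grid along the same line leaves the signature unchanged.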
Researcher Affiliation Academia Department of Mathematics, Imperial College London, London, United Kingdom. Correspondence to: Lorenzo Lucchese <EMAIL, EMAIL>.
Pseudocode Yes Algorithm 1: Gaussian Process augmented Expected Signature (GPES) classifier, forward pass.
Hyperparameters: signature truncation level k ∈ N, tensor normalization parameter C ∈ R+, data augmentation size N ∈ N, in-fill partition π2 ∈ M2[0,T] with M2 ∈ N.
Parameters: biases b_µ ∈ R^{d_µ}, b_Σ ∈ R^{d_Σ}, b_out ∈ R^{d_out} and weights W_µ ∈ R^{d_µ × d_in}, W_Σ ∈ R^{d_Σ × d_in}, W_out ∈ R^{d_out × d_sig}, where d_in = d·M1 + M1 + M2, d_µ = d·M2, d_Σ = d·M2(d·M2 + 1)/2, d_sig = d + ... + d^k and d_out = |C|.
Input: x ∈ R^{d × M1} and π1 ∈ M1[0,T].
1: µ_{x,π1,π2} ← b_µ + W_µ(x, π1, π2).
2: L_{x,π1,π2} ← b_Σ + W_Σ(x, π1, π2) and Σ_{x,π1,π2} ← L_{x,π1,π2} L_{x,π1,π2}^T.
3: for n ∈ {1, ..., N} do
4:   X^{n,π1} ← x.
5:   sample X^{n,π2} ~ N(µ_{x,π1,π2}, Σ_{x,π1,π2}).
6:   signature of X^{n,π1∪π2}: S_n = S^k(X^{n,π1∪π2})_{[0,T]} ∈ R^{d_sig}.
7:   tensor normalization: S_n ← λ_C(S_n).
8: end for
9: expected signature: ES ← (1/N) Σ_{n=1}^N S_n.
Output: ĉ ← softmax(b_out + W_out ES).
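The forward pass above can be sketched in numpy. This is a hedged simplification, not the paper's implementation: the signature is truncated at level 2, the feature map for (x, π1, π2) is just the flattened observations, the covariance factor is a full square matrix rather than a triangular Cholesky factor, and the tensor normalization λ_C is approximated by scaling signatures to at most unit norm:

```python
import numpy as np

rng = np.random.default_rng(0)

def sig2(path):
    # level-1/2 signature of a piecewise-linear path, flattened
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return np.concatenate([s1, s2.ravel()])

def gpes_forward(x, params, n_aug=8, m2=3):
    """Sketch of the GPES forward pass (Algorithm 1).

    x: observed path, shape (M1, d). The Gaussian in-fill on m2 extra time
    points is parameterized by learned affine maps; all simplifications
    relative to the paper are noted in the surrounding text.
    """
    W_mu, b_mu, W_L, b_L, W_out, b_out = params
    d = x.shape[1]
    feat = x.ravel()                                 # stand-in for (x, pi1, pi2) features
    mu = b_mu + W_mu @ feat                          # step 1: in-fill mean
    L = (b_L + W_L @ feat).reshape(m2 * d, m2 * d)   # step 2: covariance factor
    sigs = []
    for _ in range(n_aug):                           # steps 3-8
        z = mu + L @ rng.standard_normal(m2 * d)     # step 5: sample the in-fill
        aug_path = np.vstack([x, z.reshape(m2, d)])  # append sampled points to x
        s = sig2(aug_path)                           # step 6: truncated signature
        sigs.append(s / max(np.linalg.norm(s), 1.0)) # step 7: crude lambda_C stand-in
    es = np.mean(sigs, axis=0)                       # step 9: expected signature
    logits = b_out + W_out @ es
    e = np.exp(logits - logits.max())
    return e / e.sum()                               # output: softmax class probabilities
```

With M1 = 4, d = 2, m2 = 3 and three classes, the parameter shapes are W_µ (6, 8), W_L (36, 8) and W_out (3, 6), and the output is a probability vector over the classes.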
Open Source Code Yes Code and examples demonstrating the integration of the martingale correction into machine learning algorithms, along with the simulation results from the previous section, are available at https://github.com/lorenzolucchese/esig. The code is designed to be compatible with Python-based ML pipelines, supporting both numpy arrays and torch tensors. The code used to produce the results of Table 2 and Table 3 is available at https://github.com/lorenzolucchese/distribution-regression-streams. The code used to produce the results of Table 4 and Table 5 can be found at https://github.com/lorenzolucchese/controlled-linear-regression.
Open Datasets Yes We replicate the synthetic data experiments of Triggiano & Romito (2024) on the (FBM), (OU) and (Bidim) datasets. ... We repeat two of the synthetic data experiments conducted in Lemercier et al. (2021), analyzing the performance of the SES model without and with martingale correction (MC). In the first experiment (Lemercier et al., 2021, Section 5.2), the task is to infer the temperature of an ideal gas from the paths of N = 20 particles moving in a box. ... The second experiment (Lemercier et al., 2021, Section 5.3) concerns the estimation of the mean-reversion parameter in a rough volatility model. More precisely, the task is to infer the value of a ∈ [10^-6, 1] from a sample {σ^n_π}_{n=1}^N of (discretely observed) paths σ^n = {σ^n_t, t ∈ π} over the partition π = {0, 0.01, ..., 2} with continuous-time dynamics dZ_t = -a(Z_t - µ) dt + ν dB^H_t, σ_t = exp(Z_t), t ∈ [0, 2], where {B^H_t, t ∈ [0, 2]} is a fractional Brownian motion with Hurst parameter H = 0.2, µ = 0.5, ν = 0.3 and Z_0 = 0.5.
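The rough-volatility data-generating process quoted above can be simulated as follows. This is one possible sketch, not the authors' code: fBM increments are drawn exactly in law via a Cholesky factor of the fBM covariance, the fractional OU dynamics are discretized with an Euler scheme, and the mean-reversion value a = 0.3 is an arbitrary choice from the stated range [10^-6, 1]:

```python
import numpy as np

def simulate_rough_vol(a=0.3, mu=0.5, nu=0.3, z0=0.5,
                       hurst=0.2, dt=0.01, T=2.0, seed=0):
    """One volatility path sigma_t = exp(Z_t) on pi = {0, 0.01, ..., T},
    with dZ_t = -a (Z_t - mu) dt + nu dB^H_t (Euler discretization)."""
    rng = np.random.default_rng(seed)
    t = np.arange(dt, T + dt / 2, dt)      # grid without t = 0 (cov singular there)
    # fBM covariance: 0.5 * (s^2H + t^2H - |t - s|^2H)
    cov = 0.5 * (t[:, None]**(2 * hurst) + t[None, :]**(2 * hurst)
                 - np.abs(t[:, None] - t[None, :])**(2 * hurst))
    # exact-in-law fBM sample via the Cholesky factor (tiny jitter for stability)
    B = np.linalg.cholesky(cov + 1e-10 * np.eye(len(t))) @ rng.standard_normal(len(t))
    B = np.concatenate([[0.0], B])         # prepend B_0 = 0
    Z = np.empty(len(B))
    Z[0] = z0
    for i in range(len(B) - 1):            # Euler step for the fractional OU process
        Z[i + 1] = Z[i] - a * (Z[i] - mu) * dt + nu * (B[i + 1] - B[i])
    return np.exp(Z)                       # sigma_t = exp(Z_t)
```

The returned path has 201 points, matching the partition π = {0, 0.01, ..., 2}; repeating the call over seeds yields the sample {σ^n_π}.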
Dataset Splits Yes When fitting the models we take the optimal hyperparameters cross-validated by Triggiano & Romito (2024) and apply cross-validated SGD to the training dataset. That is, we use 80% of the training dataset to iterate through SGD parameter updates, while keeping the remaining 20% of the training dataset (the validation set) to determine when the procedure has converged without overfitting. ... In both experiments, we keep the same training-evaluation pipeline as the one considered in the original paper, namely nested k-fold cross-validation with 5 outer folds for evaluation and 3 inner folds for hyperparameter selection (including the signature truncation k1 and a Lasso regularization parameter).
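The 80/20 split with validation-based stopping described above can be sketched as follows. The model here is plain linear least squares, a stand-in for the paper's actual models, and the function name and patience threshold are illustrative assumptions:

```python
import numpy as np

def sgd_with_validation(X, y, lr=0.01, patience=20, max_epochs=500, seed=0):
    """Use 80% of the training data for SGD updates and monitor the
    remaining 20% (validation set) to stop once the validation loss
    stops improving, i.e. converged without overfitting."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    tr, va = idx[:cut], idx[cut:]          # 80% SGD split / 20% validation split
    w = np.zeros(X.shape[1])
    best_w, best_loss, since = w.copy(), np.inf, 0
    for _ in range(max_epochs):
        for i in rng.permutation(tr):      # one SGD pass over the 80% split
            g = (X[i] @ w - y[i]) * X[i]   # per-sample squared-error gradient
            w -= lr * g
        val = np.mean((X[va] @ w - y[va]) ** 2)  # held-out validation loss
        if val < best_loss - 1e-9:
            best_w, best_loss, since = w.copy(), val, 0
        else:
            since += 1
            if since >= patience:          # no improvement: stop early
                break
    return best_w, best_loss
```

On noiseless synthetic linear data this recovers the true weights; in the paper's setting the same loop would wrap the model-specific parameter update.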
Hardware Specification No The paper does not provide specific hardware details for running its experiments. It only mentions that the code is compatible with Python-based ML pipelines, which is too general.
Software Dependencies No The paper does not provide specific software dependencies with version numbers. It only broadly states compatibility with "Python-based ML pipelines, supporting both numpy arrays and torch tensors."
Experiment Setup Yes When fitting the models we take the optimal hyperparameters cross-validated by Triggiano & Romito (2024) and apply cross-validated SGD to the training dataset. That is, we use 80% of the training dataset to iterate through SGD parameter updates, while keeping the remaining 20% of the training dataset (the validation set) to determine when the procedure has converged without overfitting. ... The only hyperparameter we modify is the truncation level k which we set to 4 for computational reasons (in the original paper the optimal value was found to be 5 or 6, depending on the dataset). ... We thus repeat the SGD routine over 5 different parameter initializations and pick the model with best validation performance. ... In both experiments, we keep the same training-evaluation pipeline as the one considered in the original paper, namely nested k-fold cross-validation with 5 outer folds for evaluation and 3 inner folds for hyperparameter selection (including the signature truncation k1 and a Lasso regularization parameter).
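The nested cross-validation pipeline quoted above (5 outer folds for evaluation, 3 inner folds for hyperparameter selection) can be sketched as follows. Ridge regression with a closed-form solve stands in for the paper's models, and the penalty grid stands in for the hyperparameters mentioned in the text (signature truncation k1 and Lasso regularization):

```python
import numpy as np

def ridge_fit(X, y, lam):
    # closed-form ridge solution (stand-in for the paper's Lasso models)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def kfold_indices(n, k, rng):
    # shuffle, then split the index range into k roughly equal folds
    return np.array_split(rng.permutation(n), k)

def nested_cv(X, y, lams, outer_k=5, inner_k=3, seed=0):
    """Nested k-fold CV: the outer folds give the evaluation scores; the
    inner folds pick the hyperparameter on each outer-train split only."""
    rng = np.random.default_rng(seed)
    outer_scores = []
    for fold in kfold_indices(len(X), outer_k, rng):
        test = np.zeros(len(X), bool); test[fold] = True
        Xtr, ytr, Xte, yte = X[~test], y[~test], X[test], y[test]
        # inner loop: select lambda by 3-fold validation MSE on the outer-train split
        best_lam, best_mse = None, np.inf
        for lam in lams:
            mses = []
            for ifold in kfold_indices(len(Xtr), inner_k, rng):
                val = np.zeros(len(Xtr), bool); val[ifold] = True
                w = ridge_fit(Xtr[~val], ytr[~val], lam)
                mses.append(np.mean((Xtr[val] @ w - ytr[val]) ** 2))
            if np.mean(mses) < best_mse:
                best_lam, best_mse = lam, np.mean(mses)
        # refit on the full outer-train split with the selected lambda, score on outer test
        w = ridge_fit(Xtr, ytr, best_lam)
        outer_scores.append(np.mean((Xte @ w - yte) ** 2))
    return np.array(outer_scores)
```

The key design point, mirrored from the quoted setup, is that hyperparameter selection never sees the outer test fold, so the five outer scores are unbiased estimates of generalization error.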