Learning the Infinitesimal Generator of Stochastic Diffusion Processes

Authors: Vladimir Kostic, Hélène Halconruy, Timothée Devergne, Karim Lounici, Massimiliano Pontil

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the statistical performance of a reduced-rank estimator in reproducing kernel Hilbert spaces (RKHS) in the partial knowledge setting. Notably, our approach provides learning bounds independent of the state space dimension and ensures non-spurious spectral estimation. Additionally, we elucidate how the distortion between the intrinsic energy-induced metric of the stochastic diffusion and the RKHS metric used for generator estimation impacts the spectral learning bounds.
Researcher Affiliation | Academia | Vladimir R. Kostic (CSML, Istituto Italiano di Tecnologia; University of Novi Sad) vladimir.kostic@iit.it; Karim Lounici (CMAP, École Polytechnique) karim.lounici@polytechnique.edu; Hélène Halconruy (SAMOVAR, Télécom SudParis; MODAL'X, Université Paris Nanterre) helene.halconruy@telecom-sudparis.eu; Timothée Devergne (CSML & ATSIM, Istituto Italiano di Tecnologia) timothee.devergne@iit.it; Massimiliano Pontil (CSML, Istituto Italiano di Tecnologia; AI Centre, University College London) massimiliano.pontil@iit.it
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code to reproduce the results of the experiments can be found in the following repository: https://github.com/DevergneTimothee/GenLearn_kernel
Open Datasets | No | We have trained our method on a real 30-year US mortgage rates dataset and contrasted it with the fitted CIR model using continuous ranked probability scores estimated from the forecasts obtained from each of them; see panel f) of Fig. 1. Each model was trained using data from January 2009 to December 2016. The initial condition was the last week of December 2016, and the predictions were made for the years 2017 and 2018. Since the dataset is real, we used imperfect partial knowledge; that is, for our method, we estimated the diffusion coefficient only via a least-squares calibration of a CIR model over the training set. This allows more flexibility in the drift term of our model. For this experiment, we used an in-house code to simulate the system. The equations of motion were discretized using the Euler–Maruyama scheme with a timestep of 10⁻⁴. The paper does not provide concrete access information (link, DOI, repository name, formal citation with authors/year) for a publicly available or open dataset.
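The row above mentions simulating a CIR model with the Euler–Maruyama scheme at a timestep of 10⁻⁴. As a minimal sketch of that discretization (the parameter values and function name below are hypothetical, not the paper's calibrated ones):

```python
import numpy as np

def simulate_cir(r0, kappa, theta, sigma, dt=1e-4, n_steps=10_000, rng=None):
    """Euler-Maruyama discretization of the CIR process
    dr_t = kappa * (theta - r_t) dt + sigma * sqrt(r_t) dW_t."""
    rng = np.random.default_rng() if rng is None else rng
    r = np.empty(n_steps + 1)
    r[0] = r0
    for i in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))          # Brownian increment
        drift = kappa * (theta - r[i]) * dt
        # Clip at zero so the square root stays real; the exact CIR
        # process is nonnegative, but the Euler scheme can undershoot.
        diffusion = sigma * np.sqrt(max(r[i], 0.0)) * dW
        r[i + 1] = r[i] + drift + diffusion
    return r

path = simulate_cir(r0=0.04, kappa=0.5, theta=0.04, sigma=0.1)
```

The zero-clipping is a common fix for the scheme's possible excursions below zero; the paper does not specify how its in-house code handles this.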
Dataset Splits | No | The choice of kernel in all experiments was a Gaussian RBF with specified length-scales, and the hyperparameters were chosen via cross-validation. While cross-validation is mentioned, specific train/validation/test dataset splits (e.g., percentages, absolute counts, or citations to predefined splits) are not provided.
Hardware Specification | Yes | All the experiments were performed on a workstation with 125.6 GiB of memory and an AMD Ryzen Threadripper PRO 3975WX 32-core (64-thread) processor; no graphics card was used.
Software Dependencies | No | The version of Python used is 3.9.18. No other specific software components with version numbers (e.g., libraries, frameworks) are mentioned.
Experiment Setup | Yes | RRR was fitted using 1000 points, µ = 5 and γ = 10⁻⁵. The length-scales used were 0.05 and 0.5. This experiment was reproduced 100 times, leading to very small changes in the estimation of the eigenfunctions. In Figure 2 we report the result of one of them. The reduced-rank regression was performed with a rank of 5.
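The setup row fits a reduced-rank regression with rank 5. The paper's estimator acts in an RKHS on the infinitesimal generator, but the rank constraint itself can be illustrated with the classical Euclidean reduced-rank regression solution, which projects the ordinary least-squares fit onto its top singular directions (all data below are synthetic and hypothetical):

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Least-squares fit of Y ~ X @ B subject to rank(B) <= rank.
    Classical solution: compute the OLS coefficients, then project the
    fitted values onto their top `rank` right singular directions."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T                    # top singular directions of the fit
    return B_ols @ V_r @ V_r.T           # rank-constrained coefficient matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
B_true = rng.normal(size=(20, 5)) @ rng.normal(size=(5, 10))  # rank-5 truth
Y = X @ B_true + 0.01 * rng.normal(size=(1000, 10))
B_hat = reduced_rank_regression(X, Y, rank=5)
```

With 1000 sample points and a well-separated rank-5 signal, the truncated estimate recovers the coefficient matrix closely; the paper's kernelized version additionally involves the regularization γ and the RKHS geometry discussed in its bounds.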