Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Regularized least squares learning with heavy-tailed noise is minimax optimal

Authors: Mattes Mollenhauer, Nicole Muecke, Dimitri Meunier, Arthur Gretton

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Appendix A contains a numerical experiment which confirms the behavior of the excess risk described by our theoretical results.
Researcher Affiliation	Collaboration	Mattes Mollenhauer (Merantix Momentum); Nicole Mücke (Technische Universität Braunschweig); Dimitri Meunier (Gatsby Computational Neuroscience Unit, UCL)
Pseudocode	No	The paper primarily presents mathematical derivations, theorems, and proofs. It does not contain any explicit pseudocode blocks or algorithm listings formatted as such.
Open Source Code	Yes	We provide the source code on GitHub: https://github.com/mollenhauerm/krr-heavy-tailed.
Open Datasets	No	We consider the input space $\mathcal{X} := \mathbb{R}$ equipped with the RKHS $\mathcal{H}$ induced by the radial basis kernel $k(x_1, x_2) := \exp(-|x_1 - x_2|^2 / 2)$ and define the target function $f(x) := \sum_{i=1}^{5} a_i k(x_i, x) \in \mathcal{H}$, $x \in \mathcal{X}$, for vectors $a := (2, 1, 3, 1, 2)$ and $x := (4, 2, 0, 3, 7)$. We define the covariate distribution $X \sim \pi := \mathcal{N}(0, 1)$ on $\mathcal{X}$ and generate independent observation pairs with a light-tailed noise distribution and a heavy-tailed distribution with identical variance based on (i) the light-tailed noise model given by $Y_{(N)} = f(X) + \varepsilon_{(N)}$ (21), where the noise $\varepsilon_{(N)} \sim \mathcal{N}(0, \sigma^2)$ follows a centered Gaussian distribution with variance $\sigma^2 = 3$; (ii) the heavy-tailed noise model with a finite number of higher moments given by $Y_{(t)} = f(X) + \varepsilon_{(t)}$ (22), where the noise $\varepsilon_{(t)} \sim t(0, \nu)$ follows a centered t-distribution with $\nu = 3$ degrees of freedom and $\mathbb{E}[\varepsilon_{(t)}^2] = 3$.
Dataset Splits	No	For both models, we compute $\hat{f}_\alpha$ based on the generated sample pairs for a sample size $n = 20$ and record the error $\|I_\pi \hat{f}_\alpha - f\|_{L^2(\pi)}$, which we approximate through Monte Carlo simulation by drawing samples from $\pi$. We perform the above computation for a selection of different regularization parameters $\alpha$, repeating each experiment across 10000 random seeds (per model and choice of $\alpha$). The paper describes generating synthetic data and using a sample size of $n = 20$, along with Monte Carlo simulations. It does not specify any training/test/validation splits for a pre-existing dataset.
Hardware Specification	Yes	The experiment can be run on the CPU of any consumer laptop.
Software Dependencies	No	The paper does not explicitly list specific software dependencies with version numbers, such as Python libraries or frameworks.
Experiment Setup	Yes	For both models, we compute $\hat{f}_\alpha$ based on the generated sample pairs for a sample size $n = 20$ and record the error $\|I_\pi \hat{f}_\alpha - f\|_{L^2(\pi)}$, which we approximate through Monte Carlo simulation by drawing samples from $\pi$. We perform the above computation for a selection of different regularization parameters $\alpha$, repeating each experiment across 10000 random seeds (per model and choice of $\alpha$). where the noise $\varepsilon_{(N)} \sim \mathcal{N}(0, \sigma^2)$ follows a centered Gaussian distribution with variance $\sigma^2 = 3$. where the noise $\varepsilon_{(t)} \sim t(0, \nu)$ follows a centered t-distribution with $\nu = 3$ degrees of freedom and $\mathbb{E}[\varepsilon_{(t)}^2] = 3$.
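
The experiment summarized in the table rows above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code (their implementation is in the linked repository). Several things are assumptions: the coefficient and center vectors are copied as they appear in this entry, where signs may have been lost in extraction; the estimator uses the common kernel ridge form with the Gram matrix shifted by n times alpha; all function names are hypothetical; and the number of seeds and Monte Carlo points is reduced to keep the run short.

```python
import numpy as np

# Coefficients and centers as printed in this index entry (ASSUMPTION:
# minus signs may have been lost when the paper text was extracted).
A = np.array([2.0, 1.0, 3.0, 1.0, 2.0])
CENTERS = np.array([4.0, 2.0, 0.0, 3.0, 7.0])


def rbf(x1, x2):
    """Gram matrix of k(x1, x2) = exp(-|x1 - x2|^2 / 2) for 1-D inputs."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2)


def f_target(x):
    """Target function: a finite kernel expansion around CENTERS."""
    return rbf(x, CENTERS) @ A


def sample(n, noise, rng):
    """Draw n pairs (X, Y) with X ~ N(0, 1) and noise variance 3."""
    x = rng.standard_normal(n)
    if noise == "gaussian":
        eps = rng.normal(0.0, np.sqrt(3.0), n)
    elif noise == "student_t":
        # t-distribution with nu = 3 has variance nu / (nu - 2) = 3.
        eps = rng.standard_t(3, n)
    else:
        raise ValueError(f"unknown noise model: {noise}")
    return x, f_target(x) + eps


def krr_fit(x, y, alpha):
    """Kernel ridge regression (ASSUMPTION: (K + n*alpha*I)^{-1} y scaling)."""
    n = len(x)
    coef = np.linalg.solve(rbf(x, x) + n * alpha * np.eye(n), y)
    return lambda x_new: rbf(x_new, x) @ coef


def l2_error(f_hat, rng, n_mc=2000):
    """Monte Carlo estimate of the L2(pi) distance between f_hat and the target."""
    x_mc = rng.standard_normal(n_mc)
    return float(np.sqrt(np.mean((f_hat(x_mc) - f_target(x_mc)) ** 2)))


def mean_error(noise, alpha, n=20, n_seeds=200):
    """Average the L2(pi) error over seeds (the paper reports 10000 seeds)."""
    errors = []
    for seed in range(n_seeds):
        rng = np.random.default_rng(seed)
        x, y = sample(n, noise, rng)
        errors.append(l2_error(krr_fit(x, y, alpha), rng))
    return float(np.mean(errors))
```

Calling `mean_error("gaussian", alpha)` and `mean_error("student_t", alpha)` over a grid of `alpha` values would give the kind of comparison between the two noise models that the entry describes. The heavy-tailed runs can vary noticeably between seeds, so small values of `n_seeds` give only a rough picture.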