Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Nyström Kernel Mean Embeddings

Authors: Antoine Chatalic, Nicolas Schreuder, Lorenzo Rosasco, Alessandro Rudi

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide a proof of work in a simple experimental setting, but extending these results to broader families of datasets and kernel types would be interesting in the future. We first generate data according to a Gaussian mixture... We then perform experiments with data from the Fasttext (Bojanowski et al. 2016) (english features), FMA (Defferrard et al. 2016) (MFCC features), Intel Lab and Gowalla (Cho et al. 2011) datasets...
Researcher Affiliation	Academia	MaLGA & DIBRIS, Università di Genova; Inria, École normale supérieure, PSL Research University; CBMM, MIT, IIT.
Pseudocode	No	The paper does not contain any pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any information about open-source code for the described methodology.
Open Datasets	Yes	We then perform experiments with data from the Fasttext (Bojanowski et al. 2016) (english features), FMA (Defferrard et al. 2016) (MFCC features), Intel Lab and Gowalla (Cho et al. 2011) datasets... https://fasttext.cc/docs/en/english-vectors.html https://github.com/mdeff/fma http://db.csail.mit.edu/labdata/labdata.html https://snap.stanford.edu/data/loc-gowalla.html
Dataset Splits	No	For each dataset, we consider ρ to be the uniform distribution over these points, and we build the empirical estimator using a random sample of size n = 10^4. The paper specifies the sample size but does not provide details on specific training, validation, or test splits, percentages, or absolute counts for these subsets.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers.
Experiment Setup	Yes	the standard deviation σ_k of the kernel is chosen to be the median of the inter-points distance, estimated for efficiency on a random subset of 1000 points.
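The bandwidth rule quoted under Experiment Setup (the "median heuristic") can be sketched as follows. This is a minimal illustration, not the authors' code, which the paper does not release. The function names `median_heuristic_bandwidth` and `gaussian_kernel` are hypothetical, and the assumption that σ_k enters the kernel as exp(-||x - y||² / (2σ_k²)) is ours; the distances are Euclidean, computed over all pairs in a random subset of at most 1000 points.

```python
import math
import random
import statistics
from typing import Optional, Sequence


def median_heuristic_bandwidth(
    points: Sequence[Sequence[float]],
    subset_size: int = 1000,
    seed: Optional[int] = None,
) -> float:
    """Median of Euclidean pairwise distances over a random subset of points.

    Hypothetical helper: only at most `subset_size` points are used, so the
    cost is O(subset_size^2) instead of O(len(points)^2).
    """
    rng = random.Random(seed)
    m = min(subset_size, len(points))
    if m < 2:
        raise ValueError("need at least two points to compute a distance")
    # Draw the subset without replacement.
    subset = rng.sample(list(points), m)
    # Distances for all distinct pairs i < j inside the subset.
    dists = [
        math.dist(subset[i], subset[j])
        for i in range(m)
        for j in range(i + 1, m)
    ]
    return statistics.median(dists)


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); the 2*sigma^2 scaling is an assumption."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma**2))


if __name__ == "__main__":
    # Three 1-D points: pairwise distances are 1, 3 and 2, so the median is 2.
    sigma = median_heuristic_bandwidth([(0.0,), (1.0,), (3.0,)], seed=0)
    print(sigma)
    print(gaussian_kernel((0.0,), (1.0,), sigma))
```

Subsampling trades a small amount of accuracy in the median for a quadratic saving in pairwise-distance computations, which is the stated reason ("for efficiency") for using only 1000 points.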