Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Nyström Kernel Mean Embeddings

Authors: Antoine Chatalic, Nicolas Schreuder, Lorenzo Rosasco, Alessandro Rudi

ICML 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a proof of work in a simple experimental setting, but extending these results to broader families of datasets and kernel types would be interesting in the future. We first generate data according to a Gaussian mixture... We then perform experiments with data from the Fasttext (Bojanowski et al. 2016) (english features), FMA (Defferrard et al. 2016) (MFCC features), Intel Lab and Gowalla (Cho et al. 2011) datasets...
Researcher Affiliation | Academia | (1) MaLGA & DIBRIS, Università di Genova; (2) Inria, École normale supérieure, PSL Research University; (3) CBMM, MIT, IIT.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any information about open-source code for the described methodology.
Open Datasets | Yes | We then perform experiments with data from the Fasttext (Bojanowski et al. 2016) (english features), FMA (Defferrard et al. 2016) (MFCC features), Intel Lab and Gowalla (Cho et al. 2011) datasets... https://fasttext.cc/docs/en/english-vectors.html https://github.com/mdeff/fma http://db.csail.mit.edu/labdata/labdata.html https://snap.stanford.edu/data/loc-gowalla.html
Dataset Splits | No | For each dataset, we consider ρ to be the uniform distribution over these points, and we build the empirical estimator using a random sample of size n = 10^4. The paper specifies the sample size but does not provide details on specific training, validation, or test splits, percentages, or absolute counts for these subsets.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | the standard deviation σ_k of the kernel is chosen to be the median of the inter-points distance, estimated for efficiency on a random subset of 1000 points.
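
The bandwidth rule quoted under Experiment Setup (median inter-point distance, estimated on a random subset of 1000 points) can be sketched as follows. This is a minimal illustration and not the authors' code, since the paper releases none. The function names, the fixed seed, the exclusion of self-distances, and the Gaussian parametrization exp(-||x - y||^2 / (2 sigma^2)) are all assumptions made for the example.

```python
import numpy as np


def median_heuristic_bandwidth(X, subset_size=1000, seed=0):
    """Median pairwise Euclidean distance on a random subset of the rows of X.

    Hypothetical helper: subset_size mirrors the 1000 points quoted above;
    the seed and the exclusion of self-distances are assumptions.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = min(subset_size, n)
    # Sample m distinct rows uniformly at random.
    idx = rng.choice(n, size=m, replace=False)
    S = X[idx]
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 <a, b>.
    sq_norms = np.sum(S**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (S @ S.T)
    # Clip tiny negative values caused by floating-point round-off.
    np.maximum(sq_dists, 0.0, out=sq_dists)
    # Keep each unordered pair once and skip the zero diagonal.
    upper = np.triu_indices(m, k=1)
    return float(np.sqrt(np.median(sq_dists[upper])))


def gaussian_kernel(X, Y, sigma):
    """Gaussian kernel matrix, assuming k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * (X @ Y.T)
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))
```

Working on a subset keeps the cost at roughly 1000 x 1000 distance evaluations regardless of dataset size, which matches the efficiency motivation in the quoted sentence. A typical call would be sigma = median_heuristic_bandwidth(X) followed by gaussian_kernel(X, X, sigma).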