Robust Kernel Density Estimation with Median-of-Means principle

Authors: Pierre Humbert, Batiste Le Bars, Ludovic Minvielle

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce a robust nonparametric density estimator combining the popular Kernel Density Estimation method and the Median-of-Means principle (MoM-KDE). Finally, we show that MoM-KDE achieves competitive results when compared with other robust kernel estimators, while having significantly lower computational complexity. In this section, we display numerical results supporting the relevance of MoM-KDE. All experiments were run on a personal laptop computer using Python.
Researcher Affiliation | Academia | 1Université Paris-Saclay, CNRS, Inria, Laboratoire de mathématiques d'Orsay, 91405, Orsay, France. 2Université Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille. 3Université Paris-Saclay, ENS Paris-Saclay, CNRS, Centre Borelli, F-91190, Gif-sur-Yvette, France. Correspondence to: Batiste Le Bars <batiste.le-bars@inria.fr>, Pierre Humbert <pierre.humbert@universite-paris-saclay.fr>.
Pseudocode | No | The paper describes the MoM-KDE method in detail using mathematical definitions and textual explanations, but it does not include formal pseudocode or an algorithm block.
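For readers who want a concrete picture of the procedure the paper describes in prose: partition the sample into S blocks, fit one KDE per block, and take the pointwise median of the block estimates. Below is a minimal Python sketch of that idea; the function name `mom_kde`, the Gaussian kernel, and the fixed random seed are illustrative assumptions, not the authors' implementation (their code is linked in the next row).

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def mom_kde(X, x_eval, n_blocks, bandwidth=1.0):
    """Illustrative MoM-KDE: pointwise median of per-block KDEs.

    X: (n, d) sample; x_eval: (m, d) evaluation points.
    Note: a median of densities need not integrate exactly to 1.
    """
    rng = np.random.default_rng(0)  # fixed seed so the sketch is deterministic
    blocks = np.array_split(rng.permutation(len(X)), n_blocks)
    per_block = []
    for idx in blocks:
        kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X[idx])
        per_block.append(np.exp(kde.score_samples(x_eval)))  # score_samples returns log-density
    return np.median(np.stack(per_block), axis=0)
```

The intuition behind the robustness claim: with S at least 2|O| + 1, at most |O| blocks can contain an outlier, so a majority of the per-block KDEs are computed on clean data and the pointwise median discards the contaminated ones.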
Open Source Code | Yes | The code of MoM-KDE is made available online at https://github.com/lminvielle/mom-kde. For the sake of comparison, we also implemented RKDE and SPKDE.
Open Datasets | Yes | Experiments are also conducted over six classification datasets: Banana, German, Titanic, Breast-cancer, Iris, and Digits. They are all publicly available, either from open repositories at http://www.raetschlab.org/Members/raetsch/benchmark/ (for the first three) or directly from the Scikit-learn package (for the last three) (Pedregosa et al., 2011).
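The three Scikit-learn datasets can be loaded with standard package calls, as in the sketch below; Banana, German, and Titanic must instead be downloaded separately from the repository linked above.

```python
from sklearn.datasets import load_breast_cancer, load_digits, load_iris

# The three datasets shipped with Scikit-learn (Pedregosa et al., 2011).
for loader in (load_iris, load_digits, load_breast_cancer):
    bunch = loader()
    print(bunch.data.shape, bunch.target.shape)
```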
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets. It mentions that 'the bandwidth h is chosen for KDE via the pseudo-likelihood k-cross-validation method', but this is a parameter-selection procedure, not a global dataset split for reproducibility.
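The bandwidth-selection step the paper does describe can be reproduced with standard tooling: pseudo-likelihood cross-validation picks the h that maximizes the held-out log-likelihood. A minimal sketch using Scikit-learn's GridSearchCV, which scores KernelDensity by its total log-likelihood; the bandwidth grid and the number of folds are assumptions, since the paper leaves both unspecified.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

X = np.random.randn(500, 2)  # placeholder data; substitute one of the datasets above

# KernelDensity.score returns the total held-out log-likelihood, so
# GridSearchCV maximizes exactly the pseudo-likelihood criterion.
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": np.logspace(-1, 1, 20)},  # assumed grid; not stated in the paper
    cv=5,  # assumed number of folds; the paper leaves k unspecified
)
search.fit(X)
h = search.best_params_["bandwidth"]
```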
Hardware Specification | No | The paper states: 'All experiments were run on a personal laptop computer using Python.' This is too general and does not provide specific hardware details such as CPU/GPU models, memory, or other specifications needed for reproducibility.
Software Dependencies | No | The paper mentions 'using Python' and 'the Scikit-learn package', but it does not provide specific version numbers for these or any other software libraries or dependencies. Therefore, a reproducible description of ancillary software is not provided.
Experiment Setup | Yes | The number of blocks S in MoM-KDE is selected on a regular grid of 20 values between 1 and 2|O| + 1 in order to obtain the lowest Jensen-Shannon divergence (D_JS). The bandwidth h is chosen for KDE via the pseudo-likelihood k-cross-validation method (Turlach, 1993) and used for all estimators. The construction of RKDE follows exactly the indications of its authors (Kim and Scott, 2012), and ρ(·) is taken to be the Hampel function, as it empirically proved to be the most robust. For SPKDE, the true ratio of anomalies is given as an input parameter.
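To make the S-selection step concrete: minimizing D_JS requires the true density, so this is an oracle criterion applicable only when the data-generating density is known, as in synthetic experiments. A hedged sketch, reusing the illustrative `mom_kde` function defined earlier and SciPy's jensenshannon (which returns the JS distance, i.e. the square root of the divergence):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def select_n_blocks(X, grid_pts, true_density, h, n_outliers, n_candidates=20):
    """Oracle selection of S: minimize D_JS against the known true density."""
    # Regular grid of candidate block counts between 1 and 2|O| + 1.
    candidates = np.unique(
        np.linspace(1, 2 * n_outliers + 1, n_candidates).astype(int)
    )
    divergences = []
    for S in candidates:
        est = mom_kde(X, grid_pts, n_blocks=S, bandwidth=h)  # sketch defined above
        # jensenshannon normalizes its inputs; squaring the distance gives the divergence.
        divergences.append(jensenshannon(est, true_density) ** 2)
    return candidates[int(np.argmin(divergences))]
```

Because the true density is unavailable on real data, this selection rule cannot be applied outside the synthetic setting, which is consistent with the report's "No" verdicts on splits and dependencies above.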