Robust Kernel Density Estimation with Median-of-Means principle
Authors: Pierre Humbert, Batiste Le Bars, Ludovic Minvielle
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we introduce a robust nonparametric density estimator combining the popular Kernel Density Estimation method and the Median-of-Means principle (MoM-KDE). Finally, we show that MoM-KDE achieves competitive results when compared with other robust kernel estimators, while having significantly lower computational complexity. In this section, we display numerical results supporting the relevance of MoM-KDE. All experiments were run on a personal laptop computer using Python. |
| Researcher Affiliation | Academia | 1Université Paris-Saclay, CNRS, Inria, Laboratoire de mathématiques d'Orsay, 91405, Orsay, France. 2Université Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille. 3Université Paris-Saclay, ENS Paris-Saclay, CNRS, Centre Borelli, F-91190, Gif-sur-Yvette, France. Correspondence to: Batiste Le Bars <batiste.le-bars@inria.fr>, Pierre Humbert <pierre.humbert@universite-paris-saclay.fr>. |
| Pseudocode | No | The paper describes the MoM-KDE method in detail using mathematical definitions and textual explanations, but it does not include a formal pseudocode or algorithm block (an illustrative Python sketch of the method is given after this table). |
| Open Source Code | Yes | The code of MoM-KDE is made available online: https://github.com/lminvielle/mom-kde. For the sake of comparison, we also implemented RKDE and SPKDE. |
| Open Datasets | Yes | Experiments are also conducted over six classification datasets: Banana, German, Titanic, Breast-cancer, Iris, and Digits. They are all publicly available either from open repositories at http://www.raetschlab.org/Members/raetsch/benchmark/ (for the first three) or directly from the Scikit-learn package (for the last three) (Pedregosa et al., 2011). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets. It mentions that 'the bandwidth h is chosen for KDE via the pseudo-likelihood k-cross-validation method', but this concerns hyperparameter selection, not a reproducible train/validation/test split. |
| Hardware Specification | No | The paper states: 'All experiments were run on a personal laptop computer using Python.' This is too general and does not provide specific hardware details such as CPU/GPU models, memory, or other specifications needed for reproducibility. |
| Software Dependencies | No | The paper mentions 'using Python' and 'the Scikit-learn package', but it does not provide specific version numbers for these or any other software libraries or dependencies. Therefore, a reproducible description of ancillary software is not provided. |
| Experiment Setup | Yes | The number of blocks S in MoM-KDE is selected on a regular grid of 20 values between 1 and 2|O| + 1 in order to obtain the lowest Jensen-Shannon divergence (D_JS). The bandwidth h is chosen for KDE via the pseudo-likelihood k-cross-validation method (Turlach, 1993), and used for all estimators. The construction of RKDE follows exactly the indications of its authors (Kim and Scott, 2012), and ρ(·) is taken to be the Hampel function, as it was empirically shown to be the most robust. For SPKDE, the true ratio of anomalies is given as an input parameter. Illustrative sketches of the bandwidth and block-number selection follow this table. |
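
Since the paper presents the method only through mathematical definitions and text, the following is a minimal Python sketch of the MoM-KDE construction it describes: partition the sample into S blocks, fit a standard KDE on each block, and take the pointwise median of the block estimates. The function and variable names are illustrative, not the authors' implementation; their reference code is in the repository linked above.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def mom_kde(X, S, h, x_eval, seed=0):
    """Median-of-Means KDE sketch: split X into S disjoint blocks, fit a
    Gaussian KDE with bandwidth h on each block, and return the pointwise
    median of the block density estimates at the points x_eval."""
    idx = np.random.default_rng(seed).permutation(len(X))
    block_densities = [
        np.exp(KernelDensity(bandwidth=h).fit(X[block]).score_samples(x_eval))
        for block in np.array_split(idx, S)
    ]
    # The median over blocks suppresses blocks contaminated by outliers,
    # which is the source of the estimator's robustness.
    return np.median(block_densities, axis=0)

# Toy usage: 950 inliers plus 50 outliers in one dimension.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (950, 1)), rng.normal(8, 0.2, (50, 1))])
x_eval = np.linspace(-5, 12, 300).reshape(-1, 1)
density = mom_kde(X, S=11, h=0.3, x_eval=x_eval)
```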
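
The pseudo-likelihood k-cross-validation used for the bandwidth (Turlach, 1993) can be realized, for instance, with scikit-learn's grid search, which by default scores a KernelDensity model by its total held-out log-likelihood. The bandwidth grid and fold count below are assumptions for illustration; the paper does not report the exact values used.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def select_bandwidth(X, k=5):
    """Choose h maximizing the mean held-out log-likelihood under k-fold
    cross-validation (pseudo-likelihood CV). Grid and k are illustrative."""
    search = GridSearchCV(
        KernelDensity(kernel="gaussian"),
        {"bandwidth": np.logspace(-2, 1, 30)},
        cv=k,
    )
    search.fit(X)
    return search.best_params_["bandwidth"]
```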
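
The block-number selection described above (lowest D_JS over a regular grid of 20 values in [1, 2|O|+1]) presupposes a known reference density, so it only applies to the synthetic experiments. A sketch under that assumption, reusing `mom_kde` from the first snippet: `f_true` denotes the true density evaluated on `x_eval`, and SciPy's `jensenshannon` returns the JS distance (the square root of the divergence), whose minimizer is the same.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def select_n_blocks(X, h, x_eval, f_true, n_outliers):
    """Pick S on a regular grid of 20 values between 1 and 2|O|+1, keeping
    the value whose MoM-KDE estimate is closest to the known density f_true
    in Jensen-Shannon distance (jensenshannon normalizes its inputs)."""
    S_grid = np.unique(np.linspace(1, 2 * n_outliers + 1, 20).astype(int))
    return min(S_grid, key=lambda S: jensenshannon(mom_kde(X, S, h, x_eval), f_true))
```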