MONK Outlier-Robust Mean Embedding Estimation by Median-of-Means

Authors: Matthieu Lerasle, Zoltán Szabó, Timothée Mathieu, Guillaume Lecué

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Implementation of the MONK estimators is the focus of Section 4, with numerical illustrations in Section 5. In this section, we demonstrate the performance of the proposed MONK estimators.
Researcher Affiliation | Academia | (1) Laboratoire de Mathématiques d'Orsay, Univ. Paris Sud, France; (2) CNRS, Université Paris Saclay, France; (3) CMAP, École Polytechnique, Palaiseau, France; (4) CREST ENSAE ParisTech, France.
Pseudocode | Yes | Algorithm 1: MONK BCD estimator for MMD. Algorithm 2: MONK BCD-Fast estimator for MMD.
Open Source Code | Yes | The Python code reproducing our numerical experiments is available at https://bitbucket.org/TimotheeMathieu/monk-mmd; it relies on the ITE toolbox (Szabó, 2014).
Open Datasets | Yes | In order to demonstrate the applicability of our estimators in biological context, we chose a DNA benchmark from the UCI repository (Dheeru & Karra Taniskidou, 2017), the Molecular Biology (Splice-junction Gene Sequences) Data Set.
Dataset Splits | No | The paper describes Monte-Carlo simulations and sampling for evaluation but does not specify explicit train/validation splits or percentages for model training.
Hardware Specification | No | The paper discusses computational complexity and running times for its methods but does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'Python code' and reliance on the 'ITE toolbox (Szabó, 2014)' but does not provide specific version numbers for Python or the ITE toolbox, which are required for reproducibility.
Experiment Setup | Yes | The errors are aggregates over 100 Monte-Carlo simulations, summarized in the median and quartile values. The number of samples (N) was chosen from {200, 400, ..., 2000}. Q, the number of blocks in the MONK techniques, was equal to 5. The significance level was α = 0.05. To assess the variability of the results 400 Monte Carlo simulations were performed, each time uniformly sampling N points without replacement...
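
The Experiment Setup row describes a protocol that can be sketched in a few lines: for each sample size N in {200, 400, ..., 2000}, repeat 100 Monte-Carlo runs with Q = 5 blocks and report the median and quartiles of the error. The sketch below is an illustration only and is not the MONK BCD estimator from the paper: it substitutes a plain median-of-means estimate of a scalar mean for the MMD estimator. The function names, the Gaussian data, and the contamination scheme (two outliers of value 1000) are assumptions made for this example.

```python
import numpy as np


def median_of_means(x, n_blocks, rng):
    """Plain median-of-means of a 1-D sample (not the paper's MONK estimator).

    Shuffle the sample, split it into n_blocks blocks, and return the
    median of the block means.
    """
    perm = rng.permutation(len(x))
    blocks = np.array_split(x[perm], n_blocks)
    return float(np.median([b.mean() for b in blocks]))


def run_protocol(sample_sizes, n_runs=100, n_blocks=5, n_outliers=2, seed=0):
    """Return {N: (first quartile, median, third quartile)} of the absolute error."""
    rng = np.random.default_rng(seed)
    summary = {}
    for n in sample_sizes:
        errors = np.empty(n_runs)
        for r in range(n_runs):
            # Assumed toy data: standard normal sample, so the true mean is 0.
            x = rng.normal(loc=0.0, scale=1.0, size=n)
            # Hypothetical contamination: overwrite a few points with a large value.
            x[:n_outliers] = 1e3
            errors[r] = abs(median_of_means(x, n_blocks, rng))
        q1, med, q3 = np.percentile(errors, [25, 50, 75])
        summary[n] = (q1, med, q3)
    return summary


# Sample sizes, number of runs, and number of blocks follow the table above.
sample_sizes = range(200, 2001, 200)
summary = run_protocol(sample_sizes)
for n, (q1, med, q3) in summary.items():
    print(f"N={n:5d}  Q1={q1:.4f}  median={med:.4f}  Q3={q3:.4f}")
```

With two outliers and five blocks, at most two block means are contaminated, so the median of the block means always comes from a clean block. This is the basic reason a block count larger than twice the number of outliers yields robustness in this toy setting.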