MONK Outlier-Robust Mean Embedding Estimation by Median-of-Means
Authors: Matthieu Lerasle, Zoltán Szabó, Timothée Mathieu, Guillaume Lecué
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Implementation of the MONK estimators is the focus of Section 4, with numerical illustrations in Section 5. In this section, we demonstrate the performance of the proposed MONK estimators. |
| Researcher Affiliation | Academia | ¹Laboratoire de Mathématiques d'Orsay, Univ. Paris-Sud, France; ²CNRS, Université Paris-Saclay, France; ³CMAP, École Polytechnique, Palaiseau, France; ⁴CREST, ENSAE ParisTech, France. |
| Pseudocode | Yes | Algorithm 1: MONK BCD estimator for MMD. Algorithm 2: MONK BCD-Fast estimator for MMD. |
| Open Source Code | Yes | The Python code reproducing our numerical experiments is available at https://bitbucket.org/TimotheeMathieu/monk-mmd; it relies on the ITE toolbox (Szabó, 2014). |
| Open Datasets | Yes | In order to demonstrate the applicability of our estimators in biological context, we chose a DNA benchmark from the UCI repository (Dheeru & Karra Taniskidou, 2017), the Molecular Biology (Splice-junction Gene Sequences) Data Set. |
| Dataset Splits | No | The paper describes Monte-Carlo simulations and sampling for evaluation but does not specify explicit train/validation splits or percentages for model training. |
| Hardware Specification | No | The paper discusses computational complexity and running times for its methods but does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Python code' and reliance on the 'ITE toolbox (Szabó, 2014)' but does not provide specific version numbers for Python or the ITE toolbox, which are required for reproducibility. |
| Experiment Setup | Yes | The errors are aggregates over 100 Monte-Carlo simulations, summarized in the median and quartile values. The number of samples (N) was chosen from {200, 400, ..., 2000}. Q, the number of blocks in the MONK techniques, was equal to 5. The significance level was α = 0.05. To assess the variability of the results, 400 Monte Carlo simulations were performed, each time uniformly sampling N points without replacement... |
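The MONK estimators listed under Pseudocode (Algorithms 1 and 2) solve a min-max problem by block coordinate descent, and the sketch below does not reproduce them. It only illustrates, under stated assumptions, the median-of-means idea that the MONK name refers to. The N points of each sample are split into Q blocks (Q = 5, as in the Experiment Setup row). An unbiased MMD² estimate is computed on every block, and the median is returned, so that a minority of corrupted blocks cannot pull the result arbitrarily far. The Gaussian kernel, its bandwidth, and the function names `gaussian_kernel`, `block_mmd2` and `mom_mmd2` are hypothetical choices made for illustration. They are not taken from the authors' repository or from the ITE toolbox.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def block_mmd2(x, y, bandwidth=1.0):
    """Unbiased (U-statistic) estimate of MMD^2 computed on a single block."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude the diagonal terms k(x_i, x_i) and k(y_j, y_j) for unbiasedness.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    term_xy = kxy.mean()
    return term_xx + term_yy - 2.0 * term_xy


def mom_mmd2(x, y, num_blocks=5, bandwidth=1.0, rng=None):
    """Median of per-block MMD^2 estimates over num_blocks random blocks.

    Hypothetical simplified median-of-means variant, NOT the MONK BCD
    min-max estimator from the paper.
    """
    rng = np.random.default_rng(rng)
    # Randomly partition the indices of each sample into num_blocks blocks.
    x_blocks = np.array_split(rng.permutation(len(x)), num_blocks)
    y_blocks = np.array_split(rng.permutation(len(y)), num_blocks)
    estimates = [
        block_mmd2(x[ix], y[iy], bandwidth)
        for ix, iy in zip(x_blocks, y_blocks)
    ]
    return float(np.median(estimates))


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    x = gen.normal(size=(200, 2))
    y_same = gen.normal(size=(200, 2))
    y_shift = gen.normal(size=(200, 2)) + 3.0
    print("same distribution:", mom_mmd2(x, y_same, num_blocks=5, rng=1))
    print("shifted mean:     ", mom_mmd2(x, y_shift, num_blocks=5, rng=1))
```

With Q = 5, up to two blocks may contain outliers and the median still lies within the range of the clean block estimates. Raising Q tolerates more contaminated blocks but makes each block smaller, so the per-block estimates become noisier. This is the usual median-of-means trade-off.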