Robust covariance estimation with missing values and cell-wise contamination

Authors: Grégoire Pacreau, Karim Lounici

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To complement our theoretical findings, we conducted an experimental study which demonstrates the superiority of our approach over the state of the art both in low and high dimension settings.
Researcher Affiliation | Academia | Karim Lounici, CMAP, Ecole Polytechnique, Palaiseau, France (karim.lounici@polytechnique.edu); Grégoire Pacreau, CMAP, Ecole Polytechnique, Palaiseau, France (gregoire.pacreau@polytechnique.edu)
Pseudocode | No | The paper does not contain any pseudocode blocks or clearly labeled algorithm sections.
Open Source Code | Yes | Code available at https://github.com/klounici/COVARIANCE_contaminated_data
Open Datasets | No | The paper mentions synthetic data generation (App. A) and real-life datasets (App. B) but does not provide direct links, DOIs, or author/year citations that give public access to these datasets. For example, it lists "Boston from UCI [17]" and "Abalone from UCI [39]", but citations [17] and [39] do not point to the datasets themselves or to their public repositories.
Dataset Splits | No | The paper describes generating synthetic data and using real-life datasets, but it does not specify training, validation, or test splits (e.g., percentages, sample counts, or references to standard splits).
Hardware Specification | Yes | All experiments were conducted on a 2020 MacBook Air with an M1 processor (8 cores, 3.4 GHz).
Software Dependencies | No | The paper mentions software such as "R packages cellwise and GSE", "sklearn [30]", and "hyperimpute [15]" but gives no version numbers for them, which reproducible dependency information requires.
Experiment Setup | Yes | The paper provides details on the experimental setup, including: "For n = 100, p = 50, r(Σ) = 2 under a Dirac contamination (tail MV and DDCMV are our methods). Here ε = 1 and δ varies in (0, 1)" (Figure 1 caption). It also describes the baselines used: "Our baselines are the empirical covariance estimator... and an oracle which knows the position of every outlier...". Additionally, it specifies the contamination types (Dirac, Gaussian) and rates used in the various experiments.
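The setting quoted in the Experiment Setup row (n = 100, p = 50, effective rank 2, Dirac contamination, an empirical-covariance baseline and an oracle that knows where the outliers are) can be illustrated with a minimal sketch. This is not the authors' code and does not implement their estimators. The function names, the way the covariance is built, the contamination mechanism (each cell is kept with probability delta, and a non-kept cell becomes a constant outlier with probability eps, otherwise missing), the outlier value 5.0, the choice delta = 0.8, and the pairwise-complete estimator used for both baselines are all assumptions made for illustration.

```python
import numpy as np


def effective_rank(sigma):
    """Effective rank: trace(sigma) divided by its largest eigenvalue."""
    eigvals = np.linalg.eigvalsh(sigma)
    return eigvals.sum() / eigvals.max()


def make_factor(p, rank, rng):
    """Hypothetical construction: p x rank matrix with orthonormal columns.

    factor @ factor.T then has `rank` eigenvalues equal to 1 and the rest 0,
    so its effective rank equals `rank`.
    """
    q, _ = np.linalg.qr(rng.standard_normal((p, p)))
    return q[:, :rank]


def contaminate(x, delta, eps, outlier_value, rng):
    """Assumed mechanism: keep a cell w.p. delta; otherwise replace it by a
    constant (Dirac) outlier w.p. eps, or mark it missing (NaN)."""
    keep = rng.random(x.shape) < delta
    outlier = (~keep) & (rng.random(x.shape) < eps)
    y = np.where(keep, x, np.nan)
    y[outlier] = outlier_value
    return y, outlier


def pairwise_covariance(y):
    """Zero-mean covariance using, for each pair of columns, only the rows
    where both cells are observed (a simple stand-in estimator)."""
    observed = ~np.isnan(y)
    filled = np.where(observed, y, 0.0)
    counts = observed.T.astype(float) @ observed.astype(float)
    return (filled.T @ filled) / np.maximum(counts, 1.0)


def spectral_error(estimate, truth):
    """Operator-norm distance between an estimate and the true covariance."""
    return np.linalg.norm(estimate - truth, ord=2)


rng = np.random.default_rng(0)
n, p, rank = 100, 50, 2

factor = make_factor(p, rank, rng)
sigma = factor @ factor.T
x = rng.standard_normal((n, rank)) @ factor.T  # rows have covariance sigma

y, outlier = contaminate(x, delta=0.8, eps=1.0, outlier_value=5.0, rng=rng)

# Baseline 1: empirical covariance computed directly on the contaminated data.
naive = pairwise_covariance(y)
# Baseline 2: oracle that knows every outlier position and treats it as missing.
oracle = pairwise_covariance(np.where(outlier, np.nan, y))

naive_error = spectral_error(naive, sigma)
oracle_error = spectral_error(oracle, sigma)
print(f"effective rank: {effective_rank(sigma):.2f}")
print(f"naive error:    {naive_error:.3f}")
print(f"oracle error:   {oracle_error:.3f}")
```

With eps = 1.0 every non-kept cell is an outlier, so the naive baseline reduces to the ordinary zero-mean empirical covariance of the contaminated matrix, while the oracle sees the same data with the outliers masked out. Sweeping delta over (0, 1) and plotting both errors would give a rough analogue of the comparison the Figure 1 caption describes.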