reproducibilityindex.ai

Robust covariance estimation with missing values and cell-wise contamination

Authors: Grégoire Pacreau, Karim Lounici

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To complement our theoretical findings, we conducted an experimental study which demonstrates the superiority of our approach over the state of the art both in low and high dimension settings.
Researcher Affiliation	Academia	Karim Lounici CMAP Ecole Polytechnique Palaiseau, France karim.lounici@polytechnique.edu Gregoire Pacreau CMAP Ecole Polytechnique Palaiseau, France gregoire.pacreau@polytechnique.edu
Pseudocode	No	The paper does not contain any pseudocode blocks or clearly labeled algorithm sections.
Open Source Code	Yes	Code available at https://github.com/klounici/COVARIANCE_contaminated_data
Open Datasets	No	The paper mentions synthetic data generation (App. A) and real-life datasets (App. B) but does not provide direct links, DOIs, or specific citations with author/year for public access to these datasets. For example, it lists "Boston from UCI [17]" and "Abalone from UCI [39]" but the citations [17] and [39] do not point to the datasets themselves or their public repositories with author/year.
Dataset Splits	No	The paper describes generating synthetic data and using real-life datasets, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits with citations) for reproducibility.
Hardware Specification	Yes	All experiments were conducted on a 2020 Mac Book Air with a M1 processor (8 cores, 3.4 GHz).
Software Dependencies	No	The paper mentions software like "R packages cellwise and GSE", "sklearn [30]", and "hyperimpute [15]" but does not provide specific version numbers for these software components, which are necessary for reproducible dependency information.
Experiment Setup	Yes	The paper provides details on the experimental setup, including: "For n = 100, p = 50, r(Σ) = 2 under a Dirac contamination (tail MV and DDCMV are our methods). Here ε = 1 and δ varies in (0, 1)" (Figure 1 caption). It also describes baselines used: "Our baselines are the empirical covariance estimator... and an oracle which knows the position of every outlier...". Additionally, it specifies contamination types (Dirac, Gaussian) and rates in various experiments.
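
To make the setup row more concrete, the following is a minimal sketch of a cell-wise Dirac contamination experiment with the empirical covariance baseline. It is not the authors' code (see the repository linked under Open Source Code for that). Everything beyond n = 100 and p = 50 is an assumption made for illustration: the function names, the exact rank-2 covariance used as a stand-in for a small effective rank, the contamination rate of 0.1, and the spike value of 10.

```python
import numpy as np

def simulate_dirac_contamination(n=100, p=50, rank=2, rate=0.1, spike=10.0, seed=0):
    """Draw n zero-mean Gaussian rows with a low-rank covariance, then
    overwrite each cell independently with probability `rate` by the
    constant `spike` (a Dirac contamination). All defaults except n and p
    are illustrative assumptions, not values taken from the paper."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((p, rank))
    sigma = factors @ factors.T                         # true covariance, rank <= `rank`
    clean = rng.standard_normal((n, rank)) @ factors.T  # rows ~ N(0, sigma)
    mask = rng.random((n, p)) < rate                    # True where a cell is contaminated
    dirty = np.where(mask, spike, clean)
    return sigma, clean, dirty, mask

def empirical_covariance(x):
    """Uncentered empirical covariance (the data are zero-mean by construction)."""
    return x.T @ x / x.shape[0]

def operator_norm_error(estimate, truth):
    """Spectral-norm distance between an estimate and the true covariance."""
    return np.linalg.norm(estimate - truth, ord=2)

sigma, clean, dirty, mask = simulate_dirac_contamination()
err_clean = operator_norm_error(empirical_covariance(clean), sigma)
err_dirty = operator_norm_error(empirical_covariance(dirty), sigma)
print(f"clean: {err_clean:.2f}  contaminated: {err_dirty:.2f}")
```

Sweeping `rate` over a grid and plotting the error for each estimator would give a curve of the same general kind as the one the Figure 1 caption describes, although the paper's own methods and its oracle baseline are not reproduced here.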