Robust covariance estimation with missing values and cell-wise contamination
Authors: Grégoire Pacreau, Karim Lounici
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To complement our theoretical findings, we conducted an experimental study which demonstrates the superiority of our approach over the state of the art both in low and high dimension settings. |
| Researcher Affiliation | Academia | Karim Lounici CMAP Ecole Polytechnique Palaiseau, France karim.lounici@polytechnique.edu Gregoire Pacreau CMAP Ecole Polytechnique Palaiseau, France gregoire.pacreau@polytechnique.edu |
| Pseudocode | No | The paper does not contain any pseudocode blocks or clearly labeled algorithm sections. |
| Open Source Code | Yes | Code available at https://github.com/klounici/COVARIANCE_contaminated_data |
| Open Datasets | No | The paper mentions synthetic data generation (App. A) and real-life datasets (App. B) but does not provide direct links, DOIs, or specific citations with author/year for public access to these datasets. For example, it lists "Boston from UCI [17]" and "Abalone from UCI [39]" but the citations [17] and [39] do not point to the datasets themselves or their public repositories with author/year. |
| Dataset Splits | No | The paper describes generating synthetic data and using real-life datasets, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits with citations) for reproducibility. |
| Hardware Specification | Yes | All experiments were conducted on a 2020 Mac Book Air with a M1 processor (8 cores, 3.4 GHz). |
| Software Dependencies | No | The paper mentions software like "R packages cellwise and GSE", "sklearn [30]", and "hyperimpute [15]" but does not provide specific version numbers for these software components, which are necessary for reproducible dependency information. |
| Experiment Setup | Yes | The paper provides details on the experimental setup, including: "For n = 100, p = 50, r(Σ) = 2 under a Dirac contamination (tail MV and DDCMV are our methods). Here ε = 1 and δ varies in (0, 1)" (Figure 1 caption). It also describes the baselines used: "Our baselines are the empirical covariance estimator... and an oracle which knows the position of every outlier...". Additionally, it specifies contamination types (Dirac, Gaussian) and rates in various experiments. |
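
To make the quoted setup concrete, here is a minimal sketch of a synthetic experiment of this kind. It is not the authors' code. It assumes that r denotes the effective rank trace(Σ)/‖Σ‖, that δ is the probability that a cell keeps its true value, and that a Dirac contamination replaces every other cell with a fixed constant. The helper names, the contamination value 5.0, and the choice δ = 0.9 are all hypothetical.

```python
import numpy as np


def effective_rank(sigma):
    """Trace divided by operator norm (assumed meaning of r in the setup)."""
    return np.trace(sigma) / np.linalg.norm(sigma, 2)


def make_low_rank_factor(p, rank, rng):
    """Orthonormal p x rank factor Q, so that Q @ Q.T has effective rank `rank`."""
    q, _ = np.linalg.qr(rng.standard_normal((p, rank)))
    return q


def dirac_contaminate(x, delta, value, rng):
    """Keep each cell with probability delta; otherwise overwrite it with `value`.

    This cell-wise mechanism is an illustrative assumption, not the paper's
    exact contamination model.
    """
    keep = rng.random(x.shape) < delta
    return np.where(keep, x, value), keep


def operator_error(estimate, sigma):
    """Operator-norm distance between an estimate and the true covariance."""
    return np.linalg.norm(estimate - sigma, 2)


rng = np.random.default_rng(0)
n, p, rank = 100, 50, 2

q = make_low_rank_factor(p, rank, rng)
sigma = q @ q.T                           # true covariance
x = rng.standard_normal((n, rank)) @ q.T  # clean samples with covariance sigma

y, keep = dirac_contaminate(x, delta=0.9, value=5.0, rng=rng)

# Empirical covariance baseline on contaminated data versus on clean data.
naive_err = operator_error(np.cov(y, rowvar=False, bias=True), sigma)
clean_err = operator_error(np.cov(x, rowvar=False, bias=True), sigma)
print(f"effective rank: {effective_rank(sigma):.2f}")
print(f"error on contaminated data: {naive_err:.3f}")
print(f"error on clean data: {clean_err:.3f}")
```

Sweeping `delta` over (0, 1) and plotting `naive_err` against it would give the kind of curve a figure with varying δ suggests. The paper's own estimators and its oracle baseline are not reproduced here.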