Outlier Detection and Robust PCA Using a Convex Measure of Innovation
Authors: Mostafa Rahmani, Ping Li
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents a provable and strong algorithm, termed Innovation Search (iSearch), to robust Principal Component Analysis (PCA) and outlier detection. [...] The theoretical and numerical results showed that finding the optimal directions makes iSearch significantly robust to the outliers which carry weak innovation. Moreover, the experiments with real and synthetic data demonstrate the robustness of the proposed method against the strong presence of noise. |
| Researcher Affiliation | Industry | Mostafa Rahmani and Ping Li, Cognitive Computing Lab, Baidu Research, 10900 NE 8th St., Bellevue, WA 98004, USA; {mostafarahmani,liping11}@baidu.com |
| Pseudocode | Yes | Algorithm 1: Subspace Recovery Using iSearch (a hedged sketch of the underlying direction-search step is given below the table) |
| Open Source Code | No | The paper does not provide any links to open-source code for the methodology, nor does it explicitly state that the code will be released or is available. |
| Open Datasets | Yes | We use the Hopkins155 dataset [33], which contains data matrices with 2 or 3 clusters. [...] We use the Waving Tree video file [21]. |
| Dataset Splits | No | The paper mentions synthetic data generation parameters and how a 'trial is considered successful', but it does not specify explicit dataset splits (e.g., train/validation/test percentages or counts) or cross-validation methods for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper states 'we use an ADMM solver to solve (1)' but does not provide specific version numbers for this solver or any other software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | In this experiment, M1 = 100, r = 5, and n_i = 100. The data contains 300 unstructured and 10 structured outliers. The distribution of the structured outliers follows Assumption 2 with η = 0.1. [...] In addition, we identify column d as an outlier if $\|\mathbf{d} - \hat{U}\hat{U}^T\mathbf{d}\|_2 / \|\mathbf{d}\|_2 \geq 0.2$, where $\hat{U}$ is the recovered subspace (a sketch of this check follows the table). |
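
The direction-search program that Algorithm 1 relies on, and that the paper reports solving with ADMM, appears above only as problem (1). The sketch below assumes it has the form used throughout the authors' innovation-search line of work, $\min_{\mathbf{c}} \|\mathbf{D}^T\mathbf{c}\|_1$ subject to $\mathbf{c}^T\mathbf{d}_i = 1$, and solves it as an equivalent linear program with SciPy's HiGHS backend rather than ADMM. The function name `innovation_values` and the choice of solver are illustrative assumptions, not artifacts from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def innovation_values(D):
    """For each column d_i of D (m x n), solve the assumed direction search
    min_c ||D^T c||_1  s.t.  c^T d_i = 1 as a linear program, and return
    ||D^T c*||_1 per column. In this framework, columns with small values
    behave like outliers (their optimal direction is nearly orthogonal to
    the rest of the data)."""
    m, n = D.shape
    # LP variables x = [c (m entries); t (n entries)], minimizing sum(t)
    cost = np.concatenate([np.zeros(m), np.ones(n)])
    # |D^T c| <= t  rewritten as  D^T c - t <= 0  and  -D^T c - t <= 0
    A_ub = np.block([[D.T, -np.eye(n)], [-D.T, -np.eye(n)]])
    b_ub = np.zeros(2 * n)
    bounds = [(None, None)] * m + [(0, None)] * n
    vals = np.empty(n)
    for i in range(n):
        # equality constraint c^T d_i = 1 (t is unconstrained here)
        A_eq = np.concatenate([D[:, i], np.zeros(n)])[None, :]
        res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=bounds, method="highs")
        vals[i] = res.fun  # equals ||D^T c*||_1 at the optimum
    return vals
```

The LP reformulation is used here only because it keeps the sketch self-contained with SciPy; the paper's own ADMM solver would be the natural choice at the scale of the reported experiments.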
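
The experiment-setup row also states the rule used to score detection: a column is declared an outlier when the residual left after projecting it onto the recovered subspace exceeds 20% of the column's norm. A minimal sketch of that check, assuming `U_hat` holds an orthonormal basis of the recovered subspace (the function name and threshold argument are illustrative):

```python
import numpy as np

def flag_outliers(D, U_hat, tau=0.2):
    """Mark column d of D as an outlier when
    ||d - U_hat U_hat^T d||_2 / ||d||_2 >= tau."""
    residual = D - U_hat @ (U_hat.T @ D)              # per-column projection residual
    ratios = np.linalg.norm(residual, axis=0) / np.linalg.norm(D, axis=0)
    return ratios >= tau                              # True where the column is flagged
```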