A Unified Framework for Outlier-Robust PCA-like Algorithms
Authors: Wenzhuo Yang, Huan Xu
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on synthetic and real-world datasets demonstrate that the outlier-robust PCA-like algorithms derived from our framework have outstanding performance. |
| Researcher Affiliation | Academia | Wenzhuo Yang (A0096049@NUS.EDU.SG), Department of Mechanical Engineering, National University of Singapore, Singapore 117576; Huan Xu (MPEXUH@NUS.EDU.SG), Department of Mechanical Engineering, National University of Singapore, Singapore 117576 |
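| Pseudocode | Yes | Algorithm 1: Outlier-Robust PCA-like Algorithm. Input: contaminated sample set $Y = \{y_1, \dots, y_n\}$, $k$, $T$, $\hat{t}$, $\mu$. Procedure: 1) Initialize: $s = 0$, $\mathrm{Opt} = 0$; $\hat{y}_i = y_i$ and $\alpha_i = 1$ for $i = 1, \dots, n$; while $s \le T$ do 2) Compute the weighted empirical covariance matrix $\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} \alpha_i \hat{y}_i \hat{y}_i^\top$; 3) Solve the PCA-like problem 1 and denote the output by $\hat{X}$; 4) If $V_{\hat{t}}(\hat{X}) > \mathrm{Opt}$, let $\mathrm{Opt} = V_{\hat{t}}(\hat{X})$ and $X^* = \hat{X}$, where $V_{\hat{t}}(\hat{X}) \triangleq \frac{1}{\hat{t}} \sum_{i=1}^{\hat{t}} \langle y y^\top, \hat{X} \rangle_{(i)}$; 5) Update $\alpha_i = \left(1 - \frac{\langle y_i y_i^\top, \hat{X} \rangle}{\max_{\{i \mid \alpha_i \neq 0\}} \langle y_i y_i^\top, \hat{X} \rangle}\right) \alpha_i$; 6) $s = s + 1$; end while 7) Perform SVD on $X^*$ and denote the top $k$ eigenvectors by $w_1^*, \dots, w_k^*$; 8) return $w_1^*, \dots, w_k^*$ and $X^*$. |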
| Open Source Code | No | The paper does not provide an explicit statement about releasing code for their method or a link to a code repository. |
| Open Datasets | Yes | In the third experiment, we show the performance of OR-SPCA, OR-PCA and FPS on a real dataset of 600 samples in which 75% of samples are drawn from MNIST (LeCun et al., 1995) and 25% of samples are drawn from the CBCL face image dataset (Sung, 1996). ... Finally, we use the NYTimes news article dataset from the UCI Machine Learning Repository (Frank & Asuncion, 2010), which contains 300000 articles and a dictionary of 102660 unique words... |
| Dataset Splits | No | The paper describes the composition of datasets used in experiments (e.g., 75% MNIST and 25% CBCL for one experiment, or mixing 2429 CBCL images with 125 MNIST images), but it does not specify explicit train/validation/test splits with percentages, sample counts, or methods for data partitioning to reproduce the experiment. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or cluster specifications. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers that would be necessary to replicate the experiments. |
| Experiment Setup | Yes | The parameters for generating test data are set as follows: $d = 10$, $\sigma = 0.05$, $\beta = 0.3p$. Parameter $T$ and $\hat{t}$ for OR-PCA and OR-SPCA are set to 10 and $\rho n$, respectively. Parameter $\mu$ for FPS and OR-SPCA is $0.2\sqrt{n}$. |
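
The reweighting loop quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the "PCA-like problem" is replaced by plain PCA (top-k eigenvectors of the weighted covariance), so the parameter µ is unused; the subscript (i) in the robust variance estimate is assumed to denote order statistics in ascending order; the loop is assumed to run while s ≤ T. The function names `solve_pca` and `outlier_robust_pca` are hypothetical.

```python
import numpy as np


def solve_pca(cov, k):
    """Stand-in for the PCA-like solver (an assumption): plain PCA.

    Returns the projection matrix X = W W^T built from the top-k
    eigenvectors of the symmetric matrix `cov`.
    """
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
    W = eigvecs[:, -k:]                # keep the k leading eigenvectors
    return W @ W.T


def outlier_robust_pca(Y, k, T, t_hat):
    """Sketch of the reweighting loop; samples are the rows of Y (n x p)."""
    n = Y.shape[0]
    alpha = np.ones(n)                 # step 1: every sample starts with weight 1
    best_val, X_star = 0.0, None

    for _ in range(T + 1):             # assumed loop condition: s <= T
        active = alpha > 0
        if not active.any():
            break
        # Step 2: weighted empirical covariance (1/n) * sum_i alpha_i y_i y_i^T.
        cov = (Y * alpha[:, None]).T @ Y / n
        # Step 3: solve the (stand-in) PCA-like problem.
        X_hat = solve_pca(cov, k)
        # <y_i y_i^T, X_hat> = y_i^T X_hat y_i for each sample.
        scores = np.sum((Y @ X_hat) * Y, axis=1)
        # Step 4: robust variance estimate, the mean of the t_hat smallest scores.
        v = np.sort(scores)[:t_hat].mean()
        if X_star is None or v > best_val:
            best_val, X_star = v, X_hat
        # Step 5: shrink each weight in proportion to its score; the active
        # sample with the largest score drops to weight zero.
        max_score = scores[active].max()
        if max_score <= 0:
            break
        alpha = np.clip((1.0 - scores / max_score) * alpha, 0.0, None)

    # Step 7: SVD of the best candidate; the first k left singular vectors
    # are its top-k eigenvectors because X_star is symmetric PSD.
    U, _, _ = np.linalg.svd(X_star)
    return U[:, :k], X_star
```

With the settings quoted in the Experiment Setup row, a call would pass `T=10` and `t_hat` equal to ρn rounded to an integer. Swapping `solve_pca` for a sparse PCA solver would give an OR-SPCA-style variant, at which point µ would come into play.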