A Unified Framework for Outlier-Robust PCA-like Algorithms

Authors: Wenzhuo Yang, Huan Xu

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on synthetic and real-world datasets demonstrate that the outlier-robust PCA-like algorithms derived from our framework have outstanding performance.
Researcher Affiliation | Academia | Wenzhuo Yang (A0096049@NUS.EDU.SG), Department of Mechanical Engineering, National University of Singapore, Singapore 117576; Huan Xu (MPEXUH@NUS.EDU.SG), Department of Mechanical Engineering, National University of Singapore, Singapore 117576
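Pseudocode | Yes | Algorithm 1 Outlier-Robust PCA-like Algorithm
Input: contaminated sample-set $Y = \{y_1, \dots, y_n\}$, $k$, $T$, $\hat t$, $\mu$.
Procedure:
1) Initialize: $s = 0$, $\mathrm{Opt} = 0$; $\hat y_i = y_i$ and $\alpha_i = 1$ for $i = 1, \dots, n$;
while $s \le T$ do
2) Compute the weighted empirical covariance matrix $\hat\Sigma = \frac{1}{n}\sum_{i=1}^{n} \alpha_i \hat y_i \hat y_i^\top$;
3) Solve the PCA-like problem (1) and denote the output by $\hat X$;
4) If $V_{\hat t}(\hat X) > \mathrm{Opt}$, let $\mathrm{Opt} = V_{\hat t}(\hat X)$ and $X^* = \hat X$, where $V_{\hat t}(\hat X) \triangleq \frac{1}{\hat t}\sum_{i=1}^{\hat t} \langle y y^\top, \hat X\rangle_{(i)}$ and $\langle y y^\top, \hat X\rangle_{(i)}$ denotes the $i$-th smallest of the values $\langle y_j y_j^\top, \hat X\rangle$, $j = 1, \dots, n$;
5) Update $\alpha_i = \Bigl(1 - \frac{\langle y_i y_i^\top, \hat X\rangle}{\max_{j \mid \alpha_j \neq 0} \langle y_j y_j^\top, \hat X\rangle}\Bigr)\alpha_i$;
6) $s = s + 1$;
end while
7) Perform SVD on $X^*$ and denote the top $k$ eigenvectors by $w^*_1, \dots, w^*_k$;
8) return $w^*_1, \dots, w^*_k$ and $X^*$.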
Open Source Code | No | The paper does not provide an explicit statement about releasing code for their method or a link to a code repository.
Open Datasets | Yes | In the third experiment, we show the performance of OR-SPCA, OR-PCA and FPS on a real dataset of 600 samples in which 75% of samples are drawn from MNIST (Le Cun et al., 1995) and 25% of samples are drawn from the CBCL face image dataset (Sung, 1996). ... Finally, we use the NYTimes news article dataset from the UCI Machine Learning Repository (Frank & Asuncion, 2010), which contains 300000 articles and a dictionary of 102660 unique words...
Dataset Splits | No | The paper describes the composition of the datasets used in the experiments (e.g., 75% MNIST and 25% CBCL in one experiment, or 2429 CBCL images mixed with 125 MNIST images in another), but it does not specify train/validation/test splits (percentages, sample counts, or a partitioning method) needed to reproduce the experiments.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or cluster specifications.
Software Dependencies | No | The paper does not provide specific software names with version numbers that would be necessary to replicate the experiments.
Experiment Setup | Yes | The parameters for generating test data are set as follows: d = 10, σ = 0.05, β = 0.3p. Parameters T and t̂ for OR-PCA and OR-SPCA are set to 10 and ρn, respectively. Parameter µ for FPS and OR-SPCA is set to 0.2√n.
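Algorithm 1 in the Pseudocode entry alternates between solving a PCA-like problem on a reweighted covariance and down-weighting the samples that the current solution explains most, keeping the candidate with the largest trimmed variance $V_{\hat t}$. The following is a minimal Python/NumPy sketch of that loop, not the authors' implementation. It assumes plain PCA as the PCA-like problem (1), so $\hat X$ is the projector $WW^\top$ onto the top-$k$ eigenvectors of $\hat\Sigma$; µ is therefore unused, and the function name `outlier_robust_pca` is hypothetical.

```python
import numpy as np


def outlier_robust_pca(Y, k, T, t_hat):
    """Sketch of Algorithm 1 with plain PCA standing in for problem (1).

    Y: (n, p) array, one sample y_i per row. Returns (W, X_star), where the
    columns of W are the top-k eigenvectors of X_star.
    """
    n, _ = Y.shape
    alpha = np.ones(n)                    # step 1: unit weights
    opt, X_star = 0.0, None
    s = 0
    while s <= T:
        # step 2: weighted covariance (1/n) * sum_i alpha_i y_i y_i^T
        Sigma = (Y * alpha[:, None]).T @ Y / n
        # step 3 (assumption): plain PCA, i.e. the projector onto the
        # top-k eigenvectors of Sigma (eigh sorts eigenvalues ascending)
        _, vecs = np.linalg.eigh(Sigma)
        W = vecs[:, -k:]
        X_hat = W @ W.T
        # <y_i y_i^T, X_hat> = y_i^T X_hat y_i for every sample
        scores = np.einsum("ij,jk,ik->i", Y, X_hat, Y)
        # step 4: trimmed variance = mean of the t_hat smallest scores
        v = np.sort(scores)[:t_hat].mean()
        if X_star is None or v > opt:     # None check: safety guard only
            opt, X_star = v, X_hat
        # step 5: shrink weights; the active sample with the largest
        # score gets weight 0 (m <= 0 guard avoids division by zero)
        active = alpha != 0
        m = scores[active].max() if active.any() else 0.0
        if m <= 0:
            break
        alpha = (1.0 - scores / m) * alpha
        s += 1                            # step 6
    # step 7: X_star is symmetric PSD, so its top eigenvectors are its
    # top singular vectors
    _, vecs = np.linalg.eigh(X_star)
    return vecs[:, -k:], X_star
```

Swapping in another solver for step 3 (for example a sparse PCA solver) would only change the lines that compute `X_hat`; the reweighting and selection logic stay the same. The `X_star is None` and `m <= 0` guards are additions for numerical safety, not part of Algorithm 1.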
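The Open Datasets entry describes contaminated sets built by mixing two image sources, e.g. 600 samples with 75% drawn from MNIST and 25% from CBCL. A minimal sketch of such a mixture, assuming the images are already loaded as NumPy arrays with one flattened image per row; the function name `mix_sources`, sampling without replacement, and the final shuffle are assumptions, since the report does not state how the samples were drawn.

```python
import numpy as np


def mix_sources(a, b, n, frac_b, seed=0):
    """Hypothetical helper: draw n rows, a fraction frac_b of them from b.

    a, b: 2-D arrays with one flattened image per row (same number of columns).
    Returns (Y, from_b) where from_b[i] is True if row i of Y came from b.
    """
    rng = np.random.default_rng(seed)
    n_b = int(round(frac_b * n))          # e.g. 0.25 * 600 -> 150
    n_a = n - n_b
    # sample each source without replacement (assumption)
    rows_a = a[rng.choice(len(a), size=n_a, replace=False)]
    rows_b = b[rng.choice(len(b), size=n_b, replace=False)]
    Y = np.vstack([rows_a, rows_b])
    from_b = np.concatenate([np.zeros(n_a, dtype=bool), np.ones(n_b, dtype=bool)])
    # shuffle so the two sources are interleaved
    perm = rng.permutation(n)
    return Y[perm], from_b[perm]
```

With `n=600` and `frac_b=0.25`, this draws 450 rows from `a` and 150 rows from `b`; the returned mask records which rows came from which source.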
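In the Experiment Setup entry, T is fixed while t̂ and µ scale with the sample size n. A small helper makes that dependence explicit. ρ is left as an input because its value is not given in this report; rounding ρn down to an integer and the names `ORParams` and `reported_params` are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ORParams:
    """Hypothetical container for the reported settings."""
    T: int        # number of reweighting iterations (OR-PCA / OR-SPCA)
    t_hat: int    # trimming size used by V_t_hat (OR-PCA / OR-SPCA)
    mu: float     # parameter µ of FPS / OR-SPCA


def reported_params(n: int, rho: float) -> ORParams:
    """T = 10, t_hat = rho * n, mu = 0.2 * sqrt(n).

    Rounding rho * n down to an integer is an assumption.
    """
    return ORParams(T=10, t_hat=math.floor(rho * n), mu=0.2 * math.sqrt(n))
```

For example, n = 100 and ρ = 0.5 give T = 10, t̂ = 50 and µ = 2.0.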