
A Unified Framework for Outlier-Robust PCA-like Algorithms

Authors: Wenzhuo Yang, Huan Xu

ICML 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on synthetic and real-world datasets demonstrate that the outlier-robust PCA-like algorithms derived from our framework have outstanding performance.
Researcher Affiliation	Academia	Wenzhuo Yang (A0096049@NUS.EDU.SG), Department of Mechanical Engineering, National University of Singapore, Singapore 117576; Huan Xu (MPEXUH@NUS.EDU.SG), Department of Mechanical Engineering, National University of Singapore, Singapore 117576
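Pseudocode	Yes	Algorithm 1: Outlier-Robust PCA-like Algorithm. Input: Contaminated sample-set $Y = \{y_1, \dots, y_n\}$, $k$, $T$, $\hat{t}$, $\mu$. Procedure: 1) Initialize: $s = 0$, $\mathrm{Opt} = 0$; $\hat{y}_i = y_i$ and $\alpha_i = 1$ for $i = 1, \dots, n$; while $s \le T$ do 2) Compute the weighted empirical covariance matrix $\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} \alpha_i \hat{y}_i \hat{y}_i^\top$; 3) Solve the PCA-like problem 1 and denote the output by $\hat{X}$; 4) If $V_{\hat{t}}(\hat{X}) > \mathrm{Opt}$, let $\mathrm{Opt} = V_{\hat{t}}(\hat{X})$ and $X^* = \hat{X}$, where $V_{\hat{t}}(\hat{X}) \triangleq \frac{1}{\hat{t}} \sum_{i=1}^{\hat{t}} \langle y y^\top, \hat{X} \rangle_{(i)}$; 5) Update $\alpha_i = \left(1 - \frac{\langle y_i y_i^\top, \hat{X} \rangle}{\max_{\{i \mid \alpha_i \neq 0\}} \langle y_i y_i^\top, \hat{X} \rangle}\right) \alpha_i$; 6) $s = s + 1$; end while 7) Perform SVD on $X^*$ and denote the top $k$ eigenvectors by $w_1^*, \dots, w_k^*$; 8) return $w_1^*, \dots, w_k^*$ and $X^*$.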
Open Source Code	No	The paper does not provide an explicit statement about releasing code for their method or a link to a code repository.
Open Datasets	Yes	In the third experiment, we show the performance of OR-SPCA, OR-PCA and FPS on a real dataset of 600 samples in which 75% of samples are drawn from MNIST (LeCun et al., 1995) and 25% of samples are drawn from the CBCL face image dataset (Sung, 1996). ... Finally, we use the NYTimes news article dataset from the UCI Machine Learning Repository (Frank & Asuncion, 2010), which contains 300000 articles and a dictionary of 102660 unique words...
Dataset Splits	No	The paper describes the composition of datasets used in experiments (e.g., 75% MNIST and 25% CBCL for one experiment, or mixing 2429 CBCL images with 125 MNIST images), but it does not specify explicit train/validation/test splits with percentages, sample counts, or methods for data partitioning to reproduce the experiment.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or cluster specifications.
Software Dependencies	No	The paper does not provide specific software names with version numbers that would be necessary to replicate the experiments.
Experiment Setup	Yes	The parameters for generating test data are set as follows: d = 10, σ = 0.05, β = 0.3p. Parameters T and t̂ for OR-PCA and OR-SPCA are set to 10 and ρn, respectively. Parameter µ for FPS and OR-SPCA is 0.2√n.
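
Since no code was released, the loop quoted in the Pseudocode row can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and rests on several assumptions: the "PCA-like problem 1" is replaced by plain PCA (projection onto the top-k eigenvectors, so µ is unused), the subscript (i) is read as the i-th smallest value, and the loop condition is taken to be s ≤ T. The names `top_k_projection` and `outlier_robust_pca` are hypothetical.

```python
import numpy as np


def top_k_projection(sigma, k):
    """Assumed stand-in for the 'PCA-like problem': plain PCA.

    Returns X = U_k U_k^T, the projection onto the top-k eigenvectors.
    """
    _, eigvecs = np.linalg.eigh(sigma)  # eigenvalues in ascending order
    u = eigvecs[:, -k:]
    return u @ u.T


def outlier_robust_pca(Y, k, T, t_hat, solve=top_k_projection):
    """Sketch of the reweighting loop. Y is an (n, p) array, one sample per row."""
    n, p = Y.shape
    alpha = np.ones(n)           # step 1: all weights start at 1
    opt = 0.0
    x_star = np.zeros((p, p))
    s = 0
    while s <= T:                # assumed reading of the loop condition
        # Step 2: weighted empirical covariance (1/n) * sum_i alpha_i y_i y_i^T.
        sigma_hat = (Y * alpha[:, None]).T @ Y / n
        # Step 3: solve the (assumed) PCA-like problem.
        x_hat = solve(sigma_hat, k)
        # <y_i y_i^T, X> equals y_i^T X y_i for every sample.
        scores = np.einsum("ij,jk,ik->i", Y, x_hat, Y)
        # Step 4: average of the t_hat smallest scores (assumed order statistic).
        v = np.sort(scores)[:t_hat].mean()
        if v > opt:
            opt, x_star = v, x_hat
        # Step 5: shrink weights in proportion to each sample's score.
        active = alpha != 0
        if not active.any():
            break
        denom = scores[active].max()
        if denom <= 0:
            break
        alpha = (1.0 - scores / denom) * alpha
        s += 1                   # step 6
    # Step 7: leading k singular vectors of the best X found.
    u, _, _ = np.linalg.svd(x_star)
    return u[:, :k], x_star
```

With the values from the Experiment Setup row, a call would look like `outlier_robust_pca(Y, k, T=10, t_hat=int(rho * n))`, where `rho` is a placeholder for the ρ that row mentions without stating a value. A sparse variant would pass a different `solve` function that uses µ.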