Single-Pass PCA of Large High-Dimensional Data

Authors: Wenjian Yu, Yu Gu, Jian Li, Shenghua Liu, Yaohang Li

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments with synthetic and real data validate the algorithm's accuracy, which has orders of magnitude smaller error than an existing single-pass algorithm.
Researcher Affiliation | Academia | Wenjian Yu, Yu Gu, Jian Li: TNList, Dept. Computer Science & Tech., Tsinghua University, Beijing, China (yu-wj@tsinghua.edu.cn); Shenghua Liu: Inst. Computing Technology, Chinese Academy of Sciences, Beijing, China (liushenghua@ict.ac.cn); Yaohang Li: Dept. Computer Science, Old Dominion University, Norfolk, VA 23529, USA (yaohang@cs.odu.edu)
Pseudocode | Yes | Algorithm 1: Basic randomized scheme for truncated SVD; Algorithm 2: An existing single-pass algorithm; Algorithm 3: A pass-efficient blocked algorithm; Algorithm 4: A single-pass algorithm for computing PCA
Open Source Code | Yes | For reproducibility, we share the codes of the proposed algorithm and experimental data on https://github.com/WenjianYu/rSVD-single-pass.
Open Datasets | Yes | Following [Halko et al., 2011a], we construct several large data using the unitary discrete cosine transform (command dct in Matlab). ... We apply the single-pass algorithm with k = 50 to the matrix representing the images of faces from the FERET database [Phillips et al., 2000].
Dataset Splits | No | The paper describes the datasets used (synthetic, real, FERET) but does not specify how they were split into training, validation, or test sets.
Hardware Specification | Yes | All experiments are carried out on a Linux server with two 12-core Intel Xeon E5-2630 CPUs (2.30 GHz), and 32 GB RAM.
Software Dependencies | No | The paper mentions software components such as 'C with OpenMP derivatives', 'MKL libraries', and 'LAPACK routines', but does not give version numbers for them, so the software environment cannot be reproduced exactly.
Experiment Setup | Yes | In all experiments, the block size b = 10. ... The over-sampling parameter is set to 10 (i.e., l = 60). ... l in Algorithm 4 is set to 20 or 30.
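
To make the roles of k and the over-sampling parameter concrete, here is a minimal Python sketch of the standard basic randomized scheme for truncated SVD (random projection, orthonormalization, small SVD). It is not the authors' code and it is not the single-pass Algorithm 4: this basic scheme reads the matrix twice. The function name randomized_truncated_svd, its arguments, and the small test matrix are assumptions made for illustration only.

```python
import numpy as np


def randomized_truncated_svd(A, k, oversample=10, seed=0):
    """Rank-k SVD approximation of A via the basic randomized scheme.

    Hypothetical helper for illustration; l = k + oversample is the
    sketch width (e.g. k = 50 with over-sampling 10 gives l = 60).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(k + oversample, m, n)         # sketch width, capped by the matrix size

    Omega = rng.standard_normal((n, l))   # Gaussian test matrix
    Y = A @ Omega                         # first pass over A: sample its range
    Q, _ = np.linalg.qr(Y)                # orthonormal basis of the sampled range (m x l)

    B = Q.T @ A                           # second pass over A: project onto the basis (l x n)
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                       # lift left singular vectors back to m dimensions

    return U[:, :k], s[:k], Vt[:k, :]     # keep the leading k components


# Usage on a small synthetic matrix of exact rank 5 (assumed sizes).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_truncated_svd(A, k=5, oversample=10)
A_approx = (U * s) @ Vt
```

The two products with A (A @ Omega and Q.T @ A) are the two passes over the data that a single-pass method is designed to avoid.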