Single-Pass PCA of Large High-Dimensional Data
Authors: Wenjian Yu, Yu Gu, Jian Li, Shenghua Liu, Yaohang Li
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with synthetic and real data validate the algorithm's accuracy, which has orders of magnitude smaller error than an existing single-pass algorithm. |
| Researcher Affiliation | Academia | Wenjian Yu, Yu Gu, Jian Li TNList, Dept. Computer Science & Tech., Tsinghua University, Beijing, China yu-wj@tsinghua.edu.cn; Shenghua Liu Inst. Computing Technology, Chinese Academy of Sciences, Beijing, China liushenghua@ict.ac.cn; Yaohang Li Dept. Computer Science, Old Dominion University, Norfolk, VA 23529, USA yaohang@cs.odu.edu |
| Pseudocode | Yes | Algorithm 1 Basic randomized scheme for truncated SVD; Algorithm 2 An existing single-pass algorithm; Algorithm 3 A pass-efficient blocked algorithm; Algorithm 4 A single-pass algorithm for computing PCA |
| Open Source Code | Yes | For reproducibility, we share the codes of the proposed algorithm and experimental data on https://github.com/WenjianYu/rSVD-single-pass. |
| Open Datasets | Yes | Following [Halko et al., 2011a], we construct several large data using the unitary discrete cosine transform (command dct in Matlab). ... We apply the single-pass algorithm with k = 50 to the matrix representing the images of faces from the FERET database [Phillips et al., 2000]. |
| Dataset Splits | No | The paper describes the datasets used (synthetic, real, FERET) but does not provide specific details on how these datasets were split into training, validation, or test sets for reproducibility. |
| Hardware Specification | Yes | All experiments are carried out on a Linux server with two 12-core Intel Xeon E5-2630 CPUs (2.30 GHz), and 32 GB RAM. |
| Software Dependencies | No | The paper mentions software components such as 'C with OpenMP derivatives', 'MKL libraries', and 'LAPACK routines', but does not give version numbers for these dependencies, so the software environment cannot be reproduced exactly. |
| Experiment Setup | Yes | In all experiments, the block size b = 10. ... The over-sampling parameter is set to 10 (i.e., l = 60). ... l in Algorithm 4 is set to 20 or 30. |
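
The Pseudocode row lists a basic randomized scheme for truncated SVD, and the Experiment Setup row relates the rank k to the sketch width l through an over-sampling parameter. As a rough illustration of how those parameters interact, here is a minimal NumPy sketch of the standard two-pass randomized SVD in the style of Halko et al. It is not the paper's single-pass Algorithm 4, and the function name `randomized_svd` and its arguments are hypothetical choices made for this example.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Basic two-pass randomized truncated SVD (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = k + oversample                     # sketch width: rank plus over-sampling
    Omega = rng.standard_normal((n, l))    # Gaussian random test matrix
    Y = A @ Omega                          # first pass over A: sample its range
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis of the sample, m x l
    B = Q.T @ A                            # second pass over A: project to l x n
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                        # lift left singular vectors back to m rows
    return U[:, :k], s[:k], Vt[:k, :]      # truncate to the leading k components
```

Calling `randomized_svd(A, k=50, oversample=10)` uses a sketch width of l = 60, matching the relation between k and l quoted in the Experiment Setup row. Note that this version reads `A` twice, which is the cost a single-pass algorithm is designed to avoid.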