Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Single-Pass PCA of Large High-Dimensional Data

Authors: Wenjian Yu, Yu Gu, Jian Li, Shenghua Liu, Yaohang Li

IJCAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments with synthetic and real data validate the algorithm's accuracy, which has orders of magnitude smaller error than an existing single-pass algorithm.
Researcher Affiliation	Academia	Wenjian Yu, Yu Gu, Jian Li: TNList, Dept. Computer Science & Tech., Tsinghua University, Beijing, China; Shenghua Liu: Inst. Computing Technology, Chinese Academy of Sciences, Beijing, China; Yaohang Li: Dept. Computer Science, Old Dominion University, Norfolk, VA 23529, USA
Pseudocode	Yes	Algorithm 1 Basic randomized scheme for truncated SVD; Algorithm 2 An existing single-pass algorithm; Algorithm 3 A pass-efficient blocked algorithm; Algorithm 4 A single-pass algorithm for computing PCA
Open Source Code	Yes	For reproducibility, we share the codes of the proposed algorithm and experimental data on https://github.com/WenjianYu/rSVD-single-pass.
Open Datasets	Yes	Following [Halko et al., 2011a], we construct several large data using the unitary discrete cosine transform (command dct in Matlab). ... We apply the single-pass algorithm with k = 50 to the matrix representing the images of faces from the FERET database [Phillips et al., 2000].
Dataset Splits	No	The paper describes the datasets used (synthetic, real, FERET) but does not provide specific details on how these datasets were split into training, validation, or test sets for reproducibility.
Hardware Specification	Yes	All experiments are carried out on a Linux server with two 12-core Intel Xeon E5-2630 CPUs (2.30 GHz), and 32 GB RAM.
Software Dependencies	No	The paper mentions software components such as 'C with OpenMP derivatives', 'MKL libraries', and 'LAPACK routines', but does not give version numbers for these dependencies, which are needed to reproduce the software environment.
Experiment Setup	Yes	In all experiments, the block size b = 10. ... The over-sampling parameter is set to 10 (i.e., l = 60). ... l in Algorithm 4 is set to 20 or 30.
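The Pseudocode row lists Algorithm 1, a basic randomized scheme for truncated SVD. Below is a minimal NumPy sketch of the standard scheme in the style of Halko et al., without power iteration. The function name and the default over-sampling p = 10 are our assumptions, and the paper's Algorithm 1 may differ in details. The sketch reads A twice (once for AΩ, once for QᵀA), which is the cost a single-pass algorithm is designed to avoid.

```python
import numpy as np

def randomized_truncated_svd(A, k, p=10, rng=None):
    """Basic randomized truncated SVD (hypothetical helper; no power iteration).

    A: (m, n) array; k: target rank; p: over-sampling, so l = k + p columns are sampled.
    Returns U (m, k), s (k,), Vt (k, n).
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    l = k + p
    omega = rng.standard_normal((n, l))   # Gaussian test matrix
    Y = A @ omega                          # first pass over A: sample the range
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis of range(Y), shape (m, l)
    B = Q.T @ A                            # second pass over A: project, shape (l, n)
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                          # lift left singular vectors back to R^m
    return U[:, :k], s[:k], Vt[:k, :]
```

For an exactly rank-r matrix with k + p ≥ r, the returned factors reproduce A up to rounding error.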
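The Open Datasets row says the synthetic matrices are built with the unitary discrete cosine transform, following Halko et al. (2011a). One plausible reading, which we assume here rather than take from the paper, is A = U diag(σ) Vᵀ, where U and V are columns of orthonormal DCT matrices, so the singular values of A are exactly the prescribed σ. The helper names and the decay profile σ below are hypothetical choices for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n), so that C @ x is the unitary DCT of x."""
    k = np.arange(n)[:, None]    # frequency index (rows)
    j = np.arange(n)[None, :]    # sample index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)   # the DC row uses the smaller normalization
    return C

def dct_test_matrix(m, n, sigma):
    """Hypothetical test matrix A = U diag(sigma) V^T with DCT-based singular vectors.

    sigma: prescribed singular values (descending), length r <= min(m, n).
    """
    r = len(sigma)
    U = dct_matrix(m)[:, :r]     # columns of an orthogonal matrix are orthonormal
    V = dct_matrix(n)[:, :r]
    return (U * sigma) @ V.T     # U * sigma scales column i of U by sigma[i]

sigma = 10.0 ** (-np.arange(50) / 10.0)   # hypothetical geometric decay
A = dct_test_matrix(300, 200, sigma)
```

With SciPy installed, `scipy.fft.dct(np.eye(n), norm='ortho', axis=0)` should produce the same matrix as `dct_matrix(n)`; in MATLAB, `dct(eye(n))` is the counterpart of the `dct` command the paper mentions.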
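In the Experiment Setup row, the target rank k = 50 plus an over-sampling parameter of 10 gives l = k + p = 60 sampled columns; with block size b = 10 that is l / b = 6 blocks. The sketch below shows how such a blocked scheme can finish after a single pass over A. While streaming row blocks of A, it accumulates G = AΩ and H = AᵀG. It then forms each block of B = QᵀA from G and H alone, using B_i = Q_iᵀ(A − QB) together with Q_i = Y_i R_i⁻¹ and QᵀA = B. This is our own reconstruction of the idea, not a transcription of the paper's Algorithm 4, which may differ in details such as reorthogonalization. The function and argument names are hypothetical.

```python
import numpy as np

def single_pass_pca(row_blocks, n, k, p=10, b=10, rng=None):
    """Single-pass blocked randomized SVD (hypothetical reconstruction).

    row_blocks: iterable yielding consecutive row blocks of A, each of shape (r_i, n);
    it is consumed exactly once. l = k + p must be a multiple of the block size b.
    Returns U (m, k), s (k,), Vt (k, n).
    """
    rng = np.random.default_rng(rng)
    l = k + p
    assert l % b == 0, "l = k + p must be a multiple of b"
    omega = rng.standard_normal((n, l))
    G_parts, H = [], np.zeros((n, l))
    for A_r in row_blocks:           # the only pass over A
        G_r = A_r @ omega            # rows of G = A @ omega
        G_parts.append(G_r)
        H += A_r.T @ G_r             # accumulates H = A^T G = A^T A omega
    G = np.vstack(G_parts)
    m = G.shape[0]
    Q, B = np.zeros((m, 0)), np.zeros((0, n))
    for i in range(l // b):
        cols = slice(i * b, (i + 1) * b)
        om_i, H_i = omega[:, cols], H[:, cols]
        BO = B @ om_i                        # (B omega_i), reused twice below
        Y_i = G[:, cols] - Q @ BO            # equals (A - Q B) omega_i without touching A
        Q_i, R_i = np.linalg.qr(Y_i)
        # B_i = R_i^{-T} (H_i^T - Y_i^T Q B - (B omega_i)^T B), i.e. Q_i^T (A - Q B)
        rhs = H_i.T - (Y_i.T @ Q) @ B - BO.T @ B
        B_i = np.linalg.solve(R_i.T, rhs)
        Q, B = np.hstack([Q, Q_i]), np.vstack([B, B_i])
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_hat)[:, :k], s[:k], Vt[:k, :]
```

The caller can pass a generator such as `(A[i:i + 100] for i in range(0, m, 100))`, so A is never revisited. Because H involves AᵀA, the subtraction inside the B_i update can lose accuracy when A has a wide spread of singular values.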