Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Streaming Sparse Principal Component Analysis
Authors: Wenzhuo Yang, Huan Xu
ICML 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms. We investigate the performance of our algorithms on a variety of simulated and real-world datasets. |
| Researcher Affiliation | Academia | Wenzhuo Yang EMAIL Department of Mechanical Engineering, National University of Singapore, Singapore 117576. Huan Xu EMAIL Department of Mechanical Engineering, National University of Singapore, Singapore 117576. |
| Pseudocode | Yes | Algorithm 1 Row Truncation Operator. Algorithm 2 Streaming SPCA via Row Truncation. Algorithm 3 Streaming SPCA via Iterative Deflation. Algorithm 4 Streaming ECA via Row Truncation. Algorithm 5 Finding Initial Solution. |
| Open Source Code | No | The paper does not provide any explicit statement or link for the open-sourcing of the described methodology's code. |
| Open Datasets | Yes | We use two large datasets, the NIPS paper dataset and the NYTimes news articles dataset, both available from the UCI Machine Learning Repository (Bache & Lichman). |
| Dataset Splits | No | The paper describes generating synthetic data and using real-world datasets with parameters like block size (B) and total samples (n). However, it does not explicitly provide information on standard train/validation/test splits, percentages, or sample counts needed for data partitioning. |
| Hardware Specification | Yes | The experiments are conducted on a desktop PC with an i7 3.4GHz CPU and 4G memory. |
| Software Dependencies | No | All the algorithms mentioned below are implemented in Python. This statement mentions the programming language but does not specify a version number or any other software libraries with their respective versions. |
| Experiment Setup | Yes | Parameters B and γ in streaming sparse PCA are set to 300 and 500, respectively. In the following experiments, the samples are independently drawn from EC_p(0, Σ, ξ). Here, Σ is constructed according to Σ = AA^⊤ + I_p, where A is generated following the first scheme described above, and ξ follows the chi-distribution with p degrees of freedom or the F-distribution with degrees of freedom p and 1. |
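
The synthetic setup quoted in the Experiment Setup row can be illustrated with a minimal sketch. It assumes the standard stochastic representation of an elliptical distribution, x = ξ · L u with L Lᵀ = Σ and u uniform on the unit sphere. The paper's scheme for generating A is not reproduced on this page, so the row-sparse `A` below is a hypothetical stand-in, and the names `sample_elliptical`, `chi_sampler`, and `f_sampler` are illustrative, not the authors' code.

```python
import numpy as np

def sample_elliptical(n, A, xi_sampler, rng):
    """Draw n samples x = xi * L u, where L L^T = A A^T + I_p
    and u is uniform on the unit sphere in R^p."""
    p = A.shape[0]
    sigma = A @ A.T + np.eye(p)          # Sigma = A A^T + I_p (positive definite)
    L = np.linalg.cholesky(sigma)        # lower-triangular factor of Sigma
    g = rng.standard_normal((n, p))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform directions
    xi = xi_sampler(n, rng)              # nonnegative radial variable
    return (xi[:, None] * u) @ L.T

def chi_sampler(p):
    # chi-distribution with p degrees of freedom (square root of chi-square)
    return lambda n, rng: np.sqrt(rng.chisquare(p, size=n))

def f_sampler(p):
    # F-distribution with degrees of freedom p and 1 (heavy-tailed)
    return lambda n, rng: rng.f(p, 1, size=n)

rng = np.random.default_rng(0)
p, k = 20, 2
# Hypothetical row-sparse A: only the first 5 rows are nonzero.
# This is an assumption, not the paper's generation scheme.
A = np.zeros((p, k))
A[:5, :] = rng.standard_normal((5, k))

X_chi = sample_elliptical(1000, A, chi_sampler(p), rng)
X_f = sample_elliptical(1000, A, f_sampler(p), rng)
```

With the chi-distributed radius the samples are Gaussian with covariance Σ, since a chi-distributed radius times a uniform direction is a standard normal vector. The F-distributed radius keeps the same elliptical shape but produces much heavier tails.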