Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unlabeled Principal Component Analysis and Matrix Completion
Authors: Yunzhen Yao, Liangzu Peng, Manolis C. Tsakiris
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experimental Evaluation Here we perform synthetic and real data experiments to evaluate the proposed algorithmic pipelines for UPCA (Section 4.1) and UMC (Section 4.2). We use two metrics for performance evaluation. The first is the largest principal angle θ_max(Ŝ, S) between the estimated subspace Ŝ and the ground-truth S, and this is used for Stage-I to evaluate subspace learning accuracy. The second metric is the relative estimation error ‖X̂ − X‖_F / ‖X‖_F between the estimated data matrix X̂ and the ground-truth X, which quantifies the final performance of our algorithmic pipeline. For both metrics, smaller values imply better performance. |
| Researcher Affiliation | Academia | Yunzhen Yao EMAIL School of Computer and Communication Sciences EPFL CH-1015 Lausanne, Switzerland Liangzu Peng EMAIL Manolis C. Tsakiris EMAIL Key Laboratory for Mathematics Mechanization Academy of Mathematics and Systems Science Chinese Academy of Sciences Beijing, 100190, China |
| Pseudocode | Yes | Algorithm 1 Two-stage Algorithmic Pipeline for UPCA Algorithm 2 Unlabeled Sensing via Least-Squares with Recursive Filtration (LSRF) Algorithm 3 Two-stage Algorithmic Pipeline for UMC |
| Open Source Code | No | The text is ambiguous or lacks a clear, affirmative statement of release. No specific link to source code or statement of its availability for the methodology described in this paper is provided. |
| Open Datasets | Yes | We use the well-known database Extended Yale B (Georghiades et al., 2001) The second data set consists of all the benign cases in Breast Cancer Wisconsin (Diagnostic) (Asuncion and Newman, 2007). |
| Dataset Splits | No | The paper refers to datasets without mentioning splits (e.g., 'we use the XYZ dataset') and does not explicitly provide training/test/validation splits for its experiments. The descriptions of data partitioning relate to creating outliers for the experimental conditions, not to standard dataset splits for model evaluation. |
| Hardware Specification | Yes | Experiments are run on an Intel(R) i7-8700K, 3.7 GHz, 16GB machine. |
| Software Dependencies | No | The paper mentions specific software tools such as 'Bertini (Bates et al.)' and 'the toolbox of Beck and Guttmann-Beck (2019)' but does not provide explicit version numbers for these or any other key software components used in the experiments. |
| Experiment Setup | Yes | For Self-Expr we use λ = 0.95, α = 10 and T = 1000, see section 5 in You et al. (2017). For DPCP we use Tmax = 1000, ϵ = 10⁻⁹ and δ = 10⁻¹⁵, see Algorithm 2 in Tsakiris and Vidal (2018b). Finally, OP uses λ = 0.5 and τ = 1 in Algorithm 1 of Xu et al. (2012). For AIEM we use a maximum number of 1000 iterations in the alternating minimization of (8). For CCV-Min we use a precision of 0.001, the maximal number of iterations is set to 50, and the maximum depth to 12 for r = 3 and 14 for r = 4, 5. |
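The two evaluation metrics quoted under Research Type can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes the subspaces are given as matrices with orthonormal columns, and uses the standard SVD characterization of principal angles (the singular values of Ŝᵀ S are the cosines of the angles).

```python
import numpy as np


def largest_principal_angle(S_hat, S):
    """Largest principal angle (radians) between the column spaces of
    S_hat and S. Both matrices are assumed to have orthonormal columns
    (e.g., obtained via np.linalg.qr)."""
    # Singular values of S_hat^T S are the cosines of the principal
    # angles; the smallest cosine gives the largest angle.
    sigma = np.linalg.svd(S_hat.T @ S, compute_uv=False)
    return float(np.arccos(np.clip(sigma.min(), -1.0, 1.0)))


def relative_estimation_error(X_hat, X):
    """Relative Frobenius-norm error ||X_hat - X||_F / ||X||_F."""
    return float(np.linalg.norm(X_hat - X) / np.linalg.norm(X))


if __name__ == "__main__":
    # Identical subspaces give angle 0; orthogonal subspaces give pi/2.
    Q1 = np.eye(4)[:, :2]
    Q2 = np.eye(4)[:, 2:]
    print(largest_principal_angle(Q1, Q1))  # ~0.0
    print(largest_principal_angle(Q1, Q2))  # ~pi/2
```

For both metrics, smaller values indicate better performance, matching the paper's convention.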