Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unlabeled Principal Component Analysis and Matrix Completion
Authors: Yunzhen Yao, Liangzu Peng, Manolis C. Tsakiris
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experimental Evaluation Here we perform synthetic and real data experiments to evaluate the proposed algorithmic pipelines for UPCA (Section 4.1) and UMC (Section 4.2). We use two metrics for performance evaluation. The first is the largest principal angle θ_max(Ŝ, S) between the estimated subspace Ŝ and the ground-truth S, and this is used for Stage-I to evaluate subspace learning accuracy. The second metric is the relative estimation error ‖X̂ − X‖_F / ‖X‖_F between the estimated data matrix X̂ and the ground-truth X, which quantifies the final performance of our algorithmic pipeline. For both metrics, smaller values imply better performance. |
| Researcher Affiliation | Academia | Yunzhen Yao EMAIL School of Computer and Communication Sciences EPFL CH-1015 Lausanne, Switzerland Liangzu Peng EMAIL Manolis C. Tsakiris EMAIL Key Laboratory for Mathematics Mechanization Academy of Mathematics and Systems Science Chinese Academy of Sciences Beijing, 100190, China |
| Pseudocode | Yes | Algorithm 1 Two-stage Algorithmic Pipeline for UPCA Algorithm 2 Unlabeled Sensing via Least-Squares with Recursive Filtration (LSRF) Algorithm 3 Two-stage Algorithmic Pipeline for UMC |
| Open Source Code | No | The text is ambiguous or lacks a clear, affirmative statement of release. No specific link to source code or statement of its availability for the methodology described in this paper is provided. |
| Open Datasets | Yes | We use the well-known database Extended Yale B (Georghiades et al., 2001) The second data set consists of all the benign cases in Breast Cancer Wisconsin (Diagnostic) (Asuncion and Newman, 2007). |
| Dataset Splits | No | The paper refers to datasets without mentioning splits (e.g., 'we use the XYZ dataset') and does not explicitly provide training/test/validation splits for its experiments. The descriptions of data partitioning relate to creating outliers for the experimental conditions, not to standard dataset splits for model evaluation. |
| Hardware Specification | Yes | Experiments are run on an Intel(R) i7-8700K, 3.7 GHz, 16GB machine. |
| Software Dependencies | No | The paper mentions specific software tools such as 'Bertini (Bates et al.)' and 'the toolbox of Beck and Guttmann-Beck (2019)' but does not provide explicit version numbers for these or any other key software components used in the experiments. |
| Experiment Setup | Yes | For Self-Expr we use λ = 0.95, α = 10 and T = 1000, see section 5 in You et al. (2017). For DPCP we use Tmax = 1000, ϵ = 10⁻⁹ and δ = 10⁻¹⁵, see Algorithm 2 in Tsakiris and Vidal (2018b). Finally, OP uses λ = 0.5 and τ = 1 in Algorithm 1 of Xu et al. (2012). For AIEM we use a maximum number of 1000 iterations in the alternating minimization of (8). For CCV-Min we use a precision of 0.001, the maximal number of iterations is set to 50, and the maximum depth to 12 for r = 3 and 14 for r = 4, 5. |
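The two evaluation metrics quoted under Research Type can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes the subspaces are given as matrices with orthonormal columns, and uses the standard SVD characterization of principal angles (the singular values of Ŝᵀ S are the cosines of the angles).

```python
import numpy as np


def largest_principal_angle(S_hat, S):
    """Largest principal angle (radians) between the column spaces of
    S_hat and S. Both matrices are assumed to have orthonormal columns
    (e.g., obtained via np.linalg.qr)."""
    # Singular values of S_hat^T S are the cosines of the principal
    # angles; the smallest cosine gives the largest angle.
    sigma = np.linalg.svd(S_hat.T @ S, compute_uv=False)
    return float(np.arccos(np.clip(sigma.min(), -1.0, 1.0)))


def relative_estimation_error(X_hat, X):
    """Relative Frobenius-norm error ||X_hat - X||_F / ||X||_F."""
    return float(np.linalg.norm(X_hat - X) / np.linalg.norm(X))


if __name__ == "__main__":
    # Identical subspaces give angle 0; orthogonal subspaces give pi/2.
    Q1 = np.eye(4)[:, :2]
    Q2 = np.eye(4)[:, 2:]
    print(largest_principal_angle(Q1, Q1))  # ~0.0
    print(largest_principal_angle(Q1, Q2))  # ~pi/2
```

For both metrics, smaller values indicate better performance, matching the paper's convention.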