Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now assess the numerical behavior of the algorithms proposed in Sections 2 and 3. To bridge the gap between theory and practice, we present a Julia code which implements the described convex relaxation and greedy rounding procedure on GitHub. In Section 4, we apply the cutting-plane and random rounding methods to derive optimal and near-optimal sparse principal components for problems in the UCI data set. We also compare our method's performance against the method of Berk and Bertsimas (2019).
Researcher Affiliation | Academia | Dimitris Bertsimas EMAIL Sloan School of Management, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Ryan Cory-Wright EMAIL Operations Research Center, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Jean Pauphilet EMAIL London Business School, Regent's Park, London, NW1 4SA, UK
Pseudocode | Yes | Algorithm 1: An outer-approximation method for Problem (1); Algorithm 2: A greedy rounding method for Problem (1); Algorithm 3: A Goemans-Williamson rounding method for Problem (1)
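The paper's Algorithm 2 is a greedy rounding method for the sparse PCA problem. As a minimal illustration of the greedy idea (not the paper's exact procedure — the function name and the direct submatrix-eigenvalue selection criterion here are assumptions for exposition), one can grow a size-k support one index at a time, each step adding the coordinate that most increases the leading eigenvalue of the corresponding principal submatrix of the covariance matrix:

```python
import numpy as np

def greedy_support(sigma, k):
    """Hypothetical greedy sketch: build a size-k support for sparse PCA
    by repeatedly adding the index that maximizes the largest eigenvalue
    of the principal submatrix sigma[S, S]."""
    n = sigma.shape[0]
    support = []
    best_val = -np.inf
    for _ in range(k):
        best_idx, best_val = None, -np.inf
        for j in range(n):
            if j in support:
                continue
            idx = support + [j]
            sub = sigma[np.ix_(idx, idx)]
            val = np.linalg.eigvalsh(sub)[-1]  # eigvalsh sorts ascending
            if val > best_val:
                best_idx, best_val = j, val
        support.append(best_idx)
    return sorted(support), best_val

# Toy covariance: indices 1 and 2 are correlated, so picking both
# yields a leading eigenvalue of 4 (> any single diagonal entry).
sigma = np.array([[2.0, 0.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [0.0, 1.0, 3.0]])
support, value = greedy_support(sigma, 2)  # support [1, 2], value ~4.0
```

This runs in O(k·n) eigenvalue computations; the paper's actual Algorithm 2 rounds the solution of a convex relaxation rather than searching submatrices directly.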
Open Source Code | Yes | To bridge the gap between theory and practice, we present a Julia code which implements the described convex relaxation and greedy rounding procedure on GitHub. The code requires a conic solver such as Mosek and several open source Julia packages to be installed. (Footnote 3: https://github.com/ryancorywright/ScalableSPCA.jl)
Open Datasets | Yes | We compare our approach to the branch-and-bound algorithm developed by Berk and Bertsimas (2019) on the UCI pitprops, wine, miniboone, communities, arrhythmia and micromass data sets... We also compare our method's performance on the Wilshire 5000 and Arcene UCI data sets. For the Gisette data set, we report on the methods' performance when we include the first 3,000 and 4,000 rows/columns (as well as all 5,000 rows/columns).
Dataset Splits | No | The paper mentions applying the method for different sparsity parameters k ∈ {5, 10, 20} and for 20 synthetic random instances. However, it does not explicitly state any training, validation, or test dataset splits for the UCI datasets mentioned, nor does it specify how the synthetic instances were split, if at all, for model evaluation beyond generation.
Hardware Specification | Yes | All experiments were implemented in Julia 1.3, using Gurobi 9.1 and JuMP.jl 0.21.6, and performed on a standard MacBook Pro laptop, with a 2.9GHz 6-Core Intel i9 CPU, using 16 GB DDR4 RAM. ... by running the method on one Intel Xeon E5-2690 v4 2.6GHz CPU core using 600 GB RAM.
Software Dependencies | Yes | All experiments were implemented in Julia 1.3, using Gurobi 9.1 and JuMP.jl 0.21.6... SCS version 2.1.1 (with default parameters) instead of Mosek
Experiment Setup | Yes | We impose a relative optimality tolerance of 10⁻³ for all approaches, i.e., terminate each method when (UB − LB)/UB ≤ 10⁻³, where UB denotes the current objective bound and LB denotes the current incumbent objective value. ... Additionally, we warm-start all methods with the solution from the method of Yuan and Zhang (2013), to maintain a fair comparison. ... We impose a time limit of 600s.
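The termination rule quoted above is a standard relative optimality-gap check. A minimal sketch (the function name is illustrative; it assumes the upper bound UB is strictly positive, which holds for nontrivial sparse PCA instances since the objective is a leading eigenvalue of a covariance matrix):

```python
def should_terminate(ub, lb, tol=1e-3):
    """Stop when the relative gap (UB - LB) / UB falls to tol or below.

    ub: current best objective bound (upper bound), assumed > 0
    lb: current incumbent (feasible) objective value
    """
    return (ub - lb) / ub <= tol

# e.g. UB = 10.0 with incumbent LB = 9.995 gives a gap of 5e-4,
# which is within the 1e-3 tolerance, so the method terminates.
```

The same convention (gap measured relative to the bound) is used by mixed-integer solvers such as Gurobi for their default MIP gap criterion.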