Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now assess the numerical behavior of the algorithms proposed in Sections 2 and 3. To bridge the gap between theory and practice, we present a Julia code which implements the described convex relaxation and greedy rounding procedure on GitHub. In Section 4, we apply the cutting-plane and random rounding methods to derive optimal and near optimal sparse principal components for problems in the UCI data set. We also compare our method's performance against the method of Berk and Bertsimas (2019). |
| Researcher Affiliation | Academia | Dimitris Bertsimas EMAIL Sloan School of Management, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Ryan Cory-Wright EMAIL Operations Research Center, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Jean Pauphilet EMAIL London Business School, Regent's Park, London, NW1 4SA, UK |
| Pseudocode | Yes | Algorithm 1 An outer-approximation method for Problem (1); Algorithm 2 A greedy rounding method for Problem (1); Algorithm 3 A Goemans-Williamson rounding method for Problem (1) |
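The greedy rounding idea referenced above (Algorithm 2) can be illustrated with a generic sketch. This is an assumption-laden illustration of the standard greedy-rounding pattern for sparse PCA, not the authors' exact procedure: keep the k coordinates with largest magnitude in a relaxation solution, then take the leading eigenvector of the covariance submatrix restricted to that support.

```python
import numpy as np

def greedy_round(sigma, x_relaxed, k):
    """Round a relaxation solution to a k-sparse principal component (sketch).

    sigma: (n, n) covariance matrix; x_relaxed: relaxation solution; k: sparsity budget.
    Returns a k-sparse unit vector x and its explained variance x' sigma x.
    """
    # Keep the k coordinates with the largest magnitude in the relaxed solution.
    support = np.argsort(-np.abs(x_relaxed))[:k]
    # Leading eigenpair of the covariance restricted to the chosen support.
    sub = sigma[np.ix_(support, support)]
    vals, vecs = np.linalg.eigh(sub)  # eigenvalues in ascending order
    x = np.zeros(len(x_relaxed))
    x[support] = vecs[:, -1]          # eigenvector for the largest eigenvalue
    return x, vals[-1]
```

The returned objective value `vals[-1]` is a lower bound on the optimal k-sparse explained variance, which is what a rounding step feeds back into the cutting-plane loop as an incumbent.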
| Open Source Code | Yes | To bridge the gap between theory and practice, we present a Julia code which implements the described convex relaxation and greedy rounding procedure on GitHub.³ The code requires a conic solver such as Mosek and several open-source Julia packages to be installed. 3. https://github.com/ryancorywright/ScalableSPCA.jl |
| Open Datasets | Yes | We compare our approach to the branch-and-bound algorithm developed by Berk and Bertsimas (2019) on the UCI pitprops, wine, miniboone, communities, arrhythmia and micromass data sets... We also compare our method's performance on the Wilshire 5000 and Arcene UCI data sets. For the Gisette data set, we report on the methods' performance when we include the first 3,000 and 4,000 rows/columns (as well as all 5,000 rows/columns). |
| Dataset Splits | No | The paper mentions applying the method for different sparsity parameters 'k ∈ {5, 10, 20}' and for '20 synthetic random instances'. However, it does not explicitly state any training, validation, or test dataset splits for the UCI datasets mentioned, nor does it specify how the synthetic instances were split, if at all, for model evaluation beyond generation. |
| Hardware Specification | Yes | All experiments were implemented in Julia 1.3, using Gurobi 9.1 and JuMP.jl 0.21.6, and performed on a standard MacBook Pro laptop, with a 2.9GHz 6-Core Intel i9 CPU, using 16 GB DDR4 RAM. ... by running the method on one Intel Xeon E5-2690 v4 2.6GHz CPU core using 600 GB RAM. |
| Software Dependencies | Yes | All experiments were implemented in Julia 1.3, using Gurobi 9.1 and JuMP.jl 0.21.6... SCS version 2.1.1 (with default parameters) instead of Mosek |
| Experiment Setup | Yes | We impose a relative optimality tolerance of 10⁻³ for all approaches, i.e., terminate each method when (UB − LB)/UB ≤ 10⁻³, where UB denotes the current objective bound and LB denotes the current incumbent objective value. ... Additionally, we warm-start all methods with the solution from the method of Yuan and Zhang (2013), to maintain a fair comparison. ... We impose a time limit of 600s. |
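The stopping rule quoted in the Experiment Setup row translates directly into a small gap check. The sketch below is illustrative (the function and variable names are ours, not from the authors' code):

```python
def converged(ub, lb, tol=1e-3):
    """Relative optimality gap test: stop when (UB - LB) / UB <= tol.

    ub: current objective (upper) bound from the relaxation.
    lb: current incumbent objective value (lower bound).
    """
    return (ub - lb) / ub <= tol
```

With the paper's tolerance of 10⁻³, a run with bound 1.0005 and incumbent 1.0 would terminate, while bound 1.1 and incumbent 1.0 would continue.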