Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Best Subset Selection: Optimal Pursuit for Feature Selection and Elimination
Authors: Zhihan Zhu, Yanhao Zhang, Yong Xia
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on two typical subset selection problems: compressed sensing and sparse regression, to demonstrate the meta-gains achieved by the new algorithms developed through meta-substitution. ... Their performance is evaluated across multiple metrics, including the number of successful recoveries, NMSE, R², and runtime, highlighting the superiority of the enhanced algorithms from various perspectives. |
| Researcher Affiliation | Academia | 1School of Mathematical Sciences, Beihang University, Beijing, China (Email: EMAIL). |
| Pseudocode | Yes | Algorithm 1 (OP): Optimal Pursuit; Algorithm 2 (CoSaOP): Compressive Sampling Optimal Pursuit; Algorithm 3 (OP-BESS, main function): Optimal Pursuit Best-Subset Selection; Algorithm 4 (OP-Splicing, sub-function of OP-BESS): Update Model Parameters and Support Sets |
| Open Source Code | Yes | Matlab codes are available at https://github.com/ZhihanZhu-math/Optimal_Pursuit_public. |
| Open Datasets | Yes | The data presented here is randomly sampled from the Audio Set dataset (Gemmeke et al., 2017). ... We utilize six real-world datasets in our experiments: (1) Boston Housing Data (Pedregosa et al., 2011), (2) California Housing Data (Pedregosa et al., 2011), (3) Superconductivity Data (Hamidieh, 2018), (4) House 16H (Vanschoren, 2014), (5) Prostate.v8.egen (Lin & Pan, 2024; Hastie et al., 2017), and (6) Spectra (The Math Works, Inc., 2025). |
| Dataset Splits | Yes | For each algorithm, we conduct 500 independent runs and record the number of successful recoveries as shown in Figure 3. ... We conducted 100 random experiments, and the results are summarized in Table 1. ... We evaluated the best subset selection (BSS) algorithm's predictive performance on the six datasets using 5-fold cross-validation, where 4 folds were for training and 1 for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Matlab codes" but does not specify a version for Matlab or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | In this experiment, we randomly generate β with a dimensionality of p = 200 and a sparsity level of K = 10. ... The first row illustrates the variation in the number of successful recoveries for three groups of algorithms as the sampling rate increases from 25% to 50% under a fixed SNR of 15. The second row, in contrast, fixes the sampling rate at 25% and shows how the number of successful recoveries changes as the SNR increases from 15 to 25. ... In Algorithm 2, parameters like residual tolerance ϵ1, variation tolerance ϵ2, and maximum iteration count Maxiter are specified. |
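The Experiment Setup row describes the paper's synthetic compressed-sensing protocol: a K-sparse signal with p = 200 and K = 10, Gaussian measurements at a given sampling rate, noise at a target SNR, and recovery judged by metrics such as NMSE. The sketch below reproduces only that generic setup; the function names, the Gaussian sensing-matrix choice, and the recovery step (plain iterative hard thresholding, standing in for the paper's Optimal Pursuit, whose details are not in this table) are our assumptions, not the authors' implementation.

```python
import numpy as np

def make_sparse_problem(p=200, K=10, sampling_rate=0.25, snr_db=15, rng=None):
    """Synthetic instance matching the quoted setup: a K-sparse beta in R^p,
    n = sampling_rate * p Gaussian measurements, noise scaled to snr_db.
    (Gaussian A and Gaussian support values are assumptions.)"""
    rng = np.random.default_rng(rng)
    n = int(round(sampling_rate * p))
    beta = np.zeros(p)
    support = rng.choice(p, size=K, replace=False)
    beta[support] = rng.standard_normal(K)
    A = rng.standard_normal((n, p)) / np.sqrt(n)  # columns unit-norm in expectation
    signal = A @ beta
    noise = rng.standard_normal(n)
    # Scale noise so that ||signal|| / ||noise|| = 10^(snr_db / 20).
    noise *= np.linalg.norm(signal) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
    return A, signal + noise, beta, set(support)

def nmse(beta_hat, beta):
    """Normalized mean squared error, one of the reported metrics."""
    return np.linalg.norm(beta_hat - beta) ** 2 / np.linalg.norm(beta) ** 2

def iht(A, y, K, iters=200):
    """Iterative hard thresholding: a generic stand-in recovery routine,
    NOT the paper's Optimal Pursuit algorithm."""
    p = A.shape[1]
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / largest singular value squared
    beta = np.zeros(p)
    for _ in range(iters):
        beta = beta + step * (A.T @ (y - A @ beta))
        keep = np.argpartition(np.abs(beta), -K)[-K:]  # K largest magnitudes
        mask = np.zeros(p, dtype=bool)
        mask[keep] = True
        beta[~mask] = 0.0
    return beta
```

A success-recovery experiment in the style of Figure 3 would repeat `make_sparse_problem` over independent runs and count how often the recovered support equals the true one; NMSE against the true `beta` gives the second reported metric.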