Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Weighted L1 and L0 Regularization Using Proximal Operator Splitting Methods
Authors: Zewude A. Berkessa, Patrik Waldmann
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Moreover, we evaluate the effectiveness of our model on both simulated and real high-dimensional genomic datasets by comparing with adaptive versions of the least absolute shrinkage and selection operator (LASSO), elastic net (EN), smoothly clipped absolute deviation (SCAD) and minimax concave penalty (MCP). The results show that WL1L0 outperforms the LASSO, EN, SCAD and MCP by consistently achieving the lowest mean squared error (MSE) across all datasets, indicating its superior ability to handle large high-dimensional data. |
| Researcher Affiliation | Academia | Zewude A. Berkessa, Research Unit of Mathematical Sciences, University of Oulu; Patrik Waldmann, Research Unit of Mathematical Sciences, University of Oulu |
| Pseudocode | Yes | Hence, for WL1L0-ADMM, the updates are made in six steps, alternating between the two primal variables u and v, with corresponding dual variables m and w. The steps are: c^{(k+1)} := prox_{T_v(u)γ}(u^{(k)} − m^{(k)}); u^{(k+1)} := prox_{gγ}(c^{(k+1)} + m^{(k)}); m^{(k+1)} := m^{(k)} + c^{(k+1)} − u^{(k+1)}; d^{(k+1)} := prox_{T_u(v)δ}(v^{(k)} − w^{(k)}); v^{(k+1)} := prox_{hδ}(d^{(k+1)} + w^{(k)}); w^{(k+1)} := w^{(k)} + d^{(k+1)} − v^{(k+1)}. |
| Open Source Code | Yes | Julia code for the WL1L0-ADMM and WL1L0-SCPRSM is available at https://github.com/ZewAB/WL1L0-ADMM-and-SCPRSM. |
| Open Datasets | Yes | Simulated QTLMAS 2010 Dataset (Szydlowski & Paczyńska, 2011): This dataset comprises 3226 individuals... Real Pig Dataset (Cleveland et al., 2012): This dataset contains genomic SNP data from 3534 individuals... Real Mice Dataset (Pérez & de Los Campos, 2014): This dataset contains data from 1814 individuals... |
| Dataset Splits | Yes | Generations 1 to 4 (individuals 1 to 2326) were used for training, and generation 5 (individuals 2327 to 3226) served as test data. [...] For the Pig dataset, we employed 5-fold cross-validation with random allocations into training and test data to obtain the minimum test MSE on the test data set, with the results averaged over the folds. [...] Similar to the Pig dataset, we employed 5-fold cross-validation also for this data. |
| Hardware Specification | Yes | All analyses were executed on a Linux computing platform equipped with an AMD EPYC 7302P 16-Core Processor and 32GB of system memory. |
| Software Dependencies | Yes | The WL1L0-ADMM, WL1L0-SCPRSM, EN-ADMM, EN-SCPRSM, LASSO-ADMM and LASSO-SCPRSM methods were implemented in Julia 1.10.1 (Bezanson et al., 2017) using the Proximal Operators package (Antonello et al., 2018). For all methods, the BO was performed with the Bayesian Optimization package using an Elastic GPE model and the squared exponential automatic relevance determination (SEArd) kernel (Fairbrother et al., 2018). |
| Experiment Setup | Yes | The initial values of b̂, ĉ and d̂ were set to the marginal covariances between y and X, multiplied by 0.0001. By conducting preliminary runs for each set of hyperparameters using BO, we identified the optimal range of parameters. BO with the MI acquisition function was executed for hyperparameter tuning of all methods. The test MSE was monitored during the BO process to ensure convergence, which was indicated by no further decrease in MSE. [...] The iterations are terminated when convergence is reached according to ‖(c^{(k)} + d^{(k)}) − (u^{(k)} + v^{(k)})‖ ≤ β(1 + ‖m^{(k)}‖ + ‖w^{(k)}‖), for a tolerance parameter β which was set to 10⁻⁵. |
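The weighted L1 and L0 penalties named in the paper's title both admit closed-form proximal operators: elementwise soft-thresholding for the weighted L1 norm and hard-thresholding for the L0 pseudo-norm. These are standard results, not quotes from the paper; the sketch below is in NumPy rather than the paper's Julia, and the function names are ours.

```python
import numpy as np

def prox_weighted_l1(v, lam, w):
    # Proximal operator of lam * sum_i w_i * |v_i|:
    # elementwise soft-thresholding with per-coordinate threshold lam * w_i.
    return np.sign(v) * np.maximum(np.abs(v) - lam * w, 0.0)

def prox_l0(v, lam):
    # Proximal operator of lam * ||v||_0:
    # hard-thresholding, keeping entries with |v_i| > sqrt(2 * lam).
    return np.where(np.abs(v) > np.sqrt(2.0 * lam), v, 0.0)
```

Soft-thresholding shrinks every surviving coefficient by the threshold, while hard-thresholding keeps survivors at full magnitude; combining the two penalties is what distinguishes WL1L0 from a plain weighted LASSO.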
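The six-step alternating prox updates and the relative stopping rule extracted above follow the standard proximal ADMM template. As a hedged illustration of that template (not the paper's WL1L0 objective or its Julia implementation), here is a minimal NumPy sketch of ADMM for a plain LASSO problem, (1/2)‖Ax − b‖² + λ‖x‖₁; the penalty parameter rho and all names are illustrative choices of ours, and the stopping rule mirrors the β-relative criterion quoted in the Experiment Setup row.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm(A, b, lam, rho=1.0, beta=1e-5, max_iter=1000):
    # ADMM for (1/2)||Ax - b||^2 + lam * ||x||_1,
    # alternating a primal x-update, a prox z-update, and a dual u-update.
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)          # scaled dual variable
    AtA = A.T @ A
    Atb = A.T @ b
    M = AtA + rho * np.eye(n)  # fixed linear system for the x-update
    for _ in range(max_iter):
        x = np.linalg.solve(M, Atb + rho * (z - u))   # smooth-term update
        z = soft_threshold(x + u, lam / rho)          # prox (L1) update
        u = u + x - z                                 # dual ascent step
        # Relative stopping rule in the spirit of the quoted criterion:
        if np.linalg.norm(x - z) <= beta * (1.0 + np.linalg.norm(u)):
            break
    return z
```

With A = I the problem decouples and the minimizer is soft_threshold(b, λ), which gives a quick sanity check that the loop converges to the right fixed point.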