Screening Rules for Lasso with Non-Convex Sparse Regularizers
Authors: Alain Rakotomamonjy, Gilles Gasso, Joseph Salmon
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental analysis illustrates the significant computational gain brought by the new screening rule compared to classical coordinate-descent or proximal gradient descent methods. |
| Researcher Affiliation | Collaboration | (1) LITIS, Normandy University, University of Rouen Normandie, INSA Rouen Normandie, France; (2) Criteo AI Lab, Paris, France; (3) IMAG, Univ Montpellier, CNRS, Montpellier, France. |
| Pseudocode | Yes | Algorithm 1: Proximal Weighted Lasso (PWL) and Algorithm 2: MM algorithm for the Lasso with penalty r_λ(\|w\|) (a hedged sketch of this MM scheme is given after the table). |
| Open Source Code | Yes | Additional details are provided in the supplementary material and the Python code for the experiment is available at https://github.com/arakotom/screening_ncvx_penalty |
| Open Datasets | Yes | We have run comparisons on different real datasets. We have reported on Figure 3 (left) the case for the leukemia dataset (dense data with n = 72 and d = 7129). On Figure 3 (right), we compare the gain in running time brought by the screening propagation rule on the newsgroups dataset in which we have kept only 2 categories (religion and graphics) resulting in n = 961, d = 21319. |
| Dataset Splits | No | The paper defines stopping conditions based on 'tolerance' for convergence and mentions generating data for a 'toy problem', but does not explicitly provide details on how the datasets are split into training, validation, and test sets, nor does it specify any cross-validation setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory amounts, or specific computing environments) used for running the experiments. |
| Software Dependencies | No | The paper states 'Note that all algorithms have been implemented in Python/Numpy' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | All algorithms have been stopped when the optimality conditions described in Equation (4) are satisfied up to the same tolerance τ. For our MM approaches, we stop the inner Proximal Weighted Lasso solver when the duality gap is smaller than 10^-4, and screening is computed every 5 iterations; in the outer iterations, we perform screening every 10 iterations. In practice, α > 0 helps guarantee theoretical convergence of the sequence {w_k}, and we have set α = 10^9 for all our experiments, leading to a very small regularization that allows large deviations from w_k. For our experiments, we have used the log-sum penalty, which has a hyperparameter θ; hence, our regularization path involves two parameters (λ_t, θ_t). The set {λ_t}, t = 0, ..., N_t - 1, has been defined as λ_t = λ_max · 10^(-3t/(N_t - 1)), N_t being dependent on the problem, with θ ∈ {0.01, 0.1, 1}. The entries of the regression design matrix X ∈ R^(n×d) are drawn from a zero-mean Gaussian distribution with variance 4. For given n, d and a number p of active variables, the true coefficient vector w is obtained as follows: the p non-zero positions are chosen randomly, and their values are drawn from a zero-mean, unit-variance Gaussian distribution, to which we added 0.1 · sign(w_j). Finally, the target vector is obtained as y = Xw + e, where e is a random noise vector drawn from a Gaussian distribution with zero mean and standard deviation σ. For the toy problem, we have set N_t = 50. (A hedged data-generation sketch is given after the table.) |
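
The MM scheme named in the Pseudocode row (Algorithm 2 with the log-sum penalty) can be summarized by a short sketch. The snippet below is only an illustration under our own assumptions: it substitutes a plain ISTA solver for the paper's Proximal Weighted Lasso and omits the screening rules that are the paper's actual contribution; the function names (`mm_log_sum_lasso`, `weighted_lasso_ista`, `log_sum_weights`) are ours, not the authors'.

```python
import numpy as np

def log_sum_weights(w, lam, theta):
    # Derivative of the log-sum penalty lam * log(1 + |w|/theta) w.r.t. |w|;
    # these values serve as per-coordinate l1 weights in the MM surrogate.
    return lam / (theta + np.abs(w))

def weighted_lasso_ista(X, y, weights, w0, n_iter=500):
    # Proximal-gradient (ISTA) solver for
    #   min_w 0.5 * ||y - X w||^2 + sum_j weights_j * |w_j|
    # Stand-in for the paper's Proximal Weighted Lasso (Algorithm 1),
    # which additionally performs dual-based screening.
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    w = w0.copy()
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        z = w - grad / L
        w = np.sign(z) * np.maximum(np.abs(z) - weights / L, 0.0)  # soft-thresholding
    return w

def mm_log_sum_lasso(X, y, lam, theta, n_outer=20):
    # MM outer loop in the spirit of Algorithm 2: at each iteration the
    # non-convex log-sum penalty is majorized by a weighted l1 norm around
    # the current iterate, and the resulting weighted Lasso is solved.
    w = np.zeros(X.shape[1])
    for _ in range(n_outer):
        weights = log_sum_weights(w, lam, theta)
        w = weighted_lasso_ista(X, y, weights, w)
    return w
```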
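
Likewise, the synthetic data generation and the regularization path quoted in the Experiment Setup row can be approximated as follows. The helper names (`make_toy_data`, `lambda_path`) and the choice λ_max = ||X^T y||_inf (the usual Lasso critical value) are assumptions on our part, not details taken from the paper.

```python
import numpy as np

def make_toy_data(n, d, p, sigma, seed=None):
    # Synthetic regression problem following the Experiment Setup row:
    # Gaussian design with variance 4, p active variables shifted away
    # from zero by 0.1 * sign, and Gaussian noise of std sigma.
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 2.0, size=(n, d))            # zero mean, variance 4
    w_true = np.zeros(d)
    support = rng.choice(d, size=p, replace=False)    # randomly placed non-zeros
    vals = rng.normal(0.0, 1.0, size=p)
    w_true[support] = vals + 0.1 * np.sign(vals)      # add 0.1 * sign(w_j)
    y = X @ w_true + rng.normal(0.0, sigma, size=n)
    return X, y, w_true

def lambda_path(X, y, n_lambdas=50):
    # Geometric grid lambda_t = lambda_max * 10^(-3t/(N_t - 1)), t = 0..N_t-1,
    # with lambda_max = ||X^T y||_inf assumed as the path starting point.
    lam_max = np.max(np.abs(X.T @ y))
    t = np.arange(n_lambdas)
    return lam_max * 10.0 ** (-3.0 * t / (n_lambdas - 1))

# Example usage with the toy-problem size of the paper's path (N_t = 50):
# X, y, w_true = make_toy_data(n=100, d=1000, p=10, sigma=0.5, seed=0)
# lambdas = lambda_path(X, y, n_lambdas=50)
```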