Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Expanding Boundaries of Gap Safe Screening
Authors: Cassio F. Dantas, Emmanuel Soubies, Cédric Févotte
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we showcase the effectiveness of the proposed screening rules with different solvers (coordinate descent, multiplicative-update and proximal gradient algorithms) and different data sets (binary classification, hyperspectral and count data). |
| Researcher Affiliation | Academia | Cassio F. Dantas EMAIL Emmanuel Soubies EMAIL Cédric Févotte EMAIL IRIT, Université de Toulouse, CNRS, Toulouse, France |
| Pseudocode | Yes | Algorithm 1 Dynamic Gap Safe Screening (DGS) (Ndiaye et al., 2017): x̂ = GAPSolver(A, λ, εgap); Algorithm 2 Generalized Dynamic Gap Safe Screening (G-DGS): x̂ = GAPSolver(A, λ, S0, εgap); Algorithm 3 Refined Dynamic Gap Safe Screening (R-DGS): x̂ = GAPSolver(A, λ, S0, εgap, εr) |
| Open Source Code | Yes | The complete Matlab code is made available by the authors2 for the sake of reproducibility. 2. Code available at: https://github.com/cassiofragadantas/KL_screening |
| Open Datasets | Yes | The Leukemia binary classification data set (Golub et al., 1999)3 is used for the logistic regression case. For the KL case, the NIPS papers word count data set (Globerson et al., 2007)4 is considered. For the β-divergence case, we use the Urban hyperspectral image data set (Jia and Qian, 2007)5, since the β-divergence (especially with β = 1.5) was reported to be well-suited for this particular type of data (Févotte and Dobigeon, 2015). 3. Data set available at LIBSVM: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ 4. Data set available at: http://ai.stanford.edu/~gal/data.html 5. Data set available at: https://rslab.ut.ac.ir/data |
| Dataset Splits | No | The paper describes how input signals/vectors are formed from the datasets (e.g., 'a randomly-selected pixel from the image', 'a randomly selected column of the data matrix') but does not specify explicit training, validation, or test splits in terms of percentages, counts, or references to predefined splits. It also mentions 'Sample number 17 was removed from the original Leukemia data set' but not how the remaining data is split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU/CPU models, processor types, or memory. |
| Software Dependencies | No | The paper mentions that the code is implemented in Matlab but does not specify a version number for Matlab or any other key software libraries or dependencies. It lists solvers like 'coordinate descent', 'multiplicative update', and 'proximal gradient algorithms' but without specific versions or implementations. |
| Experiment Setup | Yes | We set the smoothing parameter of the β=1.5 and KL divergences to ϵ = 10⁻⁶. Remaining problem parameters are fixed by the choice of the data set: problem dimensions (m, n) and data distribution (both the input vector y and matrix A). This section is organized as follows: the particular cases of logistic regression, β=1.5 divergence and Kullback-Leibler divergence are treated respectively in Sections 5.1 to 5.3. Parameter ranges: regularization (λ/λmax) ∈ [10⁻³, 1]; stopping criterion (εgap) ∈ {10⁻⁷, 10⁻⁵} |
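The algorithms quoted in the Pseudocode row build on the dynamic Gap Safe sphere test of Ndiaye et al. (2017). For readers unfamiliar with the mechanism, the sketch below illustrates the classical sphere test for the Lasso only, not the paper's generalized rules for β- and Kullback-Leibler divergences (the paper's released code is in Matlab; this NumPy version, including the function name `gap_safe_screen` and the residual-rescaling choice of dual point, is our own simplification).

```python
import numpy as np

def gap_safe_screen(A, y, x, lam):
    """Gap Safe sphere test for the Lasso: min_x 0.5*||y - Ax||^2 + lam*||x||_1.

    Returns a boolean mask; True at j certifies x_j = 0 at the optimum,
    so column j of A can be safely discarded by the solver.
    """
    residual = y - A @ x
    # Dual-feasible point obtained by rescaling the residual.
    theta = residual / max(lam, np.max(np.abs(A.T @ residual)))
    # Primal and dual objectives; their difference is the duality gap.
    primal = 0.5 * residual @ residual + lam * np.abs(x).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)  # clip round-off-induced negatives
    radius = np.sqrt(2.0 * gap) / lam  # radius of the safe sphere
    # Screen feature j when |a_j^T theta| + radius * ||a_j|| < 1.
    scores = np.abs(A.T @ theta) + radius * np.linalg.norm(A, axis=0)
    return scores < 1.0
```

As the iterate x improves, the duality gap shrinks, the sphere radius tightens, and more features are screened out; this is the "dynamic" aspect the DGS variants exploit inside the solver loop.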