Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Expanding Boundaries of Gap Safe Screening
Authors: Cassio F. Dantas, Emmanuel Soubies, Cédric Févotte
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we showcase the effectiveness of the proposed screening rules with different solvers (coordinate descent, multiplicative-update and proximal gradient algorithms) and different data sets (binary classification, hyperspectral and count data). |
| Researcher Affiliation | Academia | Cassio F. Dantas EMAIL Emmanuel Soubies EMAIL Cédric Févotte EMAIL IRIT, Université de Toulouse, CNRS, Toulouse, France |
| Pseudocode | Yes | Algorithm 1 Dynamic Gap Safe Screening (DGS) (Ndiaye et al., 2017): x̂ = GAPSolver(A, λ, εgap); Algorithm 2 Generalized Dynamic Gap Safe Screening (G-DGS): x̂ = GAPSolver(A, λ, S0, εgap); Algorithm 3 Refined Dynamic Gap Safe Screening (R-DGS): x̂ = GAPSolver(A, λ, S0, εgap, εr) |
| Open Source Code | Yes | The complete Matlab code is made available by the authors2 for the sake of reproducibility. 2. Code available at: https://github.com/cassiofragadantas/KL_screening |
| Open Datasets | Yes | The Leukemia binary classification data set (Golub et al., 1999)3 is used for the logistic regression case. For the KL case, the NIPS papers word count data set (Globerson et al., 2007)4 is considered. For the β-divergence case, we use the Urban hyperspectral image data set (Jia and Qian, 2007)5, since the β-divergence (especially with β = 1.5) was reported to be well-suited for this particular type of data (Févotte and Dobigeon, 2015). 3. Data set available at LIBSVM: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ 4. Data set available at: http://ai.stanford.edu/~gal/data.html 5. Data set available at: https://rslab.ut.ac.ir/data |
| Dataset Splits | No | The paper describes how input signals/vectors are formed from the datasets (e.g., 'a randomly-selected pixel from the image', 'a randomly selected column of the data matrix') but does not specify explicit training, validation, or test splits in terms of percentages, counts, or references to predefined splits. It also mentions 'Sample number 17 was removed from the original Leukemia data set' but not how the remaining data is split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU/CPU models, processor types, or memory. |
| Software Dependencies | No | The paper mentions that the code is implemented in Matlab but does not specify a version number for Matlab or any other key software libraries or dependencies. It lists solvers like 'coordinate descent', 'multiplicative update', and 'proximal gradient algorithms' but without specific versions or implementations. |
| Experiment Setup | Yes | We set the smoothing parameter of the β=1.5 and KL divergences to ϵ = 10⁻⁶. Remaining problem parameters are fixed by the choice of the data set: problem dimensions (m, n) and data distribution (both the input vector y and matrix A). This section is organized as follows: the particular cases of logistic regression, β=1.5 divergence and Kullback-Leibler divergence are treated respectively in Sections 5.1 to 5.3. Parameter ranges: regularization (λ/λmax) ∈ [10⁻³, 1]; stopping criterion (εgap) ∈ {10⁻⁷, 10⁻⁵} |
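The algorithms quoted in the Pseudocode row build on the dynamic Gap Safe sphere test of Ndiaye et al. (2017). For readers unfamiliar with the mechanism, the sketch below illustrates the classical sphere test for the Lasso only, not the paper's generalized rules for β- and Kullback-Leibler divergences (the paper's released code is in Matlab; this NumPy version, including the function name `gap_safe_screen` and the residual-rescaling choice of dual point, is our own simplification).

```python
import numpy as np

def gap_safe_screen(A, y, x, lam):
    """Gap Safe sphere test for the Lasso: min_x 0.5*||y - Ax||^2 + lam*||x||_1.

    Returns a boolean mask; True at j certifies x_j = 0 at the optimum,
    so column j of A can be safely discarded by the solver.
    """
    residual = y - A @ x
    # Dual-feasible point obtained by rescaling the residual.
    theta = residual / max(lam, np.max(np.abs(A.T @ residual)))
    # Primal and dual objectives; their difference is the duality gap.
    primal = 0.5 * residual @ residual + lam * np.abs(x).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)  # clip round-off-induced negatives
    radius = np.sqrt(2.0 * gap) / lam  # radius of the safe sphere
    # Screen feature j when |a_j^T theta| + radius * ||a_j|| < 1.
    scores = np.abs(A.T @ theta) + radius * np.linalg.norm(A, axis=0)
    return scores < 1.0
```

As the iterate x improves, the duality gap shrinks, the sphere radius tightens, and more features are screened out; this is the "dynamic" aspect the DGS variants exploit inside the solver loop.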