Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets
Authors: Jie Wang, Zhanqiu Zhang, Jieping Ye
JMLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in Section 6 on both synthetic and real data demonstrate that the speedup gained by the proposed screening rules in solving SGL and nonnegative Lasso can be orders of magnitude. |
| Researcher Affiliation | Academia | Jie Wang, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China; Zhanqiu Zhang, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China; Jieping Ye, Department of Computational Medicine and Bioinformatics and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2218, USA |
| Pseudocode | Yes | Algorithm 1: Guidelines for developing TLFre. 1: Given a pair of parameter values (λ, α), we estimate a region Θ that contains the dual optimum θ(λ, α) of (4). 2: We solve the following two optimization problems: |
| Open Source Code | Yes | The code is available at http://dpc-screening.github.io/. |
| Open Datasets | Yes | We perform experiments on two commonly used real data sets: the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set (http://adni.loni.usc.edu/) and the news20.binary (Chang and Lin, 2011) data set. |
| Dataset Splits | Yes | We generate two data sets with 1000 × 160000 entries: Synthetic 1 and Synthetic 2. We randomly divide the 160000 features into 16000 groups. [...] The training and test sets contain 60,000 and 10,000 images, respectively. We first randomly select 5000 images for each digit from the training set and get a data matrix X ∈ R^{784×50000}. Then, in each trial, we randomly select an image from the testing set as the response y ∈ R^{784}. |
| Hardware Specification | No | No specific hardware details (like exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | We use sgLeastR from the SLEP package (Liu et al., 2009) as the solver for SGL, which is one of the state-of-the-arts (Zhang et al., 2018b) [see Section G for a comparison between sgLeastR and another popular solver (Lin et al., 2014)]. |
| Experiment Setup | Yes | Given a data set, for illustrative purposes only, we select seven values of α from {tan(ψ) : ψ = 5°, 15°, 30°, 45°, 60°, 75°, 85°}. Then, for each value of α, we run TLFre along a sequence of 100 values of λ equally spaced on the logarithmic scale of λ/λ_max^α from 1 to 0.01. [...] We use zero as the initial point. |
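The Experiment Setup row describes a concrete (α, λ) parameter grid. A minimal sketch of how that grid could be generated (the variable names `angles_deg`, `alphas`, and `ratios` are illustrative, not from the paper; λ_max^α itself depends on the data and α, so only the λ/λ_max^α ratios are computed here):

```python
import math

# Seven alpha values from tan(psi) at the listed angles (in degrees).
angles_deg = [5, 15, 30, 45, 60, 75, 85]
alphas = [math.tan(math.radians(a)) for a in angles_deg]

# 100 values of lambda / lambda_max^alpha, equally spaced on the
# logarithmic scale from 1 down to 0.01 (i.e., 10^0 to 10^-2).
ratios = [10 ** (-2 * i / 99) for i in range(100)]

print(len(alphas), len(ratios))   # 7 100
print(ratios[0], ratios[-1])      # 1.0 0.01
```

This reproduces the 7 × 100 grid of solver runs per data set implied by the quoted setup.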