Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets
Authors: Jie Wang, Zhanqiu Zhang, Jieping Ye
JMLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in Section 6 on both synthetic and real data demonstrate that the speedup gained by the proposed screening rules in solving SGL and nonnegative Lasso can be orders of magnitude. |
| Researcher Affiliation | Academia | Jie Wang, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China; Zhanqiu Zhang, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China; Jieping Ye, Department of Computational Medicine and Bioinformatics and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2218, USA |
| Pseudocode | Yes | Algorithm 1: Guidelines for developing TLFre. 1: Given a pair of parameter values (λ, α), we estimate a region Θ that contains the dual optimum θ(λ, α) of (4). 2: We solve the following two optimization problems: |
| Open Source Code | Yes | The code is available at http://dpc-screening.github.io/. |
| Open Datasets | Yes | We perform experiments on two commonly used real data sets: the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set (http://adni.loni.usc.edu/) and the news20.binary (Chang and Lin, 2011) data set. |
| Dataset Splits | Yes | We generate two data sets with 1000 × 160000 entries: Synthetic 1 and Synthetic 2. We randomly divide the 160000 features into 16000 groups. [...] The training and test sets contain 60,000 and 10,000 images, respectively. We first randomly select 5000 images for each digit from the training set and get a data matrix X ∈ R^{784×50000}. Then, in each trial, we randomly select an image from the testing set as the response y ∈ R^{784}. |
| Hardware Specification | No | No specific hardware details (like exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | We use sgLeastR from the SLEP package (Liu et al., 2009) as the solver for SGL, which is one of the state-of-the-arts (Zhang et al., 2018b) [see Section G for a comparison between sgLeastR and another popular solver (Lin et al., 2014)]. |
| Experiment Setup | Yes | Given a data set, for illustrative purposes only, we select seven values of α from {tan(ψ) : ψ = 5°, 15°, 30°, 45°, 60°, 75°, 85°}. Then, for each value of α, we run TLFre along a sequence of 100 values of λ equally spaced on the logarithmic scale of λ/λ_max^α from 1 to 0.01. [...] We use zero as the initial point. |
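The Experiment Setup row describes a concrete (α, λ) parameter grid. A minimal sketch of how that grid could be generated (the variable names `angles_deg`, `alphas`, and `ratios` are illustrative, not from the paper; λ_max^α itself depends on the data and α, so only the λ/λ_max^α ratios are computed here):

```python
import math

# Seven alpha values from tan(psi) at the listed angles (in degrees).
angles_deg = [5, 15, 30, 45, 60, 75, 85]
alphas = [math.tan(math.radians(a)) for a in angles_deg]

# 100 values of lambda / lambda_max^alpha, equally spaced on the
# logarithmic scale from 1 down to 0.01 (i.e., 10^0 to 10^-2).
ratios = [10 ** (-2 * i / 99) for i in range(100)]

print(len(alphas), len(ratios))   # 7 100
print(ratios[0], ratios[-1])      # 1.0 0.01
```

This reproduces the 7 × 100 grid of solver runs per data set implied by the quoted setup.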