Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning

Authors: Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, Jarvis Haupt

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments demonstrate the efficiency of our method in terms of both parameter estimation and computational performance. We compare the empirical performance of the SVR-GHT algorithm with two other candidate algorithms: GHT proposed in (Jain et al., 2014) and SGHT proposed in (Nguyen et al., 2014) on both synthetic data and real data.
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218; Department of Operations Research and Financial Engineering, Princeton University, NJ 08544
Pseudocode | Yes | We summarize the proposed stochastic variance reduced gradient hard thresholding (SVR-GHT) algorithm in Algorithm 1. Algorithm 1: Stochastic Variance Reduced Gradient Hard Thresholding Algorithm. (A minimal sketch of this update is given after the table.)
Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit code release statement) for the source code of the methodology described.
Open Datasets | Yes | We consider a sparse linear regression problem. We generate each row of the design matrix A_i ∈ R^d independently from N(0, Σ). The response vector is generated from the linear model y = Aw + ϵ, where w ∈ R^d is the regression coefficient vector, and ϵ ∈ R^n is generated from N(0, σ²I) with σ = 1. We adopt a subset of the RCV1 dataset with 9625 documents and 29992 distinct words, including the classes C15, ECAT, GCAT, and MCAT (Cai & He, 2012). (A sketch of this synthetic data generation is given after the table.)
Dataset Splits | No | The paper does not explicitly provide information about a validation dataset split (e.g., percentages, sample counts, or methodology for a distinct validation set).
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory amounts, or computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | We set n = 10000, d = 25000, k* = 200 and k = 500. For simplicity, we choose m = n throughout our experiments. We illustrate step sizes η = 1/256, 1/512 and 1/1024. We choose k = 200 and m = n for both settings of all classes. The optimal step size η for each algorithm is chosen from a sequence of values {1/2^5, 1/2^6, . . . , 1/2^14}.
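
The Pseudocode row refers to Algorithm 1 (SVR-GHT): an SVRG-style variance-reduced stochastic gradient step followed by hard thresholding onto the k largest-magnitude coordinates. The following is a minimal Python sketch of that update for sparse least squares, reconstructed from the description quoted above; the function names (hard_threshold, svr_ght), the default iteration counts, and the least-squares loss are assumptions here, not the authors' reference implementation.

import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and zero out the rest."""
    out = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-k:]
    out[keep] = w[keep]
    return out

def svr_ght(A, y, k, eta, n_outer=30, m=None, seed=0):
    """Sketch of SVRG-style hard thresholding for the loss (1/2n)*||Aw - y||^2.

    Outer loop: take a snapshot w_tilde and compute its full gradient mu.
    Inner loop: variance-reduced stochastic gradient step, then hard threshold.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = n if m is None else m             # epoch length m = n, as in the quoted setup
    w_tilde = np.zeros(d)
    for _ in range(n_outer):
        mu = A.T @ (A @ w_tilde - y) / n  # full gradient at the snapshot
        w = w_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            g_i = A[i] * (A[i] @ w - y[i])              # stochastic gradient at w
            g_i_tilde = A[i] * (A[i] @ w_tilde - y[i])  # stochastic gradient at snapshot
            v = g_i - g_i_tilde + mu                    # variance-reduced gradient estimate
            w = hard_threshold(w - eta * v, k)          # gradient step + hard thresholding
        w_tilde = w                                     # update the snapshot
    return w_tilde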
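
The synthetic setup quoted in the Open Datasets and Experiment Setup rows can be generated along the following lines. The covariance Σ, the support, and the values of the nonzero coefficients are not pinned down by the quoted text, so the identity covariance and standard-normal coefficients below are assumptions; n = 10000, d = 25000, σ = 1, and the sparsity level follow the quoted values.

import numpy as np

def make_sparse_regression(n=10000, d=25000, k_star=200, sigma=1.0, seed=0):
    """Generate y = A w + eps with rows of A drawn from N(0, Σ) and a k_star-sparse w.

    Σ = I and standard-normal nonzero coefficients are assumptions; the quoted
    text only specifies N(0, Σ) rows and sigma = 1.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))              # rows i.i.d. N(0, I); substitute the paper's Σ if known
    w = np.zeros(d)
    support = rng.choice(d, size=k_star, replace=False)
    w[support] = rng.standard_normal(k_star)     # assumed value distribution on the support
    y = A @ w + sigma * rng.standard_normal(n)   # noise ~ N(0, sigma^2 I)
    return A, y, w

A smaller instance is enough to try the sketch end to end, e.g. A, y, w = make_sparse_regression(n=1000, d=2500, k_star=20) followed by w_hat = svr_ght(A, y, k=50, eta=1/512, m=1000), mirroring the quoted choices of k larger than the true sparsity, m = n, and η drawn from {1/2^5, ..., 1/2^14}.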