Efficient Stochastic Gradient Hard Thresholding

Authors: Pan Zhou, Xiaotong Yuan, Jiashi Feng

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical results confirm our theoretical affirmation and demonstrate the computational efficiency of the proposed methods." and "We now compare the numerical performance of HSG-HT and AHSG-HT to several state-of-the-art algorithms, including FG-HT [13], SG-HT [20] and SVRG-HT [21, 22]. We evaluate all the considered algorithms on two sets of learning tasks. The first set contains two sparsity-constrained problems: logistic regression with $f_i(x)=\log(1+\exp(-b_i a_i^\top x))+\frac{\lambda}{2}\|x\|_2^2$ and multi-class softmax regression with $f_i(x)=\sum_{j=1}^{c}\big[\frac{\lambda}{2c}\|x_j\|_2^2-\mathbf{1}\{b_i=j\}\log\frac{\exp(a_i^\top x_j)}{\sum_{l=1}^{c}\exp(a_i^\top x_l)}\big]$, where $b_i$ is the target output of $a_i$ and $c$ is the class number. The second one is a rank-constrained linear regression problem: $(b_i-\langle x,a_i\rangle)^2+\frac{\lambda}{2}\|x\|_F^2$, s.t. $\mathrm{rank}(x)\le k$, which has several important applications including multi-class classification and multi-task regression for simultaneously learning shared characteristics of all classes/tasks [37], as well as high dimensional image and financial data modeling [5, 8]. We run simulations on six datasets, including rcv1, real-sim, mnist, news20, coil100 and caltech256. The details of these data sets are described in Appendix E. For HSG-HT and AHSG-HT, we follow our theory to exponentially expand the mini-batch size $s_t$, but with a small exponential rate, with $\tau = 1$. Since there is no ground truth on real data, we run FG-HT sufficiently long until $\|x_t - x_{t+1}\|/\|x_t\| \le 10^{-6}$, and then use the output $f(x_t)$ as the approximate optimal value $f^*$ for sub-optimality estimation in Figure 1 and Figure 2." (A minimal numerical sketch of these sparsity-constrained objectives and of the hard-thresholding operator is given after this table.)
Researcher Affiliation | Academia | Pan Zhou (Learning & Vision Lab, National University of Singapore, Singapore; pzhou@u.nus.edu), Xiao-Tong Yuan (B-DAT Lab, Nanjing University of Information Science & Technology, Nanjing, China; xtyuan@nuist.edu.cn), Jiashi Feng (Learning & Vision Lab, National University of Singapore, Singapore; elefjia@nus.edu.sg)
Pseudocode | Yes | Algorithm 1: (Accelerated) Hybrid Stochastic Gradient Hard Thresholding. (A generic sketch of a stochastic gradient hard-thresholding iteration is given after this table.)
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We run simulations on six datasets, including rcv1, real-sim, mnist, news20, coil100 and caltech256. The details of these data sets are described in Appendix E."
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, specific percentages, or a cross-validation setup.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library names, framework versions) used to replicate the experiment.
Experiment Setup | Yes | "For HSG-HT and AHSG-HT, we follow our theory to exponentially expand the mini-batch size $s_t$, but with a small exponential rate, with $\tau = 1$." and "We run simulations on six datasets... with regularization parameter $\lambda = 10^{-5}$." and "Then the output $x^T$ of HSG-HT satisfies $\mathbb{E}\|x^T - x^*\| \le \big(1 - \tfrac{1}{480\kappa_s}\big)^{T/2}\|x^0 - x^*\| + \tfrac{\alpha}{12(1-\beta)}\|\nabla_{\tilde{I}} f(x^*)\|$, where $\beta = \alpha\sqrt{1 - \tfrac{1}{12\kappa_s}}$, $\tilde{I} = \mathrm{supp}(\Phi_{2k}(\nabla f(x^*))) \cup \mathrm{supp}(x^*)$, and $T$ is the number of iterations." (This includes the learning rate and mini-batch size from theory.) and "Set the learning rate $\eta = \tfrac{1}{6\ell_s}$ and the mini-batch size $s_t = \tau\,\omega^{-t}$ with $\omega = 1 - \tfrac{1}{480\kappa_s}$ and $\tau \ge \tfrac{40\alpha B}{3\rho_s \ell_s \|x^0 - x^*\|^2}$."
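
The following minimal numpy sketch illustrates the sparsity-constrained logistic objective $f_i$ and the hard-thresholding operator $\Phi_k$ referenced in the Research Type row above. The function names (logistic_loss_i, logistic_grad_i, hard_threshold) are ours and not from the paper; this is an illustration of the quoted objectives, not the authors' implementation.

```python
import numpy as np

def logistic_loss_i(x, a_i, b_i, lam):
    """Per-sample sparsity-constrained logistic objective
    f_i(x) = log(1 + exp(-b_i * a_i^T x)) + (lam / 2) * ||x||_2^2."""
    margin = b_i * a_i.dot(x)
    # np.logaddexp(0, -margin) = log(1 + exp(-margin)), numerically stable
    return np.logaddexp(0.0, -margin) + 0.5 * lam * x.dot(x)

def logistic_grad_i(x, a_i, b_i, lam):
    """Gradient of f_i with respect to x."""
    margin = b_i * a_i.dot(x)
    sigma = 1.0 / (1.0 + np.exp(margin))  # = exp(-m) / (1 + exp(-m))
    return -b_i * sigma * a_i + lam * x

def hard_threshold(x, k):
    """Phi_k(x): keep the k largest-magnitude entries of x, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out
```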
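Algorithm 1 is only named, not reproduced, in this report, so the sketch below shows a generic stochastic gradient hard-thresholding loop with the exponentially expanding mini-batch schedule $s_t = \tau\,\omega^{-t}$ quoted in the Experiment Setup row. The acceleration step of AHSG-HT is omitted, and all names (hsg_ht_sketch, grad_i, tau, omega) are our own; treat it as a sketch under these assumptions rather than the paper's Algorithm 1.

```python
import numpy as np

def hsg_ht_sketch(grad_i, hard_threshold, x0, n, k, eta,
                  tau=1.0, omega=0.98, T=100, rng=None):
    """Generic stochastic gradient hard-thresholding loop (sketch).

    grad_i(x, i)   -- stochastic gradient of f_i at x (user supplied)
    hard_threshold -- Phi_k operator, e.g. keep the k largest-magnitude entries
    The mini-batch size s_t = ceil(tau * omega**(-t)) grows geometrically
    (capped at n), so late iterations approach full-gradient accuracy.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    for t in range(T):
        s_t = min(n, int(np.ceil(tau * omega ** (-t))))      # expanding mini-batch
        batch = rng.choice(n, size=s_t, replace=False)
        g = np.mean([grad_i(x, i) for i in batch], axis=0)   # mini-batch gradient
        x = hard_threshold(x - eta * g, k)                    # gradient step + Phi_k
    return x
```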
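To give a feel for how the quoted schedule behaves, the snippet below plugs constants into $\eta = \frac{1}{6\ell_s}$, $\omega = 1 - \frac{1}{480\kappa_s}$ and $s_t = \tau\,\omega^{-t}$. The values of $\kappa_s$ and $\ell_s$ are hypothetical, chosen for illustration only and not taken from the paper.

```python
import numpy as np

# Hypothetical problem constants, for illustration only (not from the paper).
kappa_s = 50.0   # restricted condition number kappa_s = ell_s / rho_s (assumed)
ell_s = 4.0      # restricted smoothness constant (assumed)
tau = 1.0        # as used in the experiments ("with tau = 1")

eta = 1.0 / (6.0 * ell_s)               # learning rate eta = 1 / (6 * ell_s)
omega = 1.0 - 1.0 / (480.0 * kappa_s)   # contraction factor from the theorem

for t in [0, 1000, 5000, 20000]:
    s_t = int(np.ceil(tau * omega ** (-t)))   # exponentially expanding mini-batch
    print(f"t={t:6d}  s_t={s_t:6d}  "
          f"(1 - 1/(480*kappa_s))^(t/2) = {omega ** (t / 2):.3e}")
```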