Efficient Stochastic Gradient Hard Thresholding
Authors: Pan Zhou, Xiaotong Yuan, Jiashi Feng
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Numerical results confirm our theoretical affirmation and demonstrate the computational efficiency of the proposed methods." and "We now compare the numerical performance of HSG-HT and AHSG-HT to several state-of-the-art algorithms, including FG-HT [13], SG-HT [20] and SVRG-HT [21, 22]. We evaluate all the considered algorithms on two sets of learning tasks. The first set contains two sparsity-constrained problems: logistic regression with $f_i(x) = \log(1 + \exp(-b_i a_i^\top x)) + \frac{\lambda}{2}\|x\|_2^2$ and multi-class softmax regression with $f_i(x) = \sum_{j=1}^{c} \big[ \frac{\lambda}{2c}\|x_j\|_2^2 - 1\{b_i = j\} \log \frac{\exp(a_i^\top x_j)}{\sum_{l=1}^{c} \exp(a_i^\top x_l)} \big]$, where $b_i$ is the target output of $a_i$ and $c$ is the class number. The second one is a rank-constrained linear regression problem: $f_i(x) = \|b_i - \langle x, a_i \rangle\|_2^2 + \frac{\lambda}{2}\|x\|_F^2$, s.t. $\mathrm{rank}(x) \le k$, which has several important applications including multi-class classification and multi-task regression for simultaneously learning shared characteristics of all classes/tasks [37], as well as high dimensional image and financial data modeling [5, 8]. We run simulations on six datasets, including rcv1, real-sim, mnist, news20, coil100 and caltech256. The details of these data sets are described in Appendix E. For HSG-HT and AHSG-HT, we follow our theory to exponentially expand the mini-batch size $s_t$ but with small exponential rate, with $\tau = 1$. Since there is no ground truth on real data, we run FG-HT sufficiently long until $\|x^t - x^{t+1}\| / \|x^t\| \le 10^{-6}$, and then use the output $f(x^t)$ as the approximate optimal value $f^*$ for sub-optimality estimation in Figure 1 and Figure 2." (A hedged code sketch of the quoted logistic regression objective appears after the table.) |
| Researcher Affiliation | Academia | Pan Zhou, Xiao-Tong Yuan, Jiashi Feng; Learning & Vision Lab, National University of Singapore, Singapore; B-DAT Lab, Nanjing University of Information Science & Technology, Nanjing, China; pzhou@u.nus.edu, xtyuan@nuist.edu.cn, elefjia@nus.edu.sg |
| Pseudocode | Yes | Algorithm 1: (Accelerated) Hybrid Stochastic Gradient Hard Thresholding (a hedged sketch of the underlying hard thresholding step appears after the table) |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We run simulations on six datasets, including rcv1, real-sim, mnist, news20, coil100 and caltech256. The details of these data sets are described in Appendix E. |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, specific percentages, or a cross-validation setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library names, framework versions) used to replicate the experiment. |
| Experiment Setup | Yes | "For HSG-HT and AHSG-HT, we follow our theory to exponentially expand the mini-batch size $s_t$ but with small exponential rate, with $\tau = 1$." and "We run simulations on six datasets... with regularization parameter $\lambda = 10^{-5}$." and "Then the output $x^T$ of HSG-HT satisfies $\mathbb{E}\|x^T - x^*\| \le \big(1 - \frac{1}{480\kappa_s}\big)^{T/2} \|x^0 - x^*\| + \frac{\alpha}{12(1-\beta)} \|\nabla_{\tilde{I}} f(x^*)\|$, where $\beta = \alpha \sqrt{1 - \frac{1}{12\kappa_s}}$, $\tilde{I} = \mathrm{supp}(\Phi_{2k}(\nabla f(x^*))) \cup \mathrm{supp}(x^*)$, and $T$ is the number of iterations." (This includes the learning rate and mini-batch size from theory.) and "Set the learning rate $\eta = \frac{1}{6\ell_s}$ and the mini-batch size $s_t = \tau \omega^{-t}$ with $\omega = 1 - \frac{1}{480\kappa_s}$ and $\tau \ge \frac{40\alpha B}{3\rho_s \ell_s \|x^0 - x^*\|^2}$." (A hedged sketch of this step-size and batch-size schedule appears after the table.) |
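
The experiment description above quotes the sparsity-constrained logistic regression objective. As a point of reference, here is a minimal sketch, assuming NumPy, dense arrays, and labels $b_i \in \{-1, +1\}$, of that per-sample loss and its mini-batch gradient; the names `A`, `b`, and `lam` are illustrative and do not come from the paper.

```python
# Minimal sketch of the sparsity-constrained logistic regression objective quoted above:
#   f_i(x) = log(1 + exp(-b_i * a_i^T x)) + (lambda / 2) * ||x||_2^2
# Assumptions (not from the paper): labels b_i in {-1, +1}, dense NumPy arrays.
import numpy as np

def logistic_loss_and_grad(x, A, b, lam):
    """Average loss and gradient over a mini-batch (A: n x d features, b: length-n labels)."""
    margins = b * (A @ x)                          # b_i * a_i^T x for each sample
    loss = np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * (x @ x)
    sigma = 1.0 / (1.0 + np.exp(margins))          # negative derivative of log(1 + exp(-m)) w.r.t. m
    grad = -(A.T @ (b * sigma)) / A.shape[0] + lam * x
    return loss, grad
```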
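
The Pseudocode row names Algorithm 1 without reproducing it. The sketch below shows only the generic stochastic-gradient hard thresholding step that this family of methods is built on (a mini-batch gradient step followed by keeping the $k$ largest-magnitude entries); it is not the paper's exact pseudocode, and `grad_fn`, `eta`, and `batch_idx` are illustrative names.

```python
# Sketch of a single stochastic gradient hard thresholding step (not the paper's exact Algorithm 1).
import numpy as np

def hard_threshold(x, k):
    """Project onto the sparsity constraint: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(x)
    keep = np.argpartition(np.abs(x), -k)[-k:]
    out[keep] = x[keep]
    return out

def ht_step(x, grad_fn, batch_idx, eta, k):
    """One iteration: mini-batch gradient step, then hard thresholding."""
    return hard_threshold(x - eta * grad_fn(x, batch_idx), k)
```

With the logistic objective sketched above, `grad_fn` could be `lambda x, idx: logistic_loss_and_grad(x, A[idx], b[idx], lam)[1]`.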
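
The Experiment Setup row quotes the theory-prescribed step size and mini-batch schedule. The snippet below spells out one reading of that schedule, assuming the batch size grows geometrically at rate $1/\omega$ (consistent with "exponentially expand the mini-batch size"); `ell_s`, `kappa_s`, `tau`, and the cap at `n` samples are placeholders and assumptions, not the settings used in the paper's experiments.

```python
# One reading of the quoted schedule (assumptions noted above; not the authors' exact settings):
#   learning rate  eta = 1 / (6 * ell_s)
#   batch sizes    s_t = ceil(tau / omega^t),  omega = 1 - 1 / (480 * kappa_s)
import math

def hsg_ht_schedule(ell_s, kappa_s, tau, T, n):
    """Return the step size and the mini-batch sizes s_0, ..., s_{T-1}, capped at n samples."""
    eta = 1.0 / (6.0 * ell_s)
    omega = 1.0 - 1.0 / (480.0 * kappa_s)
    sizes = [min(n, math.ceil(tau / omega ** t)) for t in range(T)]
    return eta, sizes
```

The geometric-growth reading (dividing by $\omega^t$ rather than multiplying) is our assumption; with $\tau = 1$ and a moderately large $\kappa_s$ it reproduces the "small exponential rate" described in the experiment description.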