SADAGRAD: Strongly Adaptive Stochastic Gradient Methods
Authors: Zaiyi Chen, Yi Xu, Enhong Chen, Tianbao Yang
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on large-scale data sets demonstrate the efficiency of the proposed algorithms in comparison with several variants of ADAGRAD and stochastic gradient method. |
| Researcher Affiliation | Academia | (1) University of Science and Technology of China, China; (2) The University of Iowa, USA. |
| Pseudocode | Yes | Algorithm 1: ADAGRAD(w0, η, λ, ϵ, ϵ0); Algorithm 2: SADAGRAD(w0, θ, λ, ϵ, ϵ0); Algorithm 3: ADAGRAD-PROX(w0, η, λ, ϵ, ϵ0); Algorithm 4: SADAGRAD-PROX(w0, θ, λ, ϵ, ϵ0); Algorithm 5: r-SADAGRAD(w0, θ, λ1, ϵ, ϵ0, τ). A hedged sketch of the restarting scheme appears below the table. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a link or explicit statement of code release) for the source code. |
| Open Datasets | Yes | The experiments are performed on four data sets from the LIBSVM (Chang & Lin, 2011) website with different scales of instances and features, namely covtype, epsilon, rcv1, and news20. The statistics of these data sets are shown in Table 1. |
| Dataset Splits | No | The paper refers to 'training data examples' but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions several algorithms and tools (e.g., ADAGRAD, ADAM, RASSG, libsvm) but does not provide specific version numbers for the software dependencies used in their implementation. |
| Experiment Setup | Yes | The step size of ADAM is tuned in 10^[-2:2], and its other parameters are chosen as recommended in its paper. For SC-ADAGRAD, the parameters α and ξ1 from its paper are tuned in 10^[-4:2] and [0.1, 1], respectively. Based on the analysis in the previous sections, the step size parameter θ influences the convergence speed of both ADAGRAD and SADAGRAD, so we tuned this parameter for both ADAGRAD and SADAGRAD on each data set: we run ADAGRAD for a number of iterations (i.e., 5,000) on each data set and set θ = √( 2(γ + max_i ‖g^k_{1:5000,i}‖_2) / Σ_{i=1}^d ‖g^k_{1:5000,i}‖_2 ). Besides, we set λ1 = 100λ for solving (9), λ1 = 100ζ for solving (8), and τ = 1 for r-SADAGRAD and r-SADAGRAD-PROX. A sketch of the θ computation appears below the table. |
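
The pseudocode cell above lists stage-wise (restarted) variants built around a plain ADAGRAD inner loop. As a rough illustration of how such a restarting scheme can be organized, here is a minimal NumPy sketch. The function names, the fixed stage length, the halving of the target accuracy ϵ between stages, and the η = θ·ϵ coupling are illustrative assumptions, not the paper's exact Algorithms 2 and 5 (which also take λ, ϵ0, and τ arguments omitted here).

```python
import numpy as np

def adagrad_stage(w0, grad_fn, eta, num_iters, delta=1e-8):
    """One ADAGRAD stage started from w0.

    grad_fn(w) returns a stochastic (sub)gradient at w.  The averaged
    iterate is returned so the restart scheme can use it as the next
    starting point.
    """
    w = w0.copy()
    g_sq_sum = np.zeros_like(w)   # running sum of squared gradients, per coordinate
    w_avg = np.zeros_like(w)
    for t in range(1, num_iters + 1):
        g = grad_fn(w)
        g_sq_sum += g * g
        w -= eta * g / (np.sqrt(g_sq_sum) + delta)   # per-coordinate adaptive step
        w_avg += (w - w_avg) / t                     # online average of the iterates
    return w_avg

def restarted_adagrad(w0, grad_fn, theta, eps0, num_stages=10, iters_per_stage=5000):
    """Hedged sketch of a restarted ("strongly adaptive") ADAGRAD scheme.

    Each stage reruns ADAGRAD from the previous stage's averaged solution
    with a step size tied to the current target accuracy, which is halved
    between stages.  The eta = theta * eps coupling and the stage length
    are placeholders for illustration, not the paper's exact schedule.
    """
    w = w0.copy()
    eps = eps0
    for _ in range(num_stages):
        eta = theta * eps                            # assumed coupling; see the paper for the exact rule
        w = adagrad_stage(w, grad_fn, eta, iters_per_stage)
        eps /= 2.0                                   # geometrically shrink the target accuracy
    return w
```

The only difference from a single ADAGRAD run is the outer loop: each stage resets the per-coordinate accumulators and shrinks the effective step size, mirroring the stage-wise structure suggested by the algorithm signatures above.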
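The experiment-setup cell picks θ from a 5,000-iteration pilot run of ADAGRAD. Below is a minimal NumPy sketch of that computation, assuming the formula reads as reconstructed above (per-coordinate ℓ2 norms of the pilot gradients); the helper name tune_theta and the default γ = 1.0 are illustrative and not from the paper.

```python
import numpy as np

def tune_theta(grads, gamma=1.0):
    """Compute the step-size parameter theta from a pilot ADAGRAD run.

    `grads` is a (T, d) array of the stochastic gradients observed over the
    pilot run (T = 5000 in the quoted setup); the i-th column is g_{1:T,i}.
    `gamma` is the constant from the quoted formula; its value is not stated
    in the excerpt, so 1.0 here is a placeholder.
    """
    col_norms = np.linalg.norm(grads, axis=0)   # ||g_{1:T,i}||_2 for each coordinate i
    return np.sqrt(2.0 * (gamma + col_norms.max()) / col_norms.sum())
```

For example, if pilot_grads is a (5000, d) array collected during the pilot run, tune_theta(pilot_grads) gives the θ used to set the step sizes of both ADAGRAD and SADAGRAD in the quoted setup.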