SADAGRAD: Strongly Adaptive Stochastic Gradient Methods

Authors: Zaiyi Chen, Yi Xu, Enhong Chen, Tianbao Yang

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on large-scale data sets demonstrate the efficiency of the proposed algorithms in comparison with several variants of ADAGRAD and stochastic gradient method.
Researcher Affiliation | Academia | University of Science and Technology of China, China; The University of Iowa, USA.
Pseudocode | Yes | Algorithm 1: ADAGRAD(w0, η, λ, ϵ, ϵ0); Algorithm 2: SADAGRAD(w0, θ, λ, ϵ, ϵ0); Algorithm 3: ADAGRAD-PROX(w0, η, λ, ϵ, ϵ0); Algorithm 4: SADAGRAD-PROX(w0, θ, λ, ϵ, ϵ0); Algorithm 5: rSADAGRAD(w0, θ, λ1, ϵ, ϵ0, τ). (A minimal restarted-ADAGRAD sketch appears after the table.)
Open Source Code | No | The paper does not provide any concrete access information (e.g., a link or an explicit statement of code release) for the source code.
Open Datasets | Yes | The experiments are performed on four data sets from the libsvm (Chang & Lin, 2011) website with different scales of instances and features, namely covtype, epsilon, rcv1, and news20. The statistics of these data sets are shown in Table 1. (A data-loading sketch appears after the table.)
Dataset Splits | No | The paper refers to 'training data examples' but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or test sets.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions several algorithms and tools (e.g., ADAGRAD, ADAM, RASSG, libsvm) but does not provide version numbers for the software dependencies used in the implementation.
Experiment Setup | Yes | The step size of ADAM is tuned in 10^[-2:2], and other parameters are chosen as recommended in its paper. For SC-ADAGRAD, the parameters α and ξ1 in their paper are tuned in 10^[-4:2] and [0.1, 1], respectively. Based on the analysis in the previous sections, the step size parameter θ would influence the convergence speed of both ADAGRAD and SADAGRAD, so we tuned this parameter for both ADAGRAD and SADAGRAD on each data set. We run ADAGRAD for a number of iterations (i.e., 5,000) on each data set and set θ = sqrt( 2 (γ + max_i ||g_{1:5000,i}||_2) / Σ_{i=1}^{d} ||g_{1:5000,i}||_2 ). Besides, we set λ1 = 100λ for solving (9), λ1 = 100ζ for solving (8), and τ = 1 for rSADAGRAD and rSADAGRAD-PROX. (A sketch of the θ computation appears after the table.)
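The pseudocode row lists a plain ADAGRAD solver and its restarted, strongly adaptive variants. The exact stage lengths, averaging scheme, and use of λ in the paper's Algorithms 1-5 are not reproduced here; the following is a minimal NumPy sketch, assuming a diagonal ADAGRAD inner loop wrapped by an epoch-doubling restart rule with an illustrative stage step size eta_k = theta * sqrt(eps_k). It shows the restarting idea only, not the authors' algorithms.

import numpy as np

def adagrad_stage(w0, grad_fn, eta, eps0, num_iters):
    """One diagonal-ADAGRAD stage.

    grad_fn(w) is assumed to return a stochastic (sub)gradient at w;
    the averaged iterate is returned, as restart-based analyses typically use.
    """
    w = w0.copy()
    g_sq = np.full_like(w, eps0)        # per-coordinate accumulator; eps0 avoids division by zero
    w_sum = np.zeros_like(w)
    for _ in range(num_iters):
        g = grad_fn(w)
        g_sq += g * g
        w = w - eta * g / np.sqrt(g_sq)  # coordinate-wise adaptive step
        w_sum += w
    return w_sum / num_iters

def sadagrad_like(w0, grad_fn, theta, eps, eps0, num_stages, t1=100):
    """Restarted ADAGRAD in the spirit of SADAGRAD (illustrative assumptions):
    each stage restarts ADAGRAD from the previous stage's output, halves the
    target error eps_k, doubles the stage length, and sets eta_k = theta * sqrt(eps_k).
    """
    w, eps_k, t_k = np.asarray(w0, dtype=float).copy(), eps, t1
    for _ in range(num_stages):
        eta_k = theta * np.sqrt(eps_k)
        w = adagrad_stage(w, grad_fn, eta_k, eps0, t_k)
        eps_k /= 2.0
        t_k *= 2
    return w

# Toy usage: noisy gradient of the strongly convex objective 0.5 * ||w - 1||^2
rng = np.random.default_rng(0)
grad = lambda w: (w - 1.0) + 0.1 * rng.normal(size=w.shape)
w_hat = sadagrad_like(np.zeros(5), grad, theta=1.0, eps=1.0, eps0=1e-8, num_stages=4)
print(w_hat)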
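The four data sets (covtype, epsilon, rcv1, news20) are distributed in libsvm's sparse text format. The paper does not say how the data were loaded; one common option is scikit-learn's load_svmlight_file, sketched below with a hypothetical local file name.

from sklearn.datasets import load_svmlight_file

# "covtype.libsvm.binary" is a hypothetical local file name; download the
# data set from the libsvm website first and point the path at it.
X, y = load_svmlight_file("covtype.libsvm.binary")
print(X.shape, y.shape)   # sparse CSR feature matrix and dense label vector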
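A minimal sketch of how θ could be computed from the 5,000-iteration ADAGRAD pre-run, assuming g_history stacks the observed stochastic gradients row-wise and γ is a small positive constant; the helper compute_theta is hypothetical and only mirrors the expression quoted in the experiment-setup row, not the authors' code.

import numpy as np

def compute_theta(g_history, gamma):
    """g_history: array of shape (T, d) holding the T stochastic gradients
    observed during the ADAGRAD pre-run (here T = 5,000); gamma > 0 is small.
    Returns sqrt(2 * (gamma + max_i ||g_{1:T,i}||_2) / sum_{i=1}^d ||g_{1:T,i}||_2).
    """
    col_norms = np.linalg.norm(g_history, axis=0)   # ||g_{1:T,i}||_2 for each coordinate i
    return np.sqrt(2.0 * (gamma + col_norms.max()) / col_norms.sum())

# Stand-in for the gradients of a real 5,000-iteration pre-run
rng = np.random.default_rng(0)
theta = compute_theta(rng.normal(size=(5000, 20)), gamma=1e-8)
print(theta)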