Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Authors: Yi Xu, Qihang Lin, Tianbao Yang

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we perform some experiments to demonstrate the effectiveness of the proposed algorithms. We use very large-scale datasets from the LIBSVM website in the experiments, including covtype.binary (n = 581012), real-sim (n = 72309), and url (n = 2396130) for classification, and million songs (n = 463715), E2006-tfidf (n = 16087), and E2006-log1p (n = 16087) for regression.
Researcher Affiliation | Academia | 1 Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; 2 Department of Management Sciences, The University of Iowa, Iowa City, IA 52242, USA.
Pseudocode | Yes | Algorithm 1, ASSG-c(w0, K, t, D1, ϵ0), and Algorithm 2, the ASSG-r algorithm for solving (1), are provided.
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We use very large-scale datasets from the LIBSVM website in the experiments, including covtype.binary (n = 581012), real-sim (n = 72309), and url (n = 2396130) for classification, and million songs (n = 463715), E2006-tfidf (n = 16087), and E2006-log1p (n = 16087) for regression.
Dataset Splits | No | The paper mentions using datasets for experiments but does not provide specific training, validation, or test split percentages or sample counts, nor does it refer to standard predefined splits with citations.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using SAGA and SVRG++ (which are algorithms), but does not provide specific version numbers for any software dependencies, libraries, or programming languages used.
Experiment Setup | Yes | The regularization parameter λ is set to 10^-4 in all tasks (we also perform the experiments with λ = 10^-2 and include the results in the supplement). We set γ = 1 in the Huber loss and p = 1.5 in robust regression... We use a decreasing step size proportional to 1/τ (τ is the iteration index) in SSG... The value of D1 in both ASSG and RASSG is set to 100 for all problems... In implementing RASSG, we restart every 5 stages with t increased by factors of 1.15, 2, and 2 for the hinge loss, Huber loss, and robust regression, respectively. We tune the parameter ω among {0.3, 0.6, 0.9, 1}.
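
The Pseudocode row above refers to the paper's ASSG-c routine, a restarted stochastic subgradient scheme: each stage runs t projected subgradient steps inside a ball of radius D_k centered at the previous stage's output, then halves the radius and the target accuracy. The sketch below illustrates only that restart structure; the step-size rule, the iterate averaging, and the helper names (assg_c_sketch, grad_oracle, G) are illustrative assumptions rather than the paper's exact Algorithm 1.

```python
import numpy as np

def assg_c_sketch(grad_oracle, w0, K, t, D1, eps0, G=1.0):
    """Restarted stochastic subgradient sketch in the spirit of ASSG-c.

    grad_oracle(w) should return a stochastic subgradient at w. The
    step-size rule eps / (3 * G**2) and the use of the averaged iterate
    are assumptions for illustration, not the paper's exact constants.
    """
    w = np.asarray(w0, dtype=float)
    D, eps = float(D1), float(eps0)
    for _ in range(K):                      # K stages
        eta = eps / (3.0 * G ** 2)          # assumed per-stage step size
        center = w.copy()
        avg = np.zeros_like(w)
        for i in range(t):                  # t inner SSG steps per stage
            w = w - eta * grad_oracle(w)
            # project back onto the ball of radius D around the stage center
            diff = w - center
            norm = np.linalg.norm(diff)
            if norm > D:
                w = center + diff * (D / norm)
            avg += (w - avg) / (i + 1)      # running average of iterates
        w = avg                             # stage output
        D, eps = D / 2.0, eps / 2.0         # geometric decrease across stages
    return w
```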
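
The datasets cited in the Open Datasets row are distributed in LIBSVM (svmlight) format. A minimal loading sketch follows, assuming scikit-learn is installed and the covtype.binary file has already been downloaded from the LIBSVM datasets page; the local file name is an assumption.

```python
from sklearn.datasets import load_svmlight_file

# "covtype.libsvm.binary" is an assumed local file name; download the file
# from the LIBSVM datasets page before running this snippet.
X, y = load_svmlight_file("covtype.libsvm.binary")
print(X.shape)  # the paper reports n = 581012 examples for covtype.binary
```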
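
The Experiment Setup row names three losses (hinge, Huber with γ = 1, robust regression with p = 1.5), a regularization parameter λ = 10^-4, and an SSG baseline whose step size decreases proportionally to 1/τ. The sketch below spells out those pieces under stated assumptions: the exact Huber parameterization, the form of the regularizer, and the step-size constant c are not fixed by the quoted text.

```python
import numpy as np

def hinge_loss(w, x, y):
    # y is a +/-1 label
    return max(0.0, 1.0 - y * np.dot(w, x))

def huber_loss(w, x, y, gamma=1.0):
    # assumed Huber parameterization with threshold gamma (gamma = 1 in the paper)
    r = np.dot(w, x) - y
    return 0.5 * r ** 2 / gamma if abs(r) <= gamma else abs(r) - 0.5 * gamma

def robust_regression_loss(w, x, y, p=1.5):
    # p-th power of the absolute residual (p = 1.5 in the paper)
    return abs(np.dot(w, x) - y) ** p

def ssg_baseline(subgrad, w0, n_iters, c=1.0):
    # Plain SSG with step size c / tau; the constant c is an assumption.
    # A regularization term with lambda = 1e-4 would be folded into subgrad.
    w = np.asarray(w0, dtype=float)
    avg = np.zeros_like(w)
    for tau in range(1, n_iters + 1):
        w = w - (c / tau) * subgrad(w)
        avg += (w - avg) / tau              # running average of iterates
    return avg
```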