Optimal Distributed Learning with Multi-pass Stochastic Gradient Methods

Authors: Junhong Lin, Volkan Cevher

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In order to see the empirical performance of the studied algorithm, we carried out some numerical simulations on a non-parametric regression problem with simulated datasets. We constructed training data $\{(x_i, y_i)\}_{i=1}^N \subset \mathbb{R} \times \mathbb{R}$ with $N = 2^{12}$ from the regression model $y = f_\rho(x) + \xi$... The mean and the standard deviation of these computed generalization errors over 50 trials with respect to the number of passes are depicted in the above figures. (See the data-simulation sketch below the table.)
Researcher Affiliation | Academia | Laboratory for Information and Inference Systems, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland. Correspondence to: Junhong Lin <junhong.lin@epfl.ch>, Volkan Cevher <volkan.cevher@epfl.ch>.
Pseudocode | Yes | Algorithm 1. Let $b \in [n]$. The $b$-minibatch stochastic gradient method over the sample $z_s$ is defined by $f_{s,1} = 0$ and, for all $t \in [T]$, $f_{s,t+1} = f_{s,t} - \eta_t \frac{1}{b} \sum_{i=b(t-1)+1}^{bt} \big(f_{s,t}(x_{s,j_{s,i}}) - y_{s,j_{s,i}}\big) K_{x_{s,j_{s,i}}}$ (4), where $\{\eta_t > 0\}$ is a step-size sequence. (See the SGM sketch below the table.)
Open Source Code | No | The paper does not provide any statements or links indicating that open-source code for the methodology described is available.
Open Datasets | No | The paper mentions 'simulated datasets' for its numerical experiments, but it does not provide concrete access information (link, DOI, repository, or formal citation) for these datasets.
Dataset Splits | No | The paper states: 'We constructed training data $\{(x_i, y_i)\}_{i=1}^N \subset \mathbb{R} \times \mathbb{R}$ with $N = 2^{12}$' and 'an approximated generalization error is computed over an empirical measure with 1000 points'. However, it does not provide specific details on train/validation/test splits, percentages, or the methodology for partitioning the data.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions using a 'Gaussian kernel' for its simulations, but it does not specify any software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers.
Experiment Setup | Yes | For each number of partitions $m \in \{2, 8, 32, 64\}$, we set the step-size as $\eta_t = \frac{1}{8\sqrt{n}}$, as suggested by Part 1) of Corollary 2 in the coming subsection, and executed the simulation 50 times. In each trial, an approximated generalization error is computed over an empirical measure with 1000 points. The mean and the standard deviation of these computed generalization errors over 50 trials with respect to the number of passes are depicted in the above figures. In all the simulations, the RKHS is associated with a Gaussian kernel $K(x, x') = \exp(-|x - x'|^2 / (2\sigma^2))$ where $\sigma = 0.2$, and the mini-batch size $b = 1$.
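To make the quoted Research Type setup concrete, here is a minimal sketch of how such a simulated dataset could be generated. The target $f_\rho$, the uniform input distribution, and the noise level below are illustrative assumptions; the excerpt does not restate the paper's exact choices.

```python
import numpy as np

def make_dataset(n=2 ** 12, noise_std=0.1, rng=None):
    """Simulated regression data from the model y = f_rho(x) + xi.

    f_rho, the input distribution, and noise_std are placeholder
    assumptions for illustration, not the paper's exact choices.
    """
    rng = np.random.default_rng(rng)
    x = rng.uniform(0.0, 1.0, size=n)   # assumed inputs on [0, 1]
    f_rho = np.minimum(x, 1.0 - x)      # hypothetical regression function
    y = f_rho + noise_std * rng.normal(size=n)
    return x, y
```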
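The Pseudocode row's update (4) admits a compact implementation: each iterate lies in the span of the kernel sections $K_{x_j}$ at the local sample points, so it can be stored as a coefficient vector. A sketch under that standard representation, continuing the module above, with function and parameter names of our choosing:

```python
def gaussian_kernel(x, z, sigma=0.2):
    """Gaussian kernel K(x, x') = exp(-|x - x'|^2 / (2 sigma^2)) for 1-D inputs."""
    return np.exp(-np.subtract.outer(x, z) ** 2 / (2.0 * sigma ** 2))

def minibatch_sgm(x, y, n_iters, step_sizes, b=1, sigma=0.2, rng=None):
    """Sketch of the b-minibatch SGM update (4) on one local sample (x, y).

    Each iterate f_t lies in span{K_{x_j}}, so it is stored as a vector
    alpha with f_t(.) = sum_j alpha[j] * K(x[j], .).
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    K = gaussian_kernel(x, x, sigma)     # n x n Gram matrix, precomputed
    alpha = np.zeros(n)                  # f_{s,1} = 0
    for t in range(n_iters):
        js = rng.integers(0, n, size=b)  # i.i.d. uniform indices j_{s,i}
        resid = K[js] @ alpha - y[js]    # f_t(x_j) - y_j on the minibatch
        # f_{t+1} = f_t - (eta_t / b) * sum_i resid_i * K_{x_{j_i}}
        np.add.at(alpha, js, -step_sizes[t] * resid / b)
    return alpha
```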
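Finally, the Experiment Setup row maps onto one trial of the distributed procedure: partition the data into $m$ subsets, run local minibatch SGM with the constant step size $\eta_t = 1/(8\sqrt{n})$ as reconstructed above, average the $m$ local estimators, and approximate the generalization error on 1000 points. A sketch continuing the module above; the `passes` parameter and the evaluation against the placeholder $f_\rho$ are our assumptions:

```python
def distributed_sgm_trial(x, y, m, passes, sigma=0.2, b=1, rng=None):
    """One trial: m local minibatch-SGM runs, averaged, then evaluated."""
    rng = np.random.default_rng(rng)
    n = len(x) // m                                   # local sample size
    n_iters = passes * (n // b)                       # one pass = n/b iterations
    eta = np.full(n_iters, 1.0 / (8.0 * np.sqrt(n)))  # constant step size, as reconstructed
    x_test = rng.uniform(0.0, 1.0, size=1000)         # empirical measure with 1000 points
    f_rho_test = np.minimum(x_test, 1.0 - x_test)     # same placeholder target as above
    preds = np.zeros(1000)
    for s in range(m):                                # local SGM on partition s
        xs, ys = x[s * n:(s + 1) * n], y[s * n:(s + 1) * n]
        alpha = minibatch_sgm(xs, ys, n_iters, eta, b=b, sigma=sigma, rng=rng)
        preds += gaussian_kernel(x_test, xs, sigma) @ alpha
    preds /= m                                        # average the m local estimators
    return np.mean((preds - f_rho_test) ** 2)         # approximate generalization error

# Usage: mean/std over 50 trials for m = 8, as in the quoted setup.
# x, y = make_dataset()
# errs = [distributed_sgm_trial(x, y, m=8, passes=4, rng=t) for t in range(50)]
```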