Fast convergence of stochastic subgradient method under interpolation

Authors: Huang Fang, Zhenan Fan, Michael Friedlander

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a formal analysis showing that SSGD for nonsmooth objectives could converge as fast as smooth objectives in the interpolation setting. Our contributions include: ... Proof that the iteration bound O(1/ϵ) is optimal ... We now present some numerical experiments to compare the convergence of SSGD for training ReLU neural networks with smooth and nonsmooth loss functions.
Researcher Affiliation | Academia | Huang Fang, Zhenan Fan & Michael P. Friedlander, Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
Pseudocode | Yes (see the SSGD code sketch after the table) | Algorithm 1 Stochastic subgradient descent. The learning rate function αt : N → R+ returns the learning rate at iteration t. 1: Initialize: w(1) ∈ Rd. 2: for t = 1, 2, . . . do. 3: select i ∈ {1, 2, . . . , n} uniformly at random. 4: compute g(t) ∈ ∂fi(w(t)). 5: w(t+1) = w(t) − αt g(t).
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | We train the LeNet (Lecun et al., 1998) on the MNIST dataset to classify 4s and 9s.
Dataset Splits | No | The paper mentions using training data and the MNIST dataset but does not specify any explicit training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions neural network architectures and loss functions, but does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes (see the loss-function sketch after the table) | We randomly generate a small one-hidden-layer neural network with 16 neurons and ReLU activation as the teacher network... We overparameterize the student neural network and set it to be a one-hidden-layer network with 512 neurons and ReLU activation. We train the student network with different loss functions: squared loss, e.g., (1/n) Σ_{i=1}^n (yi − ŷi)^2, and absolute loss, e.g., (1/n) Σ_{i=1}^n |yi − ŷi|, with different learning rates. We train the LeNet (Lecun et al., 1998) on the MNIST dataset... Then we run SSGD to train the model with different loss functions: logistic loss, e.g., (1/n) Σ_{i=1}^n log(1 + exp(−yi ŷi)), and L1-hinge loss, e.g., (1/n) Σ_{i=1}^n max{0, 1 − yi ŷi}, and with different learning rates.
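
Algorithm 1 quoted in the Pseudocode row is the standard stochastic subgradient iteration. A minimal NumPy sketch is given below; the absolute-loss linear model, the constant step size, the problem sizes, and all function names are illustrative assumptions rather than the paper's experimental setup.

```python
# Minimal sketch of Algorithm 1 (stochastic subgradient descent).
# The absolute-loss linear model, constant step size, and problem sizes are
# illustrative assumptions, not the paper's exact experiments.
import numpy as np

def ssgd(subgradient, w0, n, learning_rate, num_iters, rng=None):
    """SSGD: w(t+1) = w(t) - alpha_t * g(t), with g(t) in the subdifferential of f_i at w(t)."""
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    for t in range(1, num_iters + 1):
        i = rng.integers(n)              # select i in {1, ..., n} uniformly at random
        g = subgradient(w, i)            # any element of the subdifferential of f_i at w
        w = w - learning_rate(t) * g     # subgradient step with step size alpha_t
    return w

# Example: an interpolation problem with the nonsmooth absolute loss f_i(w) = |x_i.w - y_i|.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star                           # labels from a linear "teacher", so zero loss is attainable

def abs_loss_subgrad(w, i):
    r = X[i] @ w - y[i]
    return np.sign(r) * X[i]             # sign(0) = 0 is a valid subgradient at the kink

w_hat = ssgd(abs_loss_subgrad, np.zeros(d), n,
             learning_rate=lambda t: 0.01, num_iters=50_000, rng=rng)
print("mean absolute residual:", np.mean(np.abs(X @ w_hat - y)))
```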
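
The loss functions quoted in the Experiment Setup row are standard; a minimal NumPy sketch of them, together with teacher-style data generation as described in the quote, is below. The function names, the input dimension d, the sample count n, and the placeholder predictions are assumptions for illustration; only the formulas and the 16-neuron teacher width follow the quoted text, and no student training is shown.

```python
import numpy as np

# Per-example losses quoted above, averaged over the n examples.
def squared_loss(y, y_hat):        # smooth: (1/n) sum_i (y_i - yhat_i)^2
    return np.mean((y - y_hat) ** 2)

def absolute_loss(y, y_hat):       # nonsmooth: (1/n) sum_i |y_i - yhat_i|
    return np.mean(np.abs(y - y_hat))

def logistic_loss(y, y_hat):       # smooth, labels in {-1, +1}: (1/n) sum_i log(1 + exp(-y_i * yhat_i))
    return np.mean(np.logaddexp(0.0, -y * y_hat))   # numerically stable log(1 + exp(.))

def l1_hinge_loss(y, y_hat):       # nonsmooth, labels in {-1, +1}: (1/n) sum_i max{0, 1 - y_i * yhat_i}
    return np.mean(np.maximum(0.0, 1.0 - y * y_hat))

# Teacher data as described in the quote: a random one-hidden-layer ReLU network
# with 16 neurons labels the inputs (the 512-neuron student would then be trained
# on these labels). The dimension d and sample count n are assumptions.
rng = np.random.default_rng(0)
d, n = 20, 1000
W_teacher = rng.standard_normal((16, d))
v_teacher = rng.standard_normal(16)
X = rng.standard_normal((n, d))
y = np.maximum(X @ W_teacher.T, 0.0) @ v_teacher    # exact teacher labels: interpolation is attainable

y_hat = np.zeros(n)                                  # placeholder student predictions
print(squared_loss(y, y_hat), absolute_loss(y, y_hat))
```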