Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions

Authors: Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our methods perform well for training ReLU neural networks. We implement our stochastic algorithm and show that it matches the performance of the empirically used SGD with momentum method for training ResNets on the CIFAR10 dataset. In this section, we evaluate the performance of our proposed algorithm Stochastic INGD on image classification tasks.
Researcher Affiliation | Academia | Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie; Massachusetts Institute of Technology.
Pseudocode | Yes | Algorithm 1: Interpolated Normalized Gradient Descent; Algorithm 2: Stochastic INGD(x1, p, q, β, T, K).
Open Source Code | No | The paper does not explicitly state that the code for the methodology described (e.g., Stochastic INGD) is open source or provide a direct link to it. The footnote pointing to https://github.com/kuangliu/pytorch-cifar refers to a repository of standard hyperparameters for training ResNets, not the authors' own implementation.
Open Datasets | Yes | We train the ResNet20 (He et al., 2016) model on the CIFAR10 (Krizhevsky & Hinton, 2009) classification dataset. The dataset contains 50k training images and 10k test images in 10 classes.
Dataset Splits | No | The paper mentions "50k training images and 10k test images" for CIFAR10 but does not specify a validation split or describe how the data was partitioned for validation purposes.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | We implement Stochastic INGD in PyTorch with the built-in auto-differentiation algorithm (Paszke et al., 2017). The paper mentions PyTorch but does not specify its version number or list any other software dependencies with versions, which are needed for reproducibility.
Experiment Setup | Yes | We train the model for 100 epochs with the standard hyperparameters from the GitHub repository: for SGD with momentum, we initialize the learning rate as 0.1 and momentum as 0.9, and reduce the learning rate by a factor of 10 at epochs 50 and 75; the weight decay parameter is set to 5×10⁻⁴. For ADAM, we use a constant learning rate of 10⁻³, betas (0.9, 0.999), weight decay 10⁻⁶, and ϵ = 10⁻³ for the best performance. For Stochastic INGD, we use β = 0.9, p = 1, q = 10, and weight decay 5×10⁻⁴.
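
As a rough aid to reproduction, the sketch below restates the baseline configuration quoted in the Experiment Setup row, assuming standard PyTorch/torchvision APIs. The ResNet-18 constructor is a stand-in for the paper's ResNet20 (torchvision does not ship a ResNet20 for CIFAR), and Stochastic INGD itself is not sketched because no reference implementation is released.

    # Minimal sketch of the quoted baseline settings (SGD with momentum and ADAM
    # for image classification on CIFAR10), assuming standard PyTorch/torchvision
    # APIs. Not the authors' code.
    import torch
    import torchvision
    import torchvision.transforms as transforms

    # CIFAR10: 50k training images, 10k test images, 10 classes.
    transform = transforms.ToTensor()
    train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                             download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                            download=True, transform=transform)

    # Stand-in architecture; the paper trains ResNet20 (He et al., 2016).
    model = torchvision.models.resnet18(num_classes=10)

    # SGD with momentum: lr 0.1, momentum 0.9, weight decay 5e-4,
    # lr divided by 10 at epochs 50 and 75 over a 100-epoch run.
    sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                          weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(sgd, milestones=[50, 75],
                                                     gamma=0.1)

    # ADAM: constant lr 1e-3, betas (0.9, 0.999), weight decay 1e-6, eps 1e-3.
    adam = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                            weight_decay=1e-6, eps=1e-3)

For orientation only, the following shows a plain normalized gradient descent step, the basic update that Algorithm 1 (Interpolated Normalized Gradient Descent) builds on; it is not the paper's interpolated or stochastic variant.

    import torch

    def normalized_gd_step(x: torch.Tensor, grad: torch.Tensor,
                           lr: float = 0.1, eps: float = 1e-12) -> torch.Tensor:
        """One plain normalized gradient descent step: move a fixed distance lr
        in the negative gradient direction, regardless of gradient magnitude."""
        return x - lr * grad / (grad.norm() + eps)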