Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions
Authors: Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our methods perform well for training ReLU neural networks. We implement our stochastic algorithm and show that it matches the performance of the empirically used SGD with momentum method for training ResNets on the CIFAR10 dataset. In this section, we evaluate the performance of our proposed algorithm Stochastic INGD on image classification tasks. |
| Researcher Affiliation | Academia | Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie (Massachusetts Institute of Technology). |
| Pseudocode | Yes | Algorithm 1 Interpolated Normalized Gradient Descent; Algorithm 2 Stochastic INGD (x1, p, q, β, T, K) |
| Open Source Code | No | The paper does not explicitly state that the code for the methodology described (e.g., Stochastic INGD) is open-source or provide a direct link to it. The footnote '1https://github.com/kuangliu/pytorch-cifar' refers to a repository for standard hyperparameters for training ResNets, not the authors' own implementation. |
| Open Datasets | Yes | We train the ResNet20 (He et al., 2016) model on the CIFAR10 (Krizhevsky & Hinton, 2009) classification dataset. The dataset contains 50k training images and 10k test images in 10 classes. |
| Dataset Splits | No | The paper mentions "50k training images and 10k test images" for CIFAR10 but does not specify a validation dataset split or provide any details on how data was partitioned for validation purposes. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | We implement Stochastic INGD in PyTorch with the inbuilt auto differentiation algorithm (Paszke et al., 2017). The paper mentions PyTorch but does not specify its version number or any other software dependencies with their versions, which are necessary for reproducibility. |
| Experiment Setup | Yes | We train the model for 100 epochs with the standard hyperparameters from the GitHub repository: For SGD with momentum, we initialize the learning rate as 0.1 and momentum as 0.9, and reduce the learning rate by a factor of 10 at epochs 50 and 75. The weight decay parameter is set to 5 × 10⁻⁴. For ADAM, we use a constant learning rate of 10⁻³, betas (0.9, 0.999), weight decay parameter 10⁻⁶, and ε = 10⁻³ for the best performance. For Stochastic-INGD, we use β = 0.9, p = 1, q = 10, and weight decay parameter 5 × 10⁻⁴. (A hedged configuration sketch of these reported settings appears below the table.) |
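
The baseline settings quoted in the Experiment Setup row map directly onto standard PyTorch optimizer options. The sketch below is not the authors' code: Stochastic-INGD is their custom method and is omitted here because the paper does not release its implementation, and the placeholder model stands in for an actual ResNet20, which is likewise not provided.

```python
# Minimal sketch of the reported baseline hyperparameters (not the authors' code).
# Assumptions: a stand-in model replaces ResNet20; Stochastic-INGD itself is omitted.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder model standing in for ResNet20 on CIFAR-10 (3x32x32 inputs, 10 classes).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# SGD with momentum: lr 0.1, momentum 0.9, weight decay 5e-4,
# learning rate divided by 10 at epochs 50 and 75, trained for 100 epochs.
sgd = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
sgd_scheduler = MultiStepLR(sgd, milestones=[50, 75], gamma=0.1)

# ADAM: constant lr 1e-3, betas (0.9, 0.999), weight decay 1e-6, eps 1e-3.
adam = optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                  weight_decay=1e-6, eps=1e-3)

# Sanity check on one random batch using the SGD baseline.
criterion = nn.CrossEntropyLoss()
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
sgd.zero_grad()
loss.backward()
sgd.step()
sgd_scheduler.step()  # would normally be called once per epoch
```

The Stochastic-INGD hyperparameters (β = 0.9, p = 1, q = 10, weight decay 5 × 10⁻⁴) would be passed to the authors' own optimizer, whose update rule is given in Algorithm 2 of the paper rather than reproduced here.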