SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Authors: Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. Also: our experiments establish strong empirical evidence that SGD (when training the loss to zero value) follows a star-convex path. (A hedged sketch of such a path check appears after the table.) |
| Researcher Affiliation | Collaboration | Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh: Duke University; University of Science and Technology of China; Microsoft Research Asia; The Ohio State University |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. The SGD update rule is described in text. |
| Open Source Code | No | No statement or link providing concrete access to source code for the methodology described in this paper was found. |
| Open Datasets | Yes | we train a standard multi-layer perceptron (MLP) network, a variant of Alexnet and a variant of Inception network Zhang et al. (2017a) on the CIFAR10 dataset Krizhevsky (2009) using SGD under cross-entropy loss. Also: We train the aforementioned three types of neural networks, i.e., MLP, Alexnet and Inception, on the CIFAR10 Krizhevsky (2009) and MNIST LeCun et al. (1998) datasets using SGD. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only implies the use of computing resources for training neural networks. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., libraries, frameworks, or solver versions) needed to replicate the experiment. |
| Experiment Setup | Yes | In all experiments, we adopt a constant learning rate (0.01 for MLP and Alexnet, 0.1 for Inception) and a constant mini-batch size of 128. (A hedged code sketch of this setup follows the table.) |
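
To make the reported setup concrete, below is a minimal sketch of the training configuration quoted above: plain SGD with a constant learning rate (0.01 for the MLP), a constant mini-batch size of 128, and cross-entropy loss on CIFAR10. The MLP layer sizes, epoch count, data normalization, and the use of torchvision for data loading are assumptions not stated in the table; this is an illustrative sketch, not the authors' implementation.

```python
# Minimal sketch of the reported training setup: plain SGD, constant learning
# rate, constant mini-batch size of 128, cross-entropy loss on CIFAR10.
# The MLP architecture, epoch count, and the absence of momentum/weight decay
# are assumptions; the paper excerpt above does not specify them.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# CIFAR10 via torchvision (an assumption about data loading, not stated above).
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True  # constant mini-batch size 128 (as reported)
)

# Hypothetical MLP; the paper's exact layer sizes are not given in the excerpt.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
).to(device)

criterion = nn.CrossEntropyLoss()
# Constant learning rate 0.01 for the MLP (0.1 is reported for the Inception
# variant), with no momentum or decay, so each step is the plain SGD update
# x_{k+1} = x_k - eta * grad of the sampled mini-batch loss at x_k.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):  # epoch count is an assumption; the paper trains to near-zero loss
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```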
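
The star-convex-path claim quoted under Research Type suggests a post-hoc check along a recorded SGD path. The sketch below uses the standard pointwise star-convexity inequality ⟨∇ℓ_ξk(xk), xk − x*⟩ ≥ ℓ_ξk(xk) − ℓ_ξk(x*), with x* taken as the final SGD iterate and ℓ_ξk(x*) approximated by zero (the near-zero training-loss regime the paper relies on). This is an assumption-laden illustration: the paper also uses an epochwise variant not reproduced here, and names such as star_convexity_violations are hypothetical.

```python
# Hedged sketch of a per-iteration star-convexity check along a recorded SGD path.
# Assumptions (not confirmed by the excerpt above): x* is the last SGD iterate,
# the mini-batch loss at x* is treated as approximately zero, and the check is the
# pointwise inequality <grad_k, x_k - x*> >= l_k(x_k) - l_k(x*).
import torch

def flat_params(model):
    """Concatenate all model parameters into one detached vector."""
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def flat_grads(model):
    """Concatenate all parameter gradients into one vector (call after backward())."""
    return torch.cat([p.grad.detach().reshape(-1) for p in model.parameters()])

# During training (inside the loop of the previous sketch, after loss.backward()):
#   path.append((flat_params(model).cpu(), flat_grads(model).cpu(), loss.item()))
# After training, take x_star = flat_params(model).cpu() and run:
def star_convexity_violations(path, x_star, loss_at_star=0.0):
    """Count iterations where <grad_k, x_k - x*> < l_k(x_k) - l_k(x*)."""
    violations = 0
    for x_k, g_k, loss_k in path:
        lhs = torch.dot(g_k, x_k - x_star).item()
        rhs = loss_k - loss_at_star
        if lhs < rhs:
            violations += 1
    return violations
```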