SGD Converges to Global Minimum in Deep Learning via Star-convex Path

Authors: Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper." and "our experiments establish strong empirical evidences that SGD (when training the loss to zero value) follows a star-convex path." (see the star-convexity check sketch after the table)
Researcher Affiliation | Collaboration | Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh (Duke University; University of Science and Technology of China; Microsoft Research Asia; The Ohio State University)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper; the SGD update rule is described in text.
Open Source Code | No | No statement or link providing concrete access to source code for the methodology described in this paper was found.
Open Datasets | Yes | "we train a standard multi-layer perceptron (MLP) network Krizhevsky (2009), a variant of Alexnet and a variant of Inception network Zhang et al. (2017a) on the CIFAR10 dataset Krizhevsky (2009) using SGD under crossentropy loss." and "We train the aforementioned three types of neural networks, i.e., MLP, Alexnet and Inception, on CIFAR10 Krizhevsky (2009) and MNIST Lecun et al. (1998) dataset using SGD."
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments; it only implies the use of computing resources for training neural networks.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., libraries, frameworks, or solver versions) needed to replicate the experiments.
Experiment Setup | Yes | "In all experiments, we adopt a constant learning rate (0.01 for MLP and Alexnet, 0.1 for Inception) and a constant mini-batch size 128." (see the training-setup sketch below)
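
The quoted setup in the Experiment Setup row (plain SGD, constant learning rate, mini-batch size 128, cross-entropy loss on CIFAR10) translates into a short training loop. The sketch below is a minimal PyTorch reconstruction under assumptions: the paper releases no code, so the MLP architecture, preprocessing, and epoch budget used here are illustrative placeholders rather than the authors' configuration.

```python
# Hedged sketch of the reported setup: plain SGD, constant learning rate
# (0.01 for MLP/Alexnet, 0.1 for the Inception variant), mini-batch size 128,
# cross-entropy loss on CIFAR10. The MLP layer sizes, preprocessing, and
# epoch count are assumptions; the paper does not specify them in the quotes above.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder MLP; the paper's exact architecture is not given in the quoted text.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # constant lr; 0.1 for Inception

for epoch in range(100):  # epoch budget is an assumption
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```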
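The star-convex path claim quoted in the Research Type row can also be probed empirically. One natural quantity to monitor, and an assumption about what the authors plot, is the first-order star-convexity inequality along the SGD iterates: for the mini-batch loss l_k at step k and a reference point x_star (an iterate with near-zero training loss), check whether l_k(x_star) >= l_k(x_k) + <grad l_k(x_k), x_star - x_k>. The helper below is an illustrative sketch, not the authors' code; the function names and the residual it returns are assumptions.

```python
# Hedged sketch: empirical check of a star-convexity condition along the SGD path.
import torch

def flatten_params(model):
    """Concatenate all model parameters into a single detached 1-D tensor."""
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def star_convexity_residual(model, criterion, images, labels, x_star):
    """Return l_k(x_star) - l_k(x_k) - <grad l_k(x_k), x_star - x_k>.

    A non-negative value means the star-convexity inequality holds for this
    mini-batch at the current iterate x_k (the model's current parameters).
    """
    model.zero_grad()
    loss_k = criterion(model(images), labels)
    loss_k.backward()
    grad = torch.cat([p.grad.reshape(-1) for p in model.parameters()])
    x_k = flatten_params(model)

    with torch.no_grad():
        # Temporarily load x_star into the model to evaluate the same mini-batch loss.
        backup = [p.detach().clone() for p in model.parameters()]
        offset = 0
        for p in model.parameters():
            n = p.numel()
            p.copy_(x_star[offset:offset + n].view_as(p))
            offset += n
        loss_star = criterion(model(images), labels)
        # Restore the current iterate x_k.
        for p, b in zip(model.parameters(), backup):
            p.copy_(b)

    return (loss_star - loss_k.detach() - grad.dot(x_star - x_k)).item()
```

In practice, x_star would be recorded as flatten_params(model) once training reaches (approximately) zero loss, and the residual logged at each SGD step; consistently non-negative values would reproduce the paper's empirical observation that SGD follows a star-convex path in that regime.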