Bad Global Minima Exist and SGD Can Reach Them

Authors: Shengchao Liu, Dimitris Papailiopoulos, Dimitris Achlioptas

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We ran experiments on the CIFAR [21] dataset (including CIFAR10 and CIFAR100), CINIC10 [22] and a resized Restricted ImageNet [23]. In Section 3, we show that the phenomenon sketched above persists in state-of-the-art neural network architectures over real datasets. Specifically, we examine VGG16, ResNet18, ResNet50, and DenseNet40, trained on CIFAR, CINIC10, and a restricted version of ImageNet.
Researcher Affiliation | Academia | Shengchao Liu, Quebec Artificial Intelligence Institute (Mila), Université de Montréal, liusheng@mila.quebec; Dimitris Papailiopoulos, University of Wisconsin-Madison, dimitris@papail.io; Dimitris Achlioptas, University of Athens, optas@di.uoa.gr
Pseudocode | Yes | Algorithm 1: Adversarial initialization (a hedged code sketch of this idea follows the table).
Open Source Code | Yes | Our figures, models, and all results can be reproduced using the code available at an anonymous GitHub repository: https://github.com/chao1224/BadGlobalMinima.
Open Datasets | Yes | We ran experiments on the CIFAR [21] dataset (including CIFAR10 and CIFAR100), CINIC10 [22] and a resized Restricted ImageNet [23].
Dataset Splits | No | The paper states the size of the training and test sets for each dataset (e.g., 'The CIFAR training set consists of 50k data points and the test set consists of 10k data points'), but it does not describe how a validation set was created, its size, or any validation split methodology.
Hardware Specification | No | The paper does not specify the hardware used for experiments, such as exact GPU/CPU models, memory, or specific cloud computing instances with their specifications.
Software Dependencies | Yes | We run our experiments on PyTorch 0.3.
Experiment Setup | Yes | Hyperparameters: We apply well-tuned hyperparameters for each model and dataset. For CIFAR, CINIC10, and Restricted ImageNet, we use batch size 128, while the momentum term is set to 0.9 when it is used. When we use ℓ2 regularization, the regularization parameter is 5 × 10⁻⁴ for CIFAR and Restricted ImageNet and 10⁻⁴ for CINIC10. We use the following learning rate schedule for CIFAR: 0.1 for epochs 1 to 150, 0.01 for epochs 151 to 250, and 0.001 for epochs 251 to 350. We use the following learning rate schedule for CINIC10 and Restricted ImageNet: 0.1 for epochs 1 to 150, 0.01 for epochs 151 to 225, and 0.001 for epochs 226 to 300. (A configuration sketch for the CIFAR settings follows the table.)
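
The Pseudocode row refers to the paper's Algorithm 1 (adversarial initialization). The sketch below is a minimal reading of that idea, not a reproduction of the authors' exact procedure: pre-train the network to memorize a randomly relabeled copy of the training set, without augmentation or regularization, and reuse the resulting weights as the starting point for standard training on the true labels. The function name, epoch count, learning rate, and the CIFAR10/ResNet18 usage example are illustrative assumptions.

```python
# Hedged sketch of adversarial initialization; details are assumptions, see comments.
import copy
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader


def adversarial_init(model, train_set, num_classes=10, epochs=100,
                     device="cuda" if torch.cuda.is_available() else "cpu"):
    """Pre-train `model` to memorize randomly relabeled data and return its weights.

    Follows the high-level idea of Algorithm 1 (fit random labels, then use the
    result as an initialization); the paper's exact recipe may differ.
    Assumes a torchvision-style dataset exposing a `.targets` list.
    """
    # Replace every label with an independent uniformly random label (assumption).
    poisoned = copy.deepcopy(train_set)
    poisoned.targets = torch.randint(0, num_classes, (len(poisoned),)).tolist()

    loader = DataLoader(poisoned, batch_size=128, shuffle=True, num_workers=2)
    # Plain SGD with no momentum, no weight decay, and no augmentation:
    # nothing that would regularize the memorization phase.
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    # These weights serve as the "adversarial" initialization for training on true labels.
    return copy.deepcopy(model.state_dict())


# Example usage (CIFAR10 and a stock ResNet18 assumed purely for illustration):
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                         transform=T.ToTensor())
model = torchvision.models.resnet18(num_classes=10)
bad_init = adversarial_init(model, train_set)
```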
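
The Experiment Setup row maps directly onto a standard SGD plus step-schedule configuration. The sketch below expresses the reported CIFAR settings (batch size 128 on the DataLoader, momentum 0.9, weight decay 5 × 10⁻⁴, learning rate 0.1 dropped to 0.01 after epoch 150 and to 0.001 after epoch 250) in current PyTorch; the paper used PyTorch 0.3, so the API there may differ slightly, and the surrounding training loop is assumed.

```python
import torch


def make_optimizer_and_schedule(model):
    # CIFAR settings reported in the paper: SGD with momentum 0.9 and
    # weight decay 5e-4 when l2 regularization is used.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
    )
    # Learning-rate schedule: 0.1 for epochs 1-150, 0.01 for 151-250,
    # 0.001 for 251-350 (milestones count completed epochs).
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[150, 250], gamma=0.1
    )
    return optimizer, scheduler


# Per-epoch usage (training loop itself is assumed, not from the paper):
# optimizer, scheduler = make_optimizer_and_schedule(model)
# for epoch in range(350):
#     train_one_epoch(model, loader, optimizer)
#     scheduler.step()
```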