Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

Authors: Xin-Chun Li, Jin-Lin Tang, Bo Zhang, Lan Li, De-Chuan Zhan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our study methodically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise used for 1D visualization. Our major observation is that the degree of sign consistency between the noise and the convergence point is a critical indicator of valley symmetry (a minimal measurement sketch follows the table).
Researcher Affiliation | Academia | Xin-Chun Li 1,2, Jin-Lin Tang 1,2, Bo Zhang 1,2, Lan Li 1,2, De-Chuan Zhan 1,2; 1 School of Artificial Intelligence, Nanjing University, Nanjing, China; 2 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Pseudocode | Yes | Algorithm 1: FedSign Server Procedure
Open Source Code | Yes | We provide core code and demo code to reproduce the observed phenomena in our paper. We do not provide code via external links. The demo code is given in Code 1 and Code 2.
Open Datasets | Yes | The utilized datasets include sklearn.digits, SVHN [56], CIFAR10/100 [39], CINIC10 [8], Flowers [59], Food101 [5], and ImageNet [10].
Dataset Splits | Yes | CIFAR10 and CIFAR100 [39] are subsets of the Tiny Images dataset and have 10 and 100 classes to classify, respectively. They consist of 50,000 training images and 10,000 test images. The image size is 32×32. (...) CINIC10 [8]... It contains 90,000 samples each for training, validation, and testing. We do not use the validation set. (A torchvision loading sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or other machine specifications used for its experiments. The checklist only notes that 'The experimental studies in our paper do not need too much computation budget, which could be reproduced on mainstream devices,' without giving specifics.
Software Dependencies | No | The paper mentions software such as PyTorch and sklearn (in Code 1 and Code 2), but it does not specify version numbers for these or other components necessary for reproducibility. For example, 'we use the pre-trained models (e.g., ResNeXt101 [69]) downloaded from torchvision' (see the pre-trained model sketch after the table).
Experiment Setup | Yes | We use the SGD optimizer with a momentum value of 0.9. The default learning rate (LR) is 0.03, batch size (BS) is 256, and weight decay (WD) is 0.0005. We use a cosine annealing schedule to decay the learning rate across 200 training epochs. (A training-setup sketch follows the table.)
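
To make the sign-consistency observation concrete, here is a minimal sketch (not the authors' released code) that measures the fraction of coordinates where a 1D-visualization noise direction shares its sign with the converged parameters, and then traces the 1D loss curve along that direction. The toy model, synthetic data, and helper names are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): sign consistency between a noise
# direction and a convergence point, plus the 1D loss curve loss(theta + alpha*d).
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a trained DNN and its data.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
x = torch.randn(512, 20)
y = torch.randint(0, 10, (512,))

def flat_params(m):
    """Concatenate all parameters into a single 1D tensor."""
    return torch.cat([p.detach().reshape(-1) for p in m.parameters()])

def set_flat_params(m, flat):
    """Write a flat parameter vector back into the model in place."""
    offset = 0
    for p in m.parameters():
        n = p.numel()
        p.data.copy_(flat[offset:offset + n].view_as(p))
        offset += n

theta = flat_params(model)                   # convergence point (here: just the init)
noise = torch.randn_like(theta)              # 1D visualization direction
noise = noise / noise.norm() * theta.norm()  # scale the noise to the parameter norm

# Sign consistency: fraction of coordinates where noise and theta share the sign.
sign_consistency = (torch.sign(noise) == torch.sign(theta)).float().mean().item()
print(f"sign consistency: {sign_consistency:.3f}")

# 1D loss curve along the noise direction.
probe = copy.deepcopy(model)
for alpha in torch.linspace(-1.0, 1.0, 21).tolist():
    set_flat_params(probe, theta + alpha * noise)
    with torch.no_grad():
        loss = criterion(probe(x), y).item()
    print(f"alpha={alpha:+.1f}  loss={loss:.4f}")
```

Sweeping alpha over both positive and negative values is what allows the two sides of the valley to be compared, which is how the (a)symmetry described in the paper becomes visible.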
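
The CIFAR-10 split quoted above (50,000 training and 10,000 test images of size 32×32) can be loaded directly with torchvision. The snippet below is a hedged sketch; the normalization statistics are standard CIFAR-10 values, not figures taken from the paper.

```python
# Hedged sketch: load the reported CIFAR-10 train/test split with torchvision.
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor(),
                       T.Normalize((0.4914, 0.4822, 0.4465),   # standard CIFAR-10 stats
                                   (0.2470, 0.2435, 0.2616))])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                         download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 10000, 32x32 RGB images
```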
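
For the torchvision pre-trained models mentioned under Software Dependencies, a minimal sketch is shown below; the specific ResNeXt101 variant (32x8d) and the weights identifier are assumptions, since the paper does not pin versions.

```python
# Hedged sketch: load a pre-trained ResNeXt101 backbone from torchvision.
# The 32x8d variant and the "DEFAULT" weights tag are assumptions; older
# torchvision releases use pretrained=True instead of the weights argument.
import torchvision

model = torchvision.models.resnext101_32x8d(weights="DEFAULT")
model.eval()  # inference mode for downstream analysis
```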
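
Finally, the reported optimization setup (SGD with momentum 0.9, LR 0.03, batch size 256, weight decay 0.0005, cosine annealing over 200 epochs) maps onto a PyTorch training loop roughly as follows; the stand-in model and synthetic dataset are assumptions, not the paper's configuration.

```python
# Hedged sketch of the reported training configuration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in network
criterion = nn.CrossEntropyLoss()

# Synthetic stand-in for a 32x32 RGB classification dataset.
data = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
loader = DataLoader(data, batch_size=256, shuffle=True)  # BS = 256

optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, weight_decay=0.0005)
epochs = 200
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine-annealed LR decay, stepped once per epoch
```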