Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks
Authors: Xin-Chun Li, Jin-Lin Tang, Bo Zhang, Lan Li, De-Chuan Zhan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study methodically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise for 1D visualization. Our major observation shows that the degree of sign consistency between the noise and the convergence point is a critical indicator of valley symmetry. |
| Researcher Affiliation | Academia | Xin-Chun Li1,2, Jin-Lin Tang1,2, Bo Zhang1,2, Lan Li1,2, De-Chuan Zhan1,2 1 School of Artificial Intelligence, Nanjing University, Nanjing, China 2 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China |
| Pseudocode | Yes | Algorithm 1: FedSign Server Procedure |
| Open Source Code | Yes | We provide core codes and a demo code to reproduce the observed phenomena in our paper. We do not provide codes with external links. The demo code is in Code 1 and Code 2. |
| Open Datasets | Yes | The utilized datasets include sklearn.digits, SVHN [56], CIFAR10/100 [39], CINIC10 [8], Flowers [59], Food101 [5], and ImageNet [10]. |
| Dataset Splits | Yes | CIFAR10 and CIFAR100 [39] are subsets of the Tiny Images dataset and respectively have 10/100 classes to classify. They consist of 50,000 training images and 10,000 test images. The image size is 32×32. (...) CINIC10 [8]... It contains 90,000 samples each for training, validation, and testing. We do not use the validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or detailed computer specifications used for running its experiments. In the checklist, it mentions 'The experimental studies in our paper do not need too much computation budget, which could be reproduced on mainstream devices.' but no specifics. |
| Software Dependencies | No | The paper mentions software like 'PyTorch' and 'sklearn' (in Code 1 and 2), but it does not specify any version numbers for these or other software components necessary for reproducibility. For example, 'we use the pre-trained models (e.g., ResNeXt101 [69]) downloaded from torchvision'. |
| Experiment Setup | Yes | We use the SGD optimizer with a momentum value of 0.9. The default learning rate (LR) is 0.03, batch size (BS) is 256, and weight decay (WD) is 0.0005. We use a cosine annealing way to decay the learning rate across 200 training epochs. |
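The "sign consistency" indicator quoted in the Research Type row can be read as the fraction of coordinates where the 1D-visualization noise direction and the converged parameters share the same sign. The sketch below is one plausible reading of that metric, not the authors' exact implementation; the function name and the treatment of zeros are illustrative assumptions:

```python
def sign_consistency(theta, noise):
    """Fraction of coordinates where the noise direction and the
    converged parameters theta share the same sign.
    NOTE: illustrative reading of the paper's indicator; zeros are
    treated as non-negative, which is an assumption."""
    assert len(theta) == len(noise)
    same = sum(1 for t, n in zip(theta, noise) if (t >= 0) == (n >= 0))
    return same / len(theta)

# Toy example: 3 of 4 coordinates agree in sign.
theta = [0.5, -1.2, 0.3, -0.7]
noise = [0.1, -0.4, -0.2, -0.9]
print(sign_consistency(theta, noise))  # -> 0.75
```

A consistency near 1.0 means the perturbation mostly points "with" the signs of the convergence point, which the paper links to the symmetry of the visualized valley.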
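The reported schedule (peak LR 0.03, cosine annealing across 200 epochs) can be sketched in closed form; this matches the formula behind PyTorch's `CosineAnnealingLR`, though the function name and the `lr_min = 0` default here are assumptions, not details stated in the paper:

```python
import math

def cosine_annealing_lr(epoch, total_epochs=200, lr_max=0.03, lr_min=0.0):
    """Cosine-annealed learning rate: starts at lr_max, decays to lr_min
    over total_epochs following half a cosine period."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )

print(cosine_annealing_lr(0))    # -> 0.03  (peak at the start)
print(cosine_annealing_lr(100))  # -> 0.015 (half the peak at the midpoint)
print(cosine_annealing_lr(200))  # -> 0.0   (fully annealed)
```

The remaining pieces of the setup (SGD with momentum 0.9, batch size 256, weight decay 0.0005) are plain optimizer hyperparameters and need no schedule of their own.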