Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

Authors: Xin-Chun Li, Jin-Lin Tang, Bo Zhang, Lan Li, De-Chuan Zhan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our study methodically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise used for 1D visualization. Our major observation is that the degree of sign consistency between the noise and the convergence point is a critical indicator of valley symmetry (a minimal measurement sketch follows the table).
Researcher Affiliation | Academia | Xin-Chun Li 1,2, Jin-Lin Tang 1,2, Bo Zhang 1,2, Lan Li 1,2, De-Chuan Zhan 1,2; 1 School of Artificial Intelligence, Nanjing University, Nanjing, China; 2 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Pseudocode | Yes | Algorithm 1: FedSign Server Procedure
Open Source Code | Yes | We provide core code and demo code to reproduce the observed phenomena in our paper. We do not provide code via external links. The demo code is given in Code 1 and Code 2.
Open Datasets | Yes | The utilized datasets include sklearn.digits, SVHN [56], CIFAR10/100 [39], CINIC10 [8], Flowers [59], Food101 [5], and ImageNet [10].
Dataset Splits | Yes | CIFAR10 and CIFAR100 [39] are subsets of the Tiny Images dataset and have 10 and 100 classes to classify, respectively. They consist of 50,000 training images and 10,000 test images. The image size is 32×32. (...) CINIC10 [8]... It contains 90,000 samples each for training, validation, and testing. We do not use the validation set. (A torchvision loading sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or other machine specifications used for its experiments. The checklist only notes that 'The experimental studies in our paper do not need too much computation budget, which could be reproduced on mainstream devices,' without giving specifics.
Software Dependencies | No | The paper mentions software such as PyTorch and sklearn (in Code 1 and Code 2), but it does not specify version numbers for these or other components necessary for reproducibility. For example, 'we use the pre-trained models (e.g., ResNeXt101 [69]) downloaded from torchvision' (see the pre-trained model sketch after the table).
Experiment Setup | Yes | We use the SGD optimizer with a momentum value of 0.9. The default learning rate (LR) is 0.03, batch size (BS) is 256, and weight decay (WD) is 0.0005. We use a cosine annealing schedule to decay the learning rate across 200 training epochs. (A training-setup sketch follows the table.)
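
To make the sign-consistency observation concrete, here is a minimal sketch (not the authors' released code) that measures the fraction of coordinates where a 1D-visualization noise direction shares its sign with the converged parameters, and then traces the 1D loss curve along that direction. The toy model, synthetic data, and helper names are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): sign consistency between a noise
# direction and a convergence point, plus the 1D loss curve loss(theta + alpha*d).
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a trained DNN and its data.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
x = torch.randn(512, 20)
y = torch.randint(0, 10, (512,))

def flat_params(m):
    """Concatenate all parameters into a single 1D tensor."""
    return torch.cat([p.detach().reshape(-1) for p in m.parameters()])

def set_flat_params(m, flat):
    """Write a flat parameter vector back into the model in place."""
    offset = 0
    for p in m.parameters():
        n = p.numel()
        p.data.copy_(flat[offset:offset + n].view_as(p))
        offset += n

theta = flat_params(model)                   # convergence point (here: just the init)
noise = torch.randn_like(theta)              # 1D visualization direction
noise = noise / noise.norm() * theta.norm()  # scale the noise to the parameter norm

# Sign consistency: fraction of coordinates where noise and theta share the sign.
sign_consistency = (torch.sign(noise) == torch.sign(theta)).float().mean().item()
print(f"sign consistency: {sign_consistency:.3f}")

# 1D loss curve along the noise direction.
probe = copy.deepcopy(model)
for alpha in torch.linspace(-1.0, 1.0, 21).tolist():
    set_flat_params(probe, theta + alpha * noise)
    with torch.no_grad():
        loss = criterion(probe(x), y).item()
    print(f"alpha={alpha:+.1f}  loss={loss:.4f}")
```

Sweeping alpha over both positive and negative values is what allows the two sides of the valley to be compared, which is how the (a)symmetry described in the paper becomes visible.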
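
The CIFAR-10 split quoted above (50,000 training and 10,000 test images of size 32×32) can be loaded directly with torchvision. The snippet below is a hedged sketch; the normalization statistics are standard CIFAR-10 values, not figures taken from the paper.

```python
# Hedged sketch: load the reported CIFAR-10 train/test split with torchvision.
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor(),
                       T.Normalize((0.4914, 0.4822, 0.4465),   # standard CIFAR-10 stats
                                   (0.2470, 0.2435, 0.2616))])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                         download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 10000, 32x32 RGB images
```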
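
For the torchvision pre-trained models mentioned under Software Dependencies, a minimal sketch is shown below; the specific ResNeXt101 variant (32x8d) and the weights identifier are assumptions, since the paper does not pin versions.

```python
# Hedged sketch: load a pre-trained ResNeXt101 backbone from torchvision.
# The 32x8d variant and the "DEFAULT" weights tag are assumptions; older
# torchvision releases use pretrained=True instead of the weights argument.
import torchvision

model = torchvision.models.resnext101_32x8d(weights="DEFAULT")
model.eval()  # inference mode for downstream analysis
```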
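
Finally, the reported optimization setup (SGD with momentum 0.9, LR 0.03, batch size 256, weight decay 0.0005, cosine annealing over 200 epochs) maps onto a PyTorch training loop roughly as follows; the stand-in model and synthetic dataset are assumptions, not the paper's configuration.

```python
# Hedged sketch of the reported training configuration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in network
criterion = nn.CrossEntropyLoss()

# Synthetic stand-in for a 32x32 RGB classification dataset.
data = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
loader = DataLoader(data, batch_size=256, shuffle=True)  # BS = 256

optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, weight_decay=0.0005)
epochs = 200
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine-annealed LR decay, stepped once per epoch
```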