Is normalization indispensable for training deep neural networks?
Authors: Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, Bhiksha Raj
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method on a wide range of tasks. On ImageNet, our un-normalized RescaleNet models can achieve the same or slightly better performance than the corresponding normalized models (ResNet, ResNeXt) with the same training settings. Our un-normalized RescaleNet variant on ResNet50 has 0.3% lower error than its BN/GN ResNet50 counterpart. Our method can also apply to conventional non-residual networks. Our 19-layer VGG [30] model without normalization achieves a top-1 validation error rate of 25.0%, which is 2.6% lower than PyTorch's pre-trained model [26]. Our method also shows consistent improvement on Mask R-CNN for COCO object detection and segmentation [20], 3D convolutional networks for Kinetics video classification [18], and deep transformers for WMT English-German machine translation [34]. In cases where normalization operations may cause problems, our method can be a competitive alternative. See also Section 5 (Experiments). |
| Researcher Affiliation | Collaboration | 1Fudan University, Shanghai, China 2Carnegie Mellon University, Pittsburgh, PA 3Byte Dance AI Lab, Shanghai, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/hukkai/rescaling. |
| Open Datasets | Yes | We experiment in the ImageNet classification dataset [8]. The dataset contains 128k training images and 50k validation images that are labeled with 1000 categories. and Our method also shows consistent improvement on Mask R-CNN for COCO object detection and segmentation [20], 3D convolutional networks for Kinetics video classification [18], and deep transformers for WMT English-German machine translation [34]. |
| Dataset Splits | Yes | The dataset contains 128k training images and 50k validation images that are labeled with 1000 categories. and trained in the COCO train2017 set and evaluated on the COCO val2017 set. |
| Hardware Specification | No | The paper mentions using GPUs for training (e.g., '8 GPUs, 2 images per GPU' for COCO), but it does not provide specific hardware details such as the models of GPUs or CPUs, or any detailed computer specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of 'PyTorch implementations' and the 'fairseq library', but it does not specify any version numbers for these or any other software components. |
| Experiment Setup | Yes | During training, we adopt random resized crop with a 224×224 crop size, and random horizontal flip for data augmentation. We use SGD to train the models for 100 epochs. We use a weight decay of 0.0001 for all weight layers, and no weight decay for the bias and multipliers. We report the top-1 classification error on the 224×224 center-crop in the validation set. All results are averaged over 5 runs. The default setting is to train the model with a batch size of 256 and an initial learning rate of 0.1. The learning rate is decreased at 30, 60, 90 epochs. |
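The training recipe quoted in the Experiment Setup row can be sketched in PyTorch. This is a minimal illustration, not the authors' released code: the tiny stand-in model, the momentum value of 0.9, and the decay factor of 0.1 at each milestone are assumptions (the paper states only the initial learning rate, the weight-decay rule, and the decay epochs). The key detail it reproduces is that weight decay of 1e-4 applies to weight tensors but not to biases or scalar multipliers.

```python
import torch
import torch.nn as nn

# Stand-in model; the paper trains ResNet/ResNeXt-style networks on ImageNet.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

# Split parameters: weight decay for weight layers only, none for
# biases (and, in the real models, the scalar multipliers).
decay, no_decay = [], []
for name, param in model.named_parameters():
    (no_decay if name.endswith("bias") else decay).append(param)

optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=0.1,          # initial learning rate from the paper
    momentum=0.9,    # assumed; standard for ImageNet training
)

# Learning rate decreased at epochs 30, 60, 90 (gamma=0.1 assumed).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.1
)

for epoch in range(100):
    optimizer.step()   # placeholder for one epoch of training on batch size 256
    scheduler.step()

final_lr = optimizer.param_groups[0]["lr"]
```

After 100 epochs with three decays, the learning rate ends at 0.1 × 0.1³ = 1e-4.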