Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training

Authors: Sheng Liu, Xiao Li, Yuexiang Zhai, Chong You, Zhihui Zhu, Carlos Fernandez-Granda, Qing Qu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify our findings via numerical experiments on CIFAR and ImageNet.
Researcher Affiliation | Collaboration | Sheng Liu (New York University, shengliu@nyu.edu); Xiao Li (University of Michigan, xlxiao@umich.edu); Yuexiang Zhai (UC Berkeley, ysz@berkeley.edu); Chong You (Google Research, cyou@google.com); Zhihui Zhu (University of Denver, zhihui.zhu@du.edu); Carlos Fernandez-Granda (New York University, cfgranda@cims.nyu.edu); Qing Qu (University of Michigan, qingqu@umich.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our implementation is available online at https://github.com/shengliu66/ConvNorm.
Open Datasets | Yes | We verify our findings via numerical experiments on CIFAR and ImageNet.
Dataset Splits | Yes | We use 10% of the training set for validation and treat the validation set as a held-out test set.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments.
Software Dependencies | No | We adopt Xavier uniform initialization [61], which is the default initialization in PyTorch, for all networks.
Experiment Setup | Yes | For all experiments, if not otherwise mentioned, the CIFAR-10 and CIFAR-100 datasets are processed with standard augmentations, i.e., random cropping and flipping. We use 10% of the training set for validation and treat the validation set as a held-out test set. For ImageNet, we perform standard random resizing and flipping. For training, we observe that our ConvNorm is not sensitive to the learning rate, and thus we fix the initial learning rate to 0.1 for all experiments. For experiments on CIFAR-10, we run 120 epochs and divide the learning rate by 10 at the 40th and 80th epochs; for CIFAR-100, we run 150 epochs and divide the learning rate by 10 at the 60th and 120th epochs; for ImageNet, we run 90 epochs and divide the learning rate by 10 at the 30th and 90th epochs. The optimization is done using SGD with a momentum of 0.9 and a weight decay of 0.0001 for all datasets. For networks, we use two backbones: VGG16 [60] and ResNet18 [5]. We adopt Xavier uniform initialization [61], which is the default initialization in PyTorch, for all networks. (A training-configuration sketch based on this description appears below the table.)
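As a concrete reading of the Experiment Setup row, the following is a minimal PyTorch sketch of the quoted CIFAR-10 configuration. It is not the authors' implementation (linked above, which additionally replaces standard layers with ConvNorm); the batch size, dataloader settings, and the use of torchvision's ResNet-18 are assumptions made purely for illustration.

```python
# Minimal sketch of the CIFAR-10 training setup quoted above.
# Assumptions: batch size 128, torchvision ResNet-18, CPU/GPU handling omitted.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader, random_split

# Standard CIFAR-10 augmentations: random cropping and horizontal flipping.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)

# Hold out 10% of the training set for validation, as described in the paper.
val_size = len(full_train) // 10
train_set, val_set = random_split(
    full_train, [len(full_train) - val_size, val_size])

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=4)

# Backbone: ResNet-18 (torchvision variant used here for illustration only;
# the paper uses VGG16 / ResNet18 backbones equipped with ConvNorm layers).
model = torchvision.models.resnet18(num_classes=10)

# Xavier (Glorot) uniform initialization for conv and linear weights.
def xavier_init(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(xavier_init)

# SGD with momentum 0.9, weight decay 1e-4, initial learning rate 0.1;
# 120 epochs with the rate divided by 10 at epochs 40 and 80 (CIFAR-10 schedule).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 80], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(120):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Per the quoted schedules, the CIFAR-100 variant would run 150 epochs with milestones [60, 120], and ImageNet would run 90 epochs with the quoted milestones, using resizing and flipping augmentations instead of cropping.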