Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training
Authors: Sheng Liu, Xiao Li, Yuexiang Zhai, Chong You, Zhihui Zhu, Carlos Fernandez-Granda, Qing Qu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our findings via numerical experiments on CIFAR and ImageNet. |
| Researcher Affiliation | Collaboration | Sheng Liu (New York University, shengliu@nyu.edu); Xiao Li (University of Michigan, xlxiao@umich.edu); Yuexiang Zhai (UC Berkeley, ysz@berkeley.edu); Chong You (Google Research, cyou@google.com); Zhihui Zhu (University of Denver, zhihui.zhu@du.edu); Carlos Fernandez-Granda (New York University, cfgranda@cims.nyu.edu); Qing Qu (University of Michigan, qingqu@umich.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available online at https://github.com/shengliu66/ConvNorm. |
| Open Datasets | Yes | We verify our findings via numerical experiments on CIFAR and ImageNet. |
| Dataset Splits | Yes | We use 10% of the training set for validation and treat the validation set as a held-out test set. (See the data-split sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | We adopt Xavier uniform initialization [61] which is the default initialization in PyTorch for all networks. |
| Experiment Setup | Yes | For all experiments, if not otherwise mentioned, the CIFAR-10 and CIFAR-100 datasets are processed with standard augmentations, i.e., random cropping and flipping. We use 10% of the training set for validation and treat the validation set as a held-out test set. For ImageNet, we perform standard random resizing and flipping. For training, we observe our ConvNorm is not sensitive to the learning rate, and thus we fix the initial learning rate to 0.1 for all experiments. For experiments on CIFAR-10, we run 120 epochs and divide the learning rate by 10 at the 40th and 80th epochs; for CIFAR-100, we run 150 epochs and divide the learning rate by 10 at the 60th and 120th epochs; for ImageNet, we run 90 epochs and divide the learning rate by 10 at the 30th and 90th epochs. The optimization is done using SGD with a momentum of 0.9 and a weight decay of 0.0001 for all datasets. For networks, we use two backbones: VGG16 [60] and ResNet18 [5]. We adopt Xavier uniform initialization [61], which is the default initialization in PyTorch for all networks. (Hedged PyTorch sketches of this setup follow the table.) |
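
The dataset-split and augmentation details quoted above (random cropping and flipping on CIFAR-10, with 10% of the training set held out for validation) can be reproduced roughly as follows. This is a minimal sketch, not the authors' code: the crop padding of 4, the batch size of 128, and the fixed split seed are common defaults assumed here, not values reported in the paper.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Standard CIFAR-10 augmentations mentioned in the paper: random cropping and flipping.
# The crop padding of 4 is a common convention and an assumption here.
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=train_tf)

# Hold out 10% of the training set for validation, as described in the paper.
n_val = len(full_train) // 10
n_train = len(full_train) - n_val
train_set, val_set = random_split(
    full_train, [n_train, n_val], generator=torch.Generator().manual_seed(0)
)

# Batch size 128 is an assumed value, not one reported in the paper.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=4)
```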
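
The CIFAR-10 training recipe in the experiment-setup row (SGD with momentum 0.9 and weight decay 0.0001, initial learning rate 0.1, 120 epochs with the rate divided by 10 at epochs 40 and 80, Xavier uniform initialization) maps onto standard PyTorch components as sketched below. The torchvision ResNet18 is only a stand-in for the paper's backbone and does not include the proposed ConvNorm layers; `train_loader` comes from the previous sketch.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Backbone stand-in: the paper uses VGG16 and ResNet18 backbones; torchvision's
# resnet18 (without the paper's ConvNorm layers) is used here for illustration.
model = resnet18(num_classes=10)

# Xavier (Glorot) uniform initialization, as stated in the quoted setup.
def init_weights(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_weights)

# Reported CIFAR-10 schedule: SGD, lr 0.1, momentum 0.9, weight decay 1e-4,
# 120 epochs with the learning rate divided by 10 at epochs 40 and 80.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 80], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(120):
    model.train()
    for images, labels in train_loader:  # train_loader from the data-split sketch above
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```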