Kalman Normalization: Normalizing Internal Representations Across Network Layers

Authors: Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 Experiments." We first evaluate KN on the ImageNet 2012 classification dataset. Table 1 compares the top-1 validation accuracies. KN achieves 76.1% top-1 accuracy, outperforming BN and BRN by a large margin (3.4% and 3.4%, respectively).
Researcher Affiliation | Collaboration | Guangrun Wang, Sun Yat-sen University, wanggrun@mail2.sysu.edu.cn; Jiefeng Peng, Sun Yat-sen University, jiefengpeng@gmail.com; Ping Luo, The Chinese University of Hong Kong, pluo.lhi@gmail.com; Xinjiang Wang, SenseTime Group Ltd.; Liang Lin, Sun Yat-sen University, linliang@ieee.org
Pseudocode | No | The paper describes the approach using mathematical equations (e.g., Eqn. 7) and descriptive text, but it does not present structured pseudocode or an algorithm block. (An illustrative sketch of the statistics update is given after this table.)
Open Source Code | No | The paper does not provide any concrete access information for source code, such as a repository link or an explicit statement of code release.
Open Datasets | Yes | We first evaluate KN on the ImageNet 2012 classification dataset [24], which consists of 1,000 categories. To investigate the application of micro-batch training, we use the COCO 2017 detection & segmentation benchmark [6]. We conducted more studies on the CIFAR-10 and CIFAR-100 datasets [15]. We also conduct experiments on the SVHN dataset [20]. (A minimal loading sketch for the smaller benchmarks follows the table.)
Dataset Splits | Yes | The models are trained on the 1.28M training images and evaluated on the 50k validation images. The models are trained on the COCO train2017 set and evaluated on the COCO val2017 set. CIFAR-10 and CIFAR-100 each consist of 50k training images and 10k testing images.
Hardware Specification | Yes | For a fair comparison, both methods are trained on the same computing machine with four Titan X GPUs.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions).
Experiment Setup | Yes | Our baseline models are three representative networks: Inception-v2 [27], ResNet-50, and ResNet-101 [8]. We employ the baseline of a typical batch size (i.e., 32) for comparison. We use a schedule of 280k training steps. Specifically, the resolution is set to (800, 1333), and we sample 256 boxes for each image. We use a batch size of only 2. (The stated values are collected in the configuration sketch after this table.)
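Since the paper gives the method as equations rather than pseudocode, the following is only a rough Python sketch of the general idea behind Kalman Normalization: the statistics used to normalize a layer are estimated by combining that layer's mini-batch statistics with the estimates propagated from the preceding layer, in the spirit of a Kalman filter update. The gain q, the identity transition between layers, the two-dimensional activation shape, and the function name kalman_normalize are simplifying assumptions for illustration, not the paper's exact Eqn. 7.

```python
import torch


def kalman_normalize(x, prev_mean, prev_var, q=0.5, eps=1e-5):
    """Illustrative (not the paper's exact) Kalman Normalization step.

    x:         activations of the current layer, shape (N, C)
    prev_mean: estimated mean propagated from the previous layer, shape (C,)
    prev_var:  estimated variance propagated from the previous layer, shape (C,)
    q:         assumed scalar gain blending observed and propagated statistics
    """
    # Observed mini-batch statistics of the current layer.
    batch_mean = x.mean(dim=0)
    batch_var = x.var(dim=0, unbiased=False)

    # Kalman-style update: combine the observation with the estimate carried
    # over from the preceding layer (identity transition assumed here).
    est_mean = q * batch_mean + (1.0 - q) * prev_mean
    est_var = q * batch_var + (1.0 - q) * prev_var

    # Normalize with the combined estimates, as in standard batch normalization.
    x_hat = (x - est_mean) / torch.sqrt(est_var + eps)
    return x_hat, est_mean.detach(), est_var.detach()


# Usage (shapes are illustrative): feed each layer's estimates into the next;
# for the first layer, the batch statistics themselves serve as the prior.
x1 = torch.randn(32, 64)
y1, m1, v1 = kalman_normalize(x1, x1.mean(0), x1.var(0, unbiased=False))
x2 = torch.randn(32, 64)
y2, m2, v2 = kalman_normalize(x2, m1, v1)
```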
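For the smaller public benchmarks listed above (CIFAR-10, CIFAR-100, SVHN), a minimal torchvision loading sketch is shown below; the root path and the bare ToTensor transform are placeholders, and ImageNet and COCO are obtained separately.

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # placeholder preprocessing

# CIFAR-10 and CIFAR-100: 50k training and 10k test images each.
cifar10_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100(root="./data", train=True, download=True, transform=to_tensor)

# SVHN uses a "split" argument instead of a train flag.
svhn_train = datasets.SVHN(root="./data", split="train", download=True, transform=to_tensor)
```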
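The quoted experiment setup can be collected into a small configuration sketch. Only the numeric values come from the quoted text; the dictionary names and keys are assumptions made for readability.

```python
# Hypothetical summary of the quoted setup; key names are assumptions.
IMAGENET_CLASSIFICATION = {
    "backbones": ["Inception-v2", "ResNet-50", "ResNet-101"],
    "batch_size": 32,            # "typical batch size" baseline
}

COCO_DETECTION = {
    "training_steps": 280_000,   # "schedule of 280k training steps"
    "image_resolution": (800, 1333),
    "boxes_per_image": 256,
    "batch_size": 2,             # micro-batch setting
}
```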