Generalized Batch Normalization: Towards Accelerating Deep Neural Networks

Authors: Xiaoyong Yuan, Zheng Feng, Matthew Norton, Xiaolin Li

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Utilizing the suggested deviation measure and statistic, we show experimentally that training is accelerated more so than with conventional BN, often with improved error rate as well.
Researcher Affiliation | Academia | Xiaoyong Yuan, University of Florida (chbrian@ufl.edu); Zheng Feng, University of Florida (fengzheng@ufl.edu); Matthew Norton, Naval Postgraduate School (mnorton@nps.edu); Xiaolin Li, University of Florida (andyli@ece.ufl.edu)
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm', nor any structured code blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We demonstrate on MNIST, CIFAR-10, CIFAR-100, and SVHN datasets that the speed of convergence of stochastic gradient descent (SGD) can be increased by simply choosing a different D and S and that, in some settings, we obtain improved predictive performance.
Dataset Splits | No | The paper mentions training on datasets and evaluating on a 'held out test set' but does not provide specific details on the train/validation/test split percentages or sample counts for reproduction.
Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper discusses neural network architectures and optimizers but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We conduct classification on MNIST (LeCun et al. 1998) with the neural network architecture LeNet, with the input size of 28x28 and two convolutional layers with kernel size 5, and number of filters 20 and 50 respectively. ... with vanilla SGD as the optimizer, with learning rate equal to .01, and batch size equal to 1000.
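The excerpts above describe the paper's idea as normalizing with a general deviation measure D and statistic S in place of the batch mean and standard deviation. As an illustration only, here is a minimal PyTorch sketch of that idea, assuming a per-feature normalization of the form (x - S(x)) / D(x) with a learnable scale and shift; the median / mean-absolute-deviation pair used below is a placeholder, not the specific D and S pair proposed in the paper.

```python
import torch
import torch.nn as nn


class GeneralizedBatchNorm1d(nn.Module):
    """Illustrative sketch: normalize each feature with a general statistic S
    and deviation measure D instead of the batch mean and standard deviation.
    The S/D choices here (median and mean absolute deviation) are placeholders,
    not the pair prescribed in the paper."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # learnable shift

    def statistic(self, x):
        # S: a location statistic computed per feature over the batch dimension.
        return x.median(dim=0, keepdim=True).values

    def deviation(self, x, s):
        # D: a deviation measure of x around the statistic S(x).
        return (x - s).abs().mean(dim=0, keepdim=True)

    def forward(self, x):
        # x has shape (batch, num_features)
        s = self.statistic(x)
        d = self.deviation(x, s)
        x_hat = (x - s) / (d + self.eps)
        return self.gamma * x_hat + self.beta
```

Standard batch normalization is recovered by taking S to be the batch mean and D the batch standard deviation; the quoted results concern how other (S, D) choices affect SGD convergence.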
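To make the quoted MNIST setup concrete, the following is a hedged sketch of that configuration, assuming standard max pooling after each convolution and a conventional 500-unit fully connected head (the excerpt does not describe the layers after the two convolutions); only the input size, kernel size, filter counts, optimizer, learning rate, and batch size come from the quoted text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


class LeNetLike(nn.Module):
    """LeNet-style network matching the reported setup: 28x28 inputs and two
    convolutional layers with kernel size 5 and 20/50 filters. The pooling and
    fully connected head are assumptions, not specified in the excerpt."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=5)
        self.fc1 = nn.Linear(50 * 4 * 4, 500)
        self.fc2 = nn.Linear(500, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 28 -> 24 -> 12
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # 12 -> 8 -> 4
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)


train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=1000, shuffle=True)                    # batch size 1000, as reported

model = LeNetLike()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # vanilla SGD, lr .01

for images, labels in train_loader:                   # one epoch of training
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
```

Inserting a BatchNorm layer, or a GBN-style layer as sketched earlier, after each convolution would set up the comparison the paper reports, though the exact placement of the normalization layers is not specified in the excerpt.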