BN-invariant Sharpness Regularizes the Training Model to Better Generalization
Authors: Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our algorithm achieves considerably better performance than vanilla SGD over various experimental settings. [...] We test our algorithm on the CIFAR dataset [Krizhevsky et al., 2012]; it has preferable results under large batch sizes compared with the baselines (SGD, Entropy-SGD). |
| Researcher Affiliation | Collaboration | Mingyang Yi¹·², Huishuai Zhang³, Wei Chen³, Zhi-Ming Ma¹·², Tie-Yan Liu³. ¹University of Chinese Academy of Sciences; ²Academy of Mathematics and Systems Science; ³Microsoft Research. yimingyang17@mails.ucas.edu.cn, mazm@amt.ac.cn, {huzhang, wche, tie-yan.liu}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 SGD with BN-Sharpness regularization (a hedged sketch of what such an update could look like is given after this table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code availability. |
| Open Datasets | Yes | First we test the algorithm with a fully batch-normalized LeNet [LeCun et al., 1998] on CIFAR10 [Krizhevsky et al., 2012]. |
| Dataset Splits | Yes | First we test the algorithm with a fully batch-normalized LeNet [LeCun et al., 1998] on CIFAR10 [Krizhevsky et al., 2012]. [...] For SGDS, the δ on CIFAR10 is 5e-4 and on CIFAR100 is 1e-3; the learning rate is 0.2 and decays by a factor of 0.1 at epochs 60, 120, and 160. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The update rule is SGD with momentum: the learning rate is set to 0.2 and decayed by a factor of 0.1 at epochs 60, 120, and 160, with the momentum parameter set to 0.9. We use a batch size of 10000 and a weight-decay ratio of 5e-4 for all three experiments. [...] For the experiments with regularized BN-Sharpness, we choose λ as 1e-4, increased by a factor of 1.02 each epoch. We set δ = 0.001, and p is chosen as 2. (A hedged reconstruction of this schedule appears after the table.) |
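
The Pseudocode row quotes the paper's Algorithm 1 (SGD with BN-Sharpness regularization) without reproducing it. The PyTorch sketch below shows what one step of such an update could look like, assuming the sharpness penalty is estimated by a first-order finite difference along a random direction whose per-parameter magnitude is proportional to that parameter's norm, a stand-in for a perturbation that respects the rescaling invariance batch normalization induces. The function name, the estimator, and the numerical guard are our assumptions, not the authors' exact Algorithm 1.

```python
import torch
import torch.nn.functional as F

def sgd_bn_sharpness_step(model, optimizer, x, y, lam=1e-4, delta=1e-3):
    """One hypothetical step of SGD with a sharpness penalty.

    Approximates the gradient of  L(w) + lam * sharpness(w)  as
        g = dL(w) + (lam / delta) * (dL(w + delta * v) - dL(w)),
    where v is a random direction rescaled per-parameter to norm ||w||,
    a stand-in for a BN-invariant perturbation. Assumed form only.
    """
    # Gradient of the plain loss at the current weights.
    optimizer.zero_grad()
    F.cross_entropy(model(x), y).backward()
    base_grads = [p.grad.detach().clone() for p in model.parameters()]

    # Perturb each parameter along a random direction scaled by its own norm.
    directions = []
    with torch.no_grad():
        for p in model.parameters():
            d = torch.randn_like(p)
            d = d / (d.norm() + 1e-12) * p.norm()
            p.add_(delta * d)
            directions.append(d)

    # Gradient of the loss at the perturbed weights.
    optimizer.zero_grad()
    F.cross_entropy(model(x), y).backward()

    # Restore the weights and assemble the combined gradient.
    with torch.no_grad():
        for p, d, g0 in zip(model.parameters(), directions, base_grads):
            p.sub_(delta * d)
            p.grad.copy_(g0 + (lam / delta) * (p.grad - g0))

    optimizer.step()
```

Scaling the random direction by each parameter's norm makes the perturbation relative rather than absolute, which is one way to keep the penalty unchanged under the weight rescalings that BN absorbs.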
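
The quoted experiment setup (learning rate 0.2 decayed by a factor of 0.1 at epochs 60, 120, and 160; momentum 0.9; weight decay 5e-4; λ starting at 1e-4 and grown by a factor of 1.02 per epoch) maps onto standard PyTorch components roughly as below. The placeholder model, the 200-epoch horizon, and the loop skeleton are assumptions for illustration.

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder; the paper uses a fully batch-normalized LeNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.2,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.1)

lam = 1e-4  # BN-Sharpness weight, grown by 1.02x per epoch per the quote
for epoch in range(200):  # epoch count assumed; the quoted schedule decays lr up to epoch 160
    # ... iterate over mini-batches here, e.g. calling
    # sgd_bn_sharpness_step(model, optimizer, x, y, lam) per batch ...
    scheduler.step()
    lam *= 1.02
```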