Double Forward Propagation for Memorized Batch Normalization
Authors: Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the proposed MBN method in image classification tasks. We apply MBN on several well-known models, including VGG (Krizhevsky, Sutskever, and Hinton 2012) and ResNet (He et al. 2016). |
| Researcher Affiliation | Academia | Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan; School of Software Engineering, South China University of Technology, China; {guo.yong, secrdyz}@mail.scut.edu.cn, {qyw, ellachen, mingkuitan}@scut.edu.cn |
| Pseudocode | Yes | Algorithm 1 Training MBN in Single Iteration. Require: Recorded statistics in memory: {μ_i}_{i=1}^k, {σ_i}_{i=1}^k; Mean and variance of the current batch: μ_B, σ_B; Weights for batches in memory: {α_i}_{i=1}^k; Learnable parameters: γ, β. We define μ_{k+1} = μ_B, σ_{k+1} = σ_B for convenience. (A minimal code sketch of this step is given after the table.) |
| Open Source Code | No | The paper does not provide any statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | In the experiments, three benchmark datasets are used: CIFAR-10, CIFAR-100 (Krizhevsky and Hinton 2009) and ImageNet (Russakovsky et al. 2015). (A data-loading sketch follows the table.) |
| Dataset Splits | No | The paper reports training and testing sample counts for datasets such as CIFAR-10 ("5,000 training samples and 1,000 testing samples" per class) but does not explicitly describe a separate validation split. |
| Hardware Specification | Yes | All the experiments are conducted on a GPU Server with one Titan X GPU. |
| Software Dependencies | No | The paper states "All compared models are implemented based on PyTorch" but does not specify a version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes | Without special specification, we train the models through SGD with a mini-batch size of 128. The momentum for SGD is 0.9 and the weight decay is set to 10⁻⁴. The learning rate is initialized as 0.1 and then divided by 10 at 40% and 60% of the training procedure, respectively. For MBN methods, we first set the parameter λ = 0.1 and then increase it to 0.5 and 0.9 at 40% and 60% of the training procedure, which is referred to as λ = {0.1, 0.5, 0.9}. The decaying parameter η in Eqn. (8) is set to 0.9. All the experiments are performed with 200 training epochs. (A configuration sketch follows the table.) |
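
The pseudocode row above lists only the inputs of Algorithm 1. Below is a minimal, hypothetical PyTorch sketch of the statistics-mixing step those inputs suggest: a memory holds the means and variances of the last few mini-batches, they are combined with the current batch statistics, and the result is normalized with learnable γ and β. The uniform averaging over the memory, the single mixing weight `lam`, and the memory-update rule are assumptions made for illustration; the paper's per-batch weights {α_i}, the decay η, and the double forward propagation itself are not reproduced here.

```python
import torch
import torch.nn as nn


class MemorizedBatchNormSketch(nn.Module):
    """Hypothetical sketch of the statistics-mixing step suggested by Algorithm 1.

    A memory holds the means/variances of the last `memory_size` mini-batches;
    normalization uses a weighted combination of memorized and current batch
    statistics. The uniform averaging and single weight `lam` are assumptions;
    the paper's weights {α_i}, decay η, and double forward pass are not shown.
    """

    def __init__(self, num_features, memory_size=5, lam=0.1, eps=1e-5):
        super().__init__()
        self.lam = lam  # weight on memorized statistics (plays the role of λ)
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))   # learnable scale γ
        self.beta = nn.Parameter(torch.zeros(num_features))   # learnable shift β
        self.register_buffer("mem_mean", torch.zeros(memory_size, num_features))
        self.register_buffer("mem_var", torch.ones(memory_size, num_features))

    def forward(self, x):  # x: (N, num_features)
        mu_b = x.mean(dim=0)
        var_b = x.var(dim=0, unbiased=False)
        if self.training:
            # Combine current batch statistics with the memorized ones.
            mu = (1 - self.lam) * mu_b + self.lam * self.mem_mean.mean(dim=0)
            var = (1 - self.lam) * var_b + self.lam * self.mem_var.mean(dim=0)
            # Update the memory: drop the oldest entry, append the current batch.
            self.mem_mean = torch.cat([self.mem_mean[1:], mu_b.detach().unsqueeze(0)])
            self.mem_var = torch.cat([self.mem_var[1:], var_b.detach().unsqueeze(0)])
        else:
            # At test time, rely on the memorized statistics only.
            mu, var = self.mem_mean.mean(dim=0), self.mem_var.mean(dim=0)
        x_hat = (x - mu) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```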
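
The datasets row names CIFAR-10, CIFAR-100, and ImageNet. The sketch below loads the two CIFAR benchmarks with torchvision; torchvision, the plain `ToTensor` preprocessing, and the `./data` root are assumptions, since the paper does not state how the data were obtained or augmented.

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Plain ToTensor preprocessing; the paper's exact augmentation is not specified here.
transform = T.Compose([T.ToTensor()])

# CIFAR-10 / CIFAR-100 via torchvision (the "./data" root is a placeholder).
cifar10_train = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
cifar10_test = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
cifar100_train = torchvision.datasets.CIFAR100("./data", train=True, download=True, transform=transform)

# Mini-batch size of 128, matching the experiment setup reported in the paper.
train_loader = DataLoader(cifar10_train, batch_size=128, shuffle=True, num_workers=2)
test_loader = DataLoader(cifar10_test, batch_size=128, shuffle=False, num_workers=2)
```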
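
The experiment-setup row translates directly into an optimizer and learning-rate schedule. The following is a minimal PyTorch sketch under the assumption that the 40% and 60% decay points map to epoch milestones of the 200-epoch run; the placeholder model, the `lambda_at` helper, and the empty training-loop body are illustrative, not the authors' implementation.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder model; the paper applies MBN to VGG and ResNet architectures.
model = torch.nn.Linear(3 * 32 * 32, 10)

epochs = 200                                          # 200 training epochs
milestones = [int(0.4 * epochs), int(0.6 * epochs)]   # 40% and 60% of training

# SGD with momentum 0.9, weight decay 1e-4, initial learning rate 0.1.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# Learning rate divided by 10 at each milestone.
scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=0.1)


def lambda_at(epoch):
    """Hypothetical helper for the reported λ = {0.1, 0.5, 0.9} schedule."""
    if epoch < milestones[0]:
        return 0.1
    if epoch < milestones[1]:
        return 0.5
    return 0.9


for epoch in range(epochs):
    lam = lambda_at(epoch)  # how λ is fed to the MBN layers depends on the implementation
    # ... one pass over the training data with mini-batch size 128 would go here ...
    scheduler.step()
```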