Double Forward Propagation for Memorized Batch Normalization

Authors: Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the proposed MBN method in image classification tasks. We apply MBN on several well-known models, including VGG (Krizhevsky, Sutskever, and Hinton 2012) and ResNet (He et al. 2016).
Researcher Affiliation | Academia | Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan, School of Software Engineering, South China University of Technology, China. {guo.yong, secrdyz}@mail.scut.edu.cn, {qyw, ellachen, mingkuitan}@scut.edu.cn
Pseudocode | Yes | Algorithm 1: Training MBN in Single Iteration. Require: recorded statistics in memory {μ_i}_{i=1}^{k}, {σ_i}_{i=1}^{k}; mean and variance of the current batch μ_B, σ_B; weights for the batches in memory {α_i}_{i=1}^{k}; learnable parameters γ, β. For convenience, μ_{k+1} = μ_B and σ_{k+1} = σ_B. (A code sketch of this procedure is given below the table.)
Open Source Code | No | The paper does not provide any statement or link for open-source code for the described methodology.
Open Datasets | Yes | In the experiments, three benchmark datasets are used: CIFAR-10, CIFAR-100 (Krizhevsky and Hinton 2009) and ImageNet (Russakovsky et al. 2015).
Dataset Splits | No | The paper specifies training and testing sample counts for datasets like CIFAR-10 ("5,000 training samples and 1,000 testing samples") but does not explicitly provide details on a separate validation dataset split.
Hardware Specification | Yes | All the experiments are conducted on a GPU server with one Titan X GPU.
Software Dependencies | No | The paper states "All compared models are implemented based on PyTorch" but does not specify a version number for PyTorch or other software dependencies.
Experiment Setup | Yes | Without special specification, we train the models through SGD with a mini-batch size of 128. The momentum for SGD is 0.9 and the weight decay is set to 10^-4. The learning rate is initialized as 0.1 and then divided by 10 at 40% and 60% of the training procedure, respectively. For MBN methods, we first set the parameter λ = 0.1 and then increase it to 0.5 and 0.9 at 40% and 60% of the training procedure, which is referred to as λ = {0.1, 0.5, 0.9}. The decaying parameter η in Eqn. (8) is set to 0.9. All the experiments are performed with 200 training epochs. (A PyTorch sketch of this schedule is given below the table.)
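
The pseudocode row above describes normalizing the current batch with statistics aggregated over the current batch and k memorized batches. The sketch below is only an illustration of that idea, not the authors' released code: the helper name `mbn_normalize`, the simple convex-combination weighting, and the variance-combination step are assumptions, and the double forward propagation that refreshes the memorized statistics is omitted.

```python
import torch

def mbn_normalize(x, memory_means, memory_vars, alphas, gamma, beta, eps=1e-5):
    """Illustrative MBN forward step (hypothetical helper, not the authors' code).

    x:            (N, C) activations of the current mini-batch.
    memory_means: list of k per-channel means recorded from earlier batches.
    memory_vars:  list of k per-channel variances recorded from earlier batches.
    alphas:       k+1 weights for the memorized batches plus the current batch.
    gamma, beta:  learnable affine parameters of shape (C,).
    """
    mu_b = x.mean(dim=0)                      # current-batch mean  (mu_{k+1})
    var_b = x.var(dim=0, unbiased=False)      # current-batch variance (sigma_{k+1})
    means = list(memory_means) + [mu_b]
    vars_ = list(memory_vars) + [var_b]

    # Weighted aggregation of statistics over memory + current batch
    # (a simple convex combination is assumed here for illustration).
    mu = sum(a * m for a, m in zip(alphas, means))
    var = sum(a * (v + (m - mu) ** 2) for a, v, m in zip(alphas, vars_, means))

    x_hat = (x - mu) / torch.sqrt(var + eps)  # normalize with aggregated statistics
    return gamma * x_hat + beta               # learnable scale and shift
```

In a full MBN layer, the memorized means and variances would be refreshed (with the decaying weight η from Eqn. (8)) before this normalization is applied; that bookkeeping is left out here for brevity.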
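
The experiment-setup row maps onto a standard PyTorch training loop. The sketch below assumes `model` is a network with MBN layers and `train_loader` is a CIFAR DataLoader; the `set_lambda` hook used to pass λ to the MBN layers is a hypothetical placeholder, since the paper does not release code.

```python
import torch
import torch.nn.functional as F

def train(model, train_loader, epochs=200):
    # SGD with momentum 0.9, weight decay 1e-4, initial learning rate 0.1.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    # Learning rate divided by 10 at 40% and 60% of the training procedure.
    milestones = [int(0.4 * epochs), int(0.6 * epochs)]
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=milestones, gamma=0.1)

    for epoch in range(epochs):
        # MBN lambda schedule {0.1, 0.5, 0.9}, stepped at the same milestones.
        lam = 0.1 if epoch < milestones[0] else (0.5 if epoch < milestones[1] else 0.9)
        if hasattr(model, "set_lambda"):      # hypothetical hook on the MBN layers
            model.set_lambda(lam)

        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
```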