Continual Normalization: Rethinking Batch Normalization for Online Continual Learning

Authors: Quang Pham, Chenghao Liu, Steven HOI

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on different continual learning algorithms and online scenarios show that CN is a direct replacement for BN and can provide substantial performance improvements.
Researcher Affiliation | Collaboration | 1 Singapore Management University, hqpham.2017@smu.edu.sg; 2 Salesforce Research Asia, {chenghao.liu, shoi}@salesforce.com
Pseudocode | Yes | In the following, we provide the CN's implementation based on PyTorch (Paszke et al., 2017); see the usage sketch after the table.

    import torch.nn.functional as F
    from torch.nn.modules.batchnorm import _BatchNorm

    class CN(_BatchNorm):
        def __init__(self, num_features, eps=1e-5, G=32, momentum=0.1):
            super(CN, self).__init__(num_features, eps, momentum)
            self.G = G  # number of groups for the group-normalization step

        def forward(self, input):
            # Group-normalize the input (no affine parameters), then batch-normalize
            # the result with the layer's running statistics and affine parameters.
            out_gn = F.group_norm(input, self.G, None, None, self.eps)
            out = F.batch_norm(out_gn, self.running_mean, self.running_var,
                               self.weight, self.bias, self.training,
                               self.momentum, self.eps)
            return out
Open Source Code | Yes | Our implementation is available at https://github.com/phquang/Continual-Normalization.
Open Datasets | Yes | We consider a toy experiment on the permuted MNIST (pMNIST) benchmark (Lopez-Paz & Ranzato, 2017)... We follow the standard setting in Chaudhry et al. (2019a) to split the original CIFAR100 (Krizhevsky & Hinton, 2009) or miniIMN (Vinyals et al., 2016) datasets... (see the pMNIST sketch after the table)
Dataset Splits | Yes | We follow the standard setting in Chaudhry et al. (2019a) to split the original CIFAR100 (Krizhevsky & Hinton, 2009) or miniIMN (Vinyals et al., 2016) datasets into a sequence of 20 tasks, three of which are used for hyper-parameter cross-validation, and the remaining 17 tasks are used for continual learning. (See the task-split sketch after the table.)
Hardware Specification | No | The paper mentions 'our GPU' in Appendix D.5 but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for experiments.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' and specific optimizers (SGD, and Adam (Kingma & Ba, 2014)) but does not provide version numbers for these software dependencies or other libraries.
Experiment Setup | Yes | All methods use a standard ResNet-18 backbone (He et al., 2016) (not pre-trained) and are optimized over one epoch with batch size 10 using the SGD optimizer. For each continual learning strategy, we compare our proposed CN with five competing normalization layers... We cross-validate and set the number of groups to G = 32 for our CN and GN in this experiment. (See the training-loop sketch after the table.)
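
As a usage illustration of the CN layer quoted in the Pseudocode row, the sketch below shows one way to drop CN into a ResNet-18 backbone by replacing its BatchNorm2d modules. It assumes the CN class above and torchvision's resnet18; the helper name replace_bn_with_cn is hypothetical and not from the paper.

    # Sketch (assumption, not from the paper): replace every nn.BatchNorm2d in a
    # torchvision ResNet-18 with the CN layer defined in the Pseudocode row.
    import torch.nn as nn
    from torchvision.models import resnet18

    def replace_bn_with_cn(module, G=32):
        # Recursively swap BatchNorm2d children for CN, reusing their settings.
        for name, child in module.named_children():
            if isinstance(child, nn.BatchNorm2d):
                setattr(module, name, CN(child.num_features, eps=child.eps,
                                         G=G, momentum=child.momentum))
            else:
                replace_bn_with_cn(child, G)
        return module

    model = replace_bn_with_cn(resnet18(num_classes=100), G=32)  # not pre-trained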
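
For the pMNIST benchmark cited under Open Datasets, each task is commonly built by applying one fixed random permutation to the pixel positions of every MNIST image. The snippet below is a minimal sketch of that protocol; the function name and seeding scheme are assumptions.

    # Sketch (assumption): one permuted-MNIST task applies a fixed random
    # permutation to the 784 flattened pixel positions of every image.
    import numpy as np

    def make_pmnist_task(images, seed):
        # images: array of shape (N, 784); returns a permuted copy for one task.
        perm = np.random.default_rng(seed).permutation(784)
        return images[:, perm]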
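
The Dataset Splits row describes 20 tasks with 3 held out for cross-validation. Below is a minimal sketch of such a class-level split, assuming disjoint groups of 5 classes per task as in the Split CIFAR-100 protocol; the seed and variable names are assumptions.

    # Sketch (assumption): partition CIFAR-100's 100 classes into 20 disjoint
    # 5-class tasks; 3 tasks for hyper-parameter cross-validation, 17 for CL.
    import numpy as np

    classes = np.random.default_rng(0).permutation(100)
    tasks = [classes[i * 5:(i + 1) * 5] for i in range(20)]
    cv_tasks, cl_tasks = tasks[:3], tasks[3:]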
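
Finally, the Experiment Setup row reports a single training pass per task with batch size 10 and SGD. The sketch below illustrates that online setting under stated assumptions: the learning rate, loss, and task_datasets list are placeholders, and model comes from the first sketch above.

    # Sketch (assumption): one pass (single epoch) over each task's stream with
    # batch size 10 and plain SGD, using the CN-equipped ResNet-18 from above.
    import torch
    from torch.utils.data import DataLoader

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr is an assumption
    criterion = torch.nn.CrossEntropyLoss()

    for task_data in task_datasets:  # task_datasets: placeholder list of Datasets
        loader = DataLoader(task_data, batch_size=10, shuffle=True)
        for x, y in loader:  # a single epoch over each task
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()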