Continual Normalization: Rethinking Batch Normalization for Online Continual Learning
Authors: Quang Pham, Chenghao Liu, Steven Hoi
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on different continual learning algorithms and online scenarios show that CN is a direct replacement for BN and can provide substantial performance improvements. |
| Researcher Affiliation | Collaboration | 1 Singapore Management University, hqpham.2017@smu.edu.sg; 2 Salesforce Research Asia, {chenghao.liu, shoi}@salesforce.com |
| Pseudocode | Yes | In the following, we provide the CN's implementation based on Pytorch (Paszke et al., 2017): `class CN(_BatchNorm): def __init__(self, num_features, eps=1e-5, G=32, momentum=0.1): super(CN, self).__init__(num_features, eps, momentum); self.G = G; def forward(self, input): out_gn = F.group_norm(input, self.G, None, None, self.eps); out = F.batch_norm(out_gn, self.running_mean, self.running_var, self.weight, self.bias, self.training, self.momentum, self.eps); return out` (a runnable version of this snippet is sketched after the table). |
| Open Source Code | Yes | Our implementation is available at https://github.com/phquang/Continual-Normalization. |
| Open Datasets | Yes | We consider a toy experiment on the permuted MNIST (pMNIST) benchmark (Lopez-Paz & Ranzato, 2017)... We follow the standard setting in Chaudhry et al. (2019a) to split the original CIFAR100 (Krizhevsky & Hinton, 2009) or Mini IMN (Vinyals et al., 2016) datasets... |
| Dataset Splits | Yes | We follow the standard setting in Chaudhry et al. (2019a) to split the original CIFAR100 (Krizhevsky & Hinton, 2009) or Mini IMN (Vinyals et al., 2016) datasets into a sequence of 20 tasks, three of which are used for hyper-parameter cross-validation, and the remaining 17 tasks are used for continual learning. |
| Hardware Specification | No | The paper mentions 'our GPU' in Appendix D.5 but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for experiments. |
| Software Dependencies | No | The paper mentions 'Pytorch (Paszke et al., 2017)' and specific optimizers (SGD, and Adam (Kingma & Ba, 2014)) but does not provide version numbers for these software dependencies or other libraries. |
| Experiment Setup | Yes | All methods use a standard ResNet-18 backbone (He et al., 2016) (not pre-trained) and are optimized over one epoch with batch size 10 using the SGD optimizer. For each continual learning strategy, we compare our proposed CN with five competing normalization layers... We cross-validate and set the number of groups to be G = 32 for our CN and GN in this experiment. (A sketch of swapping BN for CN in such a backbone is given after the table.) |
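
The Pseudocode row above is flattened by extraction. Below is a minimal runnable sketch of the CN layer it describes (group normalization followed by batch normalization), assuming a recent PyTorch where `_BatchNorm` keeps its usual `(num_features, eps, momentum)` constructor signature; the `momentum=0.1` default is an assumption matching `_BatchNorm`'s own default.

```python
import torch
import torch.nn.functional as F
from torch.nn.modules.batchnorm import _BatchNorm


class CN(_BatchNorm):
    """Continual Normalization: group-normalize the input, then batch-normalize it."""

    def __init__(self, num_features, eps=1e-5, G=32, momentum=0.1):
        # momentum=0.1 is an assumed default, matching _BatchNorm.
        super(CN, self).__init__(num_features, eps, momentum)
        self.G = G  # number of groups for the group-normalization step

    def forward(self, input):
        # Per-sample spatial normalization, with no affine parameters at this stage.
        out_gn = F.group_norm(input, self.G, None, None, self.eps)
        # Standard batch normalization on the group-normalized activations,
        # reusing _BatchNorm's running statistics and learnable affine parameters.
        return F.batch_norm(out_gn, self.running_mean, self.running_var,
                            self.weight, self.bias, self.training,
                            self.momentum, self.eps)


# Example: CN(64) can stand in wherever nn.BatchNorm2d(64) would be used,
# provided the channel count is divisible by G.
x = torch.randn(10, 64, 32, 32)
y = CN(64)(x)
```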
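
To mirror the Experiment Setup row, the BN layers of a non-pretrained backbone would be replaced by CN with G = 32. The exact training code is in the authors' repository; the sketch below is only illustrative, assuming the `CN` class above and a torchvision ResNet-18 stand-in for the backbone (continual-learning papers often use a reduced ResNet-18, and the learning rate here is an assumption, not a value from the paper).

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


def replace_bn_with_cn(module, G=32):
    """Recursively replace every BatchNorm2d layer with a CN layer of the same width."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, CN(child.num_features, eps=child.eps,
                                     G=G, momentum=child.momentum))
        else:
            replace_bn_with_cn(child, G)


model = resnet18(weights=None)     # not pre-trained, as stated in the setup
replace_bn_with_cn(model, G=32)    # cross-validated number of groups from the paper
optimizer = torch.optim.SGD(model.parameters(), lr=0.03)  # lr is an assumption

# Online continual learning: a single pass over the task stream with batch size 10.
# Replay buffers / regularizers of the compared strategies (ER, DER++, etc.) are omitted.
```

Every BatchNorm2d in ResNet-18 has 64, 128, 256, or 512 channels, all divisible by G = 32, so the group-normalization step inside CN is well-defined after the swap.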