Mode Normalization

Authors: Lucas Deecke, Iain Murray, Hakan Bilen

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments, including single and multi-task datasets."
Researcher Affiliation | Academia | "Lucas Deecke, Iain Murray & Hakan Bilen, University of Edinburgh, {l.deecke,i.murray,h.bilen}@ed.ac.uk"
Pseudocode | Yes | "Algorithm 1 Mode normalization, training phase. Algorithm 2 Mode normalization, test phase. Algorithm 3 Mode group normalization." (A hedged code sketch of the training/test-phase computation follows the table.)
Open Source Code | Yes | "Accompanying code is available under github.com/ldeecke/mn-torch."
Open Datasets | Yes | "i) MNIST (Le Cun, 1998) [...] ii) CIFAR-10 (Krizhevsky, 2009) [...] iii) SVHN (Netzer et al., 2011) [...] iv) Fashion MNIST (Xiao et al., 2017) [...] ILSVRC12 (Deng et al., 2009)."
Dataset Splits | Yes | "The dataset has a total of 60 000 training samples, as well as 10 000 samples set aside for validation. [...] CIFAR-10 [...] It contains 50 000 training and 10 000 test images."
Hardware Specification | Yes | "We gratefully acknowledge the support of Prof. Vittorio Ferrari and Timothy Hospedales for providing computational resources, and the NVIDIA Corporation for the donation of a Titan Xp GPU used in this research."
Software Dependencies | No | "All experiments use standard routines within PyTorch (Paszke et al., 2017)." While PyTorch is mentioned, a specific version number for the software is not provided.
Experiment Setup | Yes | "We trained for 3.5 million data touches (15 epochs), with learning rate reductions by 1/10 after 2.5 and 3 million data touches. [...] The batch size was N = 128, and running estimates were kept with λ = 0.1. We varied the number of modes in MN over K = {2, 4, 6}. [...] Initial learning rates were set to γ = 10⁻¹, which we reduced by 1/10 at epochs 65 and 80 for all methods. [...] Dropout (Srivastava et al., 2014) is known to occasionally cause issues in combination with BN (Li et al., 2018), and reducing it to 0.25 (as opposed to 0.5 in the original publication) improved performance." (A hedged configuration sketch of this schedule follows the table.)
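The Pseudocode row above refers to Algorithms 1–3, which describe mode normalization: a small gating network softly assigns each sample to one of K modes, gate-weighted statistics are estimated per mode, and each sample is normalized under a gate-weighted mixture of those statistics, with running estimates kept for the test phase. The sketch below is a minimal PyTorch rendering of that idea under our own assumptions; the class name `ModeNorm2d`, the average-pooled gating input, and the buffer handling are illustrative choices, not the reference implementation from mn-torch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModeNorm2d(nn.Module):
    """Hedged sketch of mode normalization over K modes for NCHW feature maps."""

    def __init__(self, num_features, num_modes=2, eps=1e-5, momentum=0.1):
        super().__init__()
        self.K = num_modes
        self.eps = eps
        self.momentum = momentum  # plays the role of the running-estimate lambda
        # Shared affine transform, as in batch normalization.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Gating network: global average pool -> linear -> softmax over modes.
        self.gate = nn.Linear(num_features, num_modes)
        self.register_buffer("running_mean", torch.zeros(num_modes, num_features))
        self.register_buffer("running_var", torch.ones(num_modes, num_features))

    def forward(self, x):
        N, C, H, W = x.shape
        # Soft assignment of each sample to the K modes, shape (N, K).
        g = F.softmax(self.gate(x.mean(dim=(2, 3))), dim=1)
        if self.training:
            # Gate-weighted mean/variance per mode over the (N, H, W) axes.
            w = g.t() / (g.t().sum(dim=1, keepdim=True) * H * W + self.eps)  # (K, N)
            xs = x.reshape(N, C, H * W)
            mean = torch.einsum("kn,nci->kc", w, xs)                         # (K, C)
            var = torch.einsum("kn,nci->kc", w, xs.pow(2)) - mean.pow(2)
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            # Test phase: fall back on the running estimates.
            mean, var = self.running_mean, self.running_var
        # Normalize under every mode, then mix the results with the gates.
        x_hat = (x.unsqueeze(1) - mean.view(1, self.K, C, 1, 1)) \
            / torch.sqrt(var.view(1, self.K, C, 1, 1) + self.eps)            # (N, K, C, H, W)
        x_hat = (g.view(N, self.K, 1, 1, 1) * x_hat).sum(dim=1)              # (N, C, H, W)
        return self.weight.view(1, C, 1, 1) * x_hat + self.bias.view(1, C, 1, 1)
```

With K = 1 the gate becomes a constant and the layer reduces to ordinary batch normalization, which is why the paper treats BN as the single-mode special case.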
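The Experiment Setup row quotes the schedule used for the CIFAR-style runs. Below is a minimal sketch wiring those quoted numbers (batch size N = 128, initial learning rate 10⁻¹ with 1/10 reductions at epochs 65 and 80, running estimates with λ = 0.1, K = 2 modes, dropout lowered to 0.25) into a PyTorch training loop, reusing the `ModeNorm2d` sketch above. The tiny stand-in network, the choice of plain SGD, the total epoch count, and the random batches are assumptions for illustration, not details taken from the paper.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

# Stand-in network: one conv block with mode normalization (K = 2, lambda = 0.1)
# and dropout at 0.25, followed by a linear classifier for 10 classes.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    ModeNorm2d(16, num_modes=2, momentum=0.1),
    nn.ReLU(),
    nn.Dropout(0.25),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

optimizer = optim.SGD(model.parameters(), lr=1e-1)                  # initial gamma = 10^-1
scheduler = MultiStepLR(optimizer, milestones=[65, 80], gamma=0.1)  # 1/10 drops at epochs 65, 80
loss_fn = nn.CrossEntropyLoss()

for epoch in range(90):                        # total epoch count assumed
    x = torch.randn(128, 3, 32, 32)            # stand-in for a CIFAR-10 batch, N = 128
    y = torch.randint(0, 10, (128,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    scheduler.step()
```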