Mode Normalization
Authors: Lucas Deecke, Iain Murray, Hakan Bilen
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments, including single and multi-task datasets. |
| Researcher Affiliation | Academia | Lucas Deecke, Iain Murray & Hakan Bilen; University of Edinburgh; {l.deecke,i.murray,h.bilen}@ed.ac.uk |
| Pseudocode | Yes | Algorithm 1 Mode normalization, training phase. Algorithm 2 Mode normalization, test phase. Algorithm 3 Mode group normalization. |
| Open Source Code | Yes | Accompanying code is available under github.com/ldeecke/mn-torch. |
| Open Datasets | Yes | i) MNIST (LeCun, 1998) [...] ii) CIFAR-10 (Krizhevsky, 2009) [...] iii) SVHN (Netzer et al., 2011) [...] iv) Fashion MNIST (Xiao et al., 2017) [...] ILSVRC12 (Deng et al., 2009). |
| Dataset Splits | Yes | The dataset has a total of 60 000 training samples, as well as 10 000 samples set aside for validation. [...] CIFAR-10 [...] It contains 50 000 training and 10 000 test images. |
| Hardware Specification | Yes | We gratefully acknowledge the support of Prof. Vittorio Ferrari and Timothy Hospedales for providing computational resources, and the NVIDIA Corporation for the donation of a Titan Xp GPU used in this research. |
| Software Dependencies | No | All experiments use standard routines within PyTorch (Paszke et al., 2017). While PyTorch is mentioned, a specific version number for the software is not provided. |
| Experiment Setup | Yes | We trained for 3.5 million data touches (15 epochs), with learning rate reductions by 1/10 after 2.5 and 3 million data touches. [...] The batch size was N = 128, and running estimates were kept with λ = 0.1. We varied the number of modes in MN over K = {2, 4, 6}. [...] Initial learning rates were set to γ = 10⁻¹, which we reduced by 1/10 at epochs 65 and 80 for all methods. [...] Dropout (Srivastava et al., 2014) is known to occasionally cause issues in combination with BN (Li et al., 2018), and reducing it to 0.25 (as opposed to 0.5 in the original publication) improved performance. |
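
The Pseudocode row above refers to Algorithms 1–3 of the paper. As a rough illustration of what Algorithm 1 (mode normalization, training phase) computes, the following is a minimal PyTorch sketch: a small gating network softly assigns each sample to one of K modes, each mode is normalized with its own weighted mean and variance, and the gated results are summed. The gating architecture, the running-estimate bookkeeping, and the test phase (Algorithm 2) are simplified or omitted here and are not taken from the authors' code, which is available at github.com/ldeecke/mn-torch.

```python
# Minimal sketch of mode normalization (training phase) for 2D conv features.
# Assumptions not taken from the paper's code: the gating network is a single
# linear layer on spatially averaged features, affine parameters are shared
# across modes, and running estimates / the test phase are omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModeNorm2d(nn.Module):
    def __init__(self, num_features, num_modes=2, eps=1e-5):
        super().__init__()
        self.k = num_modes
        self.eps = eps
        self.gate = nn.Linear(num_features, num_modes)   # soft mode assignment
        self.weight = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_features, 1, 1))

    def forward(self, x):                                      # x: (N, C, H, W)
        n, c, h, w = x.shape
        # g[k, n] is the probability that sample n belongs to mode k.
        g = F.softmax(self.gate(x.mean(dim=(2, 3))), dim=1)    # (N, K)
        g = g.t().reshape(self.k, n, 1, 1, 1)                  # (K, N, 1, 1, 1)

        xe = x.unsqueeze(0)                                    # (1, N, C, H, W)
        denom = g.sum(dim=1, keepdim=True) * h * w             # (K, 1, 1, 1, 1)
        # Weighted per-mode mean and variance over batch and spatial dims.
        mu = (g * xe).sum(dim=(1, 3, 4), keepdim=True) / denom             # (K, 1, C, 1, 1)
        var = (g * (xe - mu) ** 2).sum(dim=(1, 3, 4), keepdim=True) / denom

        # Normalize under each mode, then recombine with the gates.
        x_hat = (xe - mu) / torch.sqrt(var + self.eps)         # (K, N, C, H, W)
        y = (g * x_hat).sum(dim=0)                             # (N, C, H, W)
        return self.weight * y + self.bias
```

For a feature map of shape (128, 64, 8, 8), `ModeNorm2d(64, num_modes=2)` returns an output of the same shape; with K = 1 the gate is constant and the layer reduces to ordinary training-mode batch normalization without running statistics.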
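
The learning-rate schedule quoted in the Experiment Setup row (initial rate γ = 10⁻¹, reduced by 1/10 at epochs 65 and 80, batch size N = 128) can be expressed with a standard PyTorch scheduler. The sketch below assumes SGD with momentum, a placeholder network, and a placeholder total epoch count, none of which are specified in the quoted text.

```python
# Hedged sketch of the optimisation schedule from the Experiment Setup row.
# The optimiser choice (SGD + momentum), the placeholder model, and the total
# epoch count are assumptions; only the milestones, the decay factor, and the
# initial learning rate come from the quoted text.
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU())  # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[65, 80], gamma=0.1)

for epoch in range(100):  # total epoch count is a placeholder
    # ... forward/backward/optimizer.step() over batches of size N = 128 ...
    scheduler.step()      # applies the 1/10 reductions at epochs 65 and 80
```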