Differentiable Learning-to-Normalize via Switchable Normalization
Authors: Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, Jingyu Li
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents the main results of SN in multiple challenging problems and benchmarks, such as ImageNet (Russakovsky et al., 2015), COCO (Lin et al., 2014), Cityscapes (Cordts et al., 2016), ADE20K (Zhou et al., 2017), and Kinetics (Kay et al., 2017), where the effectiveness of SN is demonstrated by comparing with existing normalization techniques. |
| Researcher Affiliation | Collaboration | Ping Luo¹,³, Jiamin Ren², Zhanglin Peng², Ruimao Zhang¹, Jingyu Li¹ (¹The Chinese University of Hong Kong, ²SenseTime Research, ³The University of Hong Kong) |
| Pseudocode | Yes | For software without auto-differentiation, we provide the backward computations of SN below. Let $\hat{h}$ be the output of the SN layer, represented by a 4D tensor $(N, C, H, W)$ with index $n, c, i, j$. Let $\hat{h} = \gamma \tilde{h} + \beta$ and $\tilde{h} = \frac{h - \mu}{\sqrt{\sigma^2 + \epsilon}}$, where $\mu = w_{bn}\mu_{bn} + w_{in}\mu_{in} + w_{ln}\mu_{ln}$, $\sigma^2 = w_{bn}\sigma^2_{bn} + w_{in}\sigma^2_{in} + w_{ln}\sigma^2_{ln}$, and $w_{bn} + w_{in} + w_{ln} = 1$. Note that the importance weights are shared between the means and variances for clarity of notation. Suppose each of $\{\mu, \mu_{bn}, \mu_{in}, \mu_{ln}, \sigma^2, \sigma^2_{bn}, \sigma^2_{in}, \sigma^2_{ln}\}$ is reshaped into a vector of $NC$ entries, matching the dimension of IN's statistics. Let $L$ be the loss function and $(\frac{\partial L}{\partial \mu})_n$ the gradient of $L$ with respect to the $n$-th entry of $\mu$. (A PyTorch sketch of the SN layer this derivation describes appears after the table.) |
| Open Source Code | Yes | The code of SN has been released at https://github.com/switchablenorms/. |
| Open Datasets | Yes | SN outperforms its counterparts on various challenging benchmarks, such as ImageNet, COCO, Cityscapes, ADE20K, and Kinetics. |
| Dataset Splits | Yes | All models on ImageNet are trained on 1.2M images and evaluated on 50K validation images. |
| Hardware Specification | No | The paper mentions training on 'GPUs' and discusses 'batch sizes' in terms of '#GPUs, #samples per GPU', but does not specify any particular GPU model (e.g., NVIDIA V100, A100) or other hardware components like CPU or memory. |
| Software Dependencies | No | The paper states: 'SN can be easily implemented in existing softwares such as PyTorch and TensorFlow.' and 'We implement it on existing detection softwares of PyTorch and Caffe2-Detectron (Girshick et al., 2018) respectively.' However, it does not provide specific version numbers for these software packages. |
| Experiment Setup | Yes | All models are trained for 100 epochs with an initial learning rate of 0.1, which is decreased by 10 after 30, 60, and 90 epochs. For different batch sizes, the initial learning rate is linearly scaled according to (Goyal et al., 2017). During training, we employ the same data augmentation as (He et al., 2016). The top-1 classification accuracy on the 224×224 center crop is reported. (A sketch of this schedule appears after the table.) |
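
The backward derivation quoted in the Pseudocode row is only needed for frameworks without auto-differentiation; in PyTorch, defining the forward pass suffices. Below is a minimal sketch of an SN layer consistent with that formulation. The module name `SwitchNorm2d` and the separate mean/variance control parameters are illustrative assumptions, not the authors' released implementation, and inference-time handling of BN running statistics is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchNorm2d(nn.Module):
    """Sketch of a Switchable Normalization layer for 4D (N, C, H, W) inputs.

    Importance weights over {IN, LN, BN} statistics are softmax-normalized so
    they are positive and sum to 1, matching w_bn + w_in + w_ln = 1 above.
    """

    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        # Separate control parameters for the means and the variances
        # (the quote shares them only "for clarity of notations").
        self.mean_weight = nn.Parameter(torch.ones(3))
        self.var_weight = nn.Parameter(torch.ones(3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # IN statistics: per sample and channel, over (H, W).
        mu_in = x.mean(dim=(2, 3), keepdim=True)
        var_in = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        # LN statistics: per sample, over (C, H, W).
        mu_ln = x.mean(dim=(1, 2, 3), keepdim=True)
        var_ln = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        # BN statistics: per channel, over (N, H, W).
        mu_bn = x.mean(dim=(0, 2, 3), keepdim=True)
        var_bn = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)

        w_mu = F.softmax(self.mean_weight, dim=0)
        w_var = F.softmax(self.var_weight, dim=0)
        mu = w_mu[0] * mu_in + w_mu[1] * mu_ln + w_mu[2] * mu_bn
        var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn

        h = (x - mu) / torch.sqrt(var + self.eps)  # tilde-h in the notation above
        return self.gamma * h + self.beta
```

A quick shape check: `SwitchNorm2d(64)(torch.randn(8, 64, 32, 32))` returns a tensor of the same shape, since every statistic broadcasts back over (N, C, H, W).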
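
The training schedule quoted in the Experiment Setup row maps to a step decay combined with the linear scaling rule of Goyal et al. (2017). A minimal sketch follows, assuming the conventional reference batch size of 256, which is not stated in the quote; the function name is hypothetical.

```python
def switchnorm_paper_lr(epoch: int, batch_size: int,
                        base_lr: float = 0.1, base_batch: int = 256) -> float:
    """Step-decay schedule: base LR 0.1, divided by 10 after epochs 30, 60, 90.

    The initial LR is scaled linearly with batch size per Goyal et al. (2017);
    base_batch = 256 is an assumed reference, not stated in the quote.
    """
    lr = base_lr * batch_size / base_batch
    for milestone in (30, 60, 90):
        if epoch >= milestone:
            lr /= 10.0
    return lr

# Example: epoch 45 with batch size 256 gives 0.1 / 10 = 0.01.
```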