Stochastic Normalization
Authors: Zhi Kou, Kaichao You, Mingsheng Long, Jianmin Wang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical experiments show that StochNorm is a powerful tool to avoid over-fitting in fine-tuning with small datasets. Besides, StochNorm is readily pluggable in modern CNN backbones. It is complementary to other fine-tuning methods and can work together to achieve a stronger regularization effect. |
| Researcher Affiliation | Academia | Zhi Kou , Kaichao You , Mingsheng Long (B), Jianmin Wang School of Software, BNRist, Research Center for Big Data, Tsinghua University, China {kz19,ykc20}@mails.tsinghua.edu.cn, {mingsheng,jimwang}@tsinghua.edu.cn |
| Pseudocode | Yes | StochNorm is intuitively described in Figure 1 and summarized in detail by Algorithm 1. |
| Open Source Code | Yes | The code is available at https://github.com/thuml/StochNorm. |
| Open Datasets | Yes | The evaluation is conducted on four standard datasets. CUB-200-2011 (Welinder et al., 2010)... Stanford Cars (Krause et al., 2013)... FGVC Aircraft (Maji et al., 2013)... NIH Chest X-ray (Wang et al., 2017) |
| Dataset Splits | Yes | We follow the train/validation/test partition of each dataset. For datasets without validation data, we use 20% training data for validation and use the same validation data for all methods. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper states 'Experiments are implemented based on PyTorch (Benoit et al., 2019)' but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The learning rate for the last layer is set to be 10 times of those for the fine-tuned layers because parameters in the last layer are randomly initialized. We adopt SGD with momentum of 0.9 together with the progressive training strategies in Li et al. (2018). Experiments are repeated five times to get the mean and deviation. Hyper-parameters for each method are selected on validation data. We follow the train/validation/test partition of each dataset. For datasets without validation data, we use 20% training data for validation and use the same validation data for all methods. The selection probability p = 0.5 works well for most experiments. |
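To make the method behind the checklist concrete: StochNorm normalizes each channel with the moving (running) statistics with probability p, and with the current mini-batch statistics otherwise, which regularizes fine-tuning on small datasets. Below is a minimal NumPy sketch of one training-mode forward pass under that description; the function name, argument layout, and the per-channel Bernoulli selection are illustrative assumptions, not the authors' reference implementation (which is at the repository linked above).

```python
import numpy as np

def stoch_norm_forward(x, running_mean, running_var, p=0.5,
                       momentum=0.1, eps=1e-5, rng=None):
    """Illustrative StochNorm forward pass (training mode).

    x: mini-batch of shape (N, C). For each channel, normalize with the
    moving statistics with probability p, and with the mini-batch
    statistics otherwise. p = 0.5 is the value reported to work well
    in most of the paper's experiments.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    # Per-channel Bernoulli(p) mask: True -> use moving statistics.
    use_moving = rng.random(x.shape[1]) < p
    mean = np.where(use_moving, running_mean, batch_mean)
    var = np.where(use_moving, running_var, batch_var)
    out = (x - mean) / np.sqrt(var + eps)
    # Moving statistics are updated from batch statistics as in BatchNorm.
    new_running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_running_var = (1 - momentum) * running_var + momentum * batch_var
    return out, new_running_mean, new_running_var
```

With p = 0 the sketch reduces to ordinary batch normalization (batch statistics only), and with p = 1 every channel is normalized by the moving statistics; intermediate p interpolates stochastically between the two, which is the regularization effect the paper evaluates.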