Stochastic Normalization

Authors: Zhi Kou, Kaichao You, Mingsheng Long, Jianmin Wang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical experiments show that StochNorm is a powerful tool to avoid over-fitting in fine-tuning with small datasets. Besides, StochNorm is readily pluggable into modern CNN backbones. It is complementary to other fine-tuning methods and can work together with them to achieve a stronger regularization effect.
Researcher Affiliation | Academia | Zhi Kou, Kaichao You, Mingsheng Long, Jianmin Wang. School of Software, BNRist, Research Center for Big Data, Tsinghua University, China. {kz19,ykc20}@mails.tsinghua.edu.cn, {mingsheng,jimwang}@tsinghua.edu.cn
Pseudocode | Yes | StochNorm is intuitively described in Figure 1 and summarized in detail by Algorithm 1. (A hedged sketch of such a layer appears after this table.)
Open Source Code | Yes | The code is available at https://github.com/thuml/StochNorm.
Open Datasets | Yes | The evaluation is conducted on four standard datasets. CUB-200-2011 (Welinder et al., 2010)... Stanford Cars (Krause et al., 2013)... FGVC Aircraft (Maji et al., 2013)... NIH Chest X-ray (Wang et al., 2017)
Dataset Splits | Yes | We follow the train/validation/test partition of each dataset. For datasets without validation data, we use 20% of the training data for validation and use the same validation data for all methods. (A minimal split sketch appears after this table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper states 'Experiments are implemented based on PyTorch (Benoit et al., 2019)' but does not provide a specific version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | The learning rate for the last layer is set to 10 times that of the fine-tuned layers because the parameters in the last layer are randomly initialized. We adopt SGD with momentum of 0.9 together with the progressive training strategies in Li et al. (2018). Experiments are repeated five times to obtain the mean and standard deviation. Hyper-parameters for each method are selected on validation data. We follow the train/validation/test partition of each dataset. For datasets without validation data, we use 20% of the training data for validation and use the same validation data for all methods. The selection probability p = 0.5 works well for most experiments. (A hedged optimizer sketch appears after this table.)
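
The Pseudocode row points to Figure 1 and Algorithm 1 rather than reproducing them. For orientation, below is a minimal, hedged sketch of a StochNorm-style 2D layer, assuming the channel-wise selection the paper describes: during training, each channel is normalized with either the mini-batch statistics or the moving-average statistics, chosen at random with probability p (p = 0.5 in most experiments), and at evaluation time the layer behaves like standard batch normalization. The class name StochNorm2d, the selection convention, and the defaults are illustrative; the authors' released code at https://github.com/thuml/StochNorm is the reference implementation.

```python
import torch
import torch.nn as nn


class StochNorm2d(nn.Module):
    """Hedged sketch of a StochNorm-style layer (not the authors' released code).

    During training, each channel is normalized with either the mini-batch
    statistics or the moving-average statistics, selected per channel with
    probability p. At evaluation time it reduces to standard BatchNorm.
    """

    def __init__(self, num_features, p=0.5, eps=1e-5, momentum=0.1):
        super().__init__()
        self.p = p
        self.eps = eps
        self.momentum = momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            batch_mean = x.mean(dim=(0, 2, 3))
            batch_var = x.var(dim=(0, 2, 3), unbiased=False)
            with torch.no_grad():
                # Standard BatchNorm-style update of the moving statistics.
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * batch_var)
            # Per-channel Bernoulli mask: 1 -> moving statistics, 0 -> batch statistics.
            # (Which branch gets probability p is an assumption; with p = 0.5 it is symmetric.)
            mask = (torch.rand(x.size(1), device=x.device) < self.p).float()
            mean = mask * self.running_mean + (1 - mask) * batch_mean
            var = mask * self.running_var + (1 - mask) * batch_var
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean.view(1, -1, 1, 1)) / torch.sqrt(var.view(1, -1, 1, 1) + self.eps)
        return self.weight.view(1, -1, 1, 1) * x_hat + self.bias.view(1, -1, 1, 1)
```

Because such a layer keeps the same affine parameters and buffers as nn.BatchNorm2d, it can be swapped into a pretrained backbone and initialized from the existing BatchNorm statistics, which is consistent with the "readily pluggable" claim in the Research Type row.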
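
The Dataset Splits row states that 20% of the training data is held out for validation and that the same validation data is used for all methods, but not how the split is drawn. A minimal sketch, assuming a seeded random split so every compared method sees the identical validation set (the function name, fraction argument, and seed are illustrative):

```python
import torch
from torch.utils.data import random_split


def make_fixed_split(train_dataset, val_fraction=0.2, seed=0):
    """Carve a fixed validation split out of a training set.

    A fixed generator seed guarantees the same split for every method.
    """
    n_val = int(len(train_dataset) * val_fraction)
    n_train = len(train_dataset) - n_val
    generator = torch.Generator().manual_seed(seed)
    return random_split(train_dataset, [n_train, n_val], generator=generator)
```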
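
The Experiment Setup row describes SGD with momentum 0.9 and a learning rate for the randomly initialized last layer that is 10 times that of the fine-tuned layers. A minimal sketch of that parameter grouping, assuming a torchvision ResNet-50 backbone and an illustrative base learning rate (neither the backbone choice nor base_lr is stated in the quoted text):

```python
import torch
import torchvision

base_lr = 0.001  # illustrative value, not taken from the paper
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 200)  # e.g. 200 classes for CUB-200-2011

# Fine-tuned backbone layers at base_lr; the new last layer at 10x base_lr.
backbone_params = [p for name, p in model.named_parameters() if not name.startswith("fc.")]
optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "lr": base_lr},
        {"params": model.fc.parameters(), "lr": 10 * base_lr},
    ],
    momentum=0.9,
)
```

The progressive training strategy of Li et al. (2018) and the learning-rate schedule are not shown here, since the review quotes no details about them.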