Stimulative Training of Residual Networks: A Social Psychology Perspective of Loafing
Authors: Peng Ye, Shengji Tang, Baopu Li, Tao Chen, Wanli Ouyang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive empirical results and theoretical analyses verify that stimulative training can well handle the loafing problem, and improve the performance of a residual network by improving the performance of its sub-networks. |
| Researcher Affiliation | Collaboration | Peng Ye¹, Shengji Tang¹, Baopu Li², Tao Chen¹, Wanli Ouyang³; ¹School of Information Science and Technology, Fudan University; ²Oracle Health and AI, USA; ³The University of Sydney, SenseTime Computer Vision Group, Australia, and Shanghai AI Lab |
| Pseudocode | Yes | The pseudo code is shown in Appendix C.1. |
| Open Source Code | Yes | The code is available at https://github.com/Sunshine-Ye/NIPS22-ST. |
| Open Datasets | Yes | CIFAR [34] is a classical image classification dataset consisting of 50,000 training images and 10,000 testing images; it includes CIFAR-100 with 100 categories and CIFAR-10 with 10 categories. The ImageNet [36] dataset contains 1.2 million training images and 50,000 validation images from 1,000 categories. |
| Dataset Splits | Yes | CIFAR [34] is a classical image classification dataset consisting of 50,000 training images and 10,000 testing images; it includes CIFAR-100 with 100 categories and CIFAR-10 with 10 categories. For the ImageNet implementation details, the method is applied to the large-scale ImageNet [36] dataset containing 1.2 million training images and 50,000 validation images from 1,000 categories. |
| Hardware Specification | No | The paper mentions 'computation cost' but does not specify the type of GPUs, CPUs, or other hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'SGD optimizer' but does not specify particular software libraries or frameworks with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x, or a specific Python version). |
| Experiment Setup | Yes | For MobileNetV3 and ResNet-50, the data augmentations follow [35]; we use the SGD optimizer and train the model for 500 epochs with a batch size of 64. The initial learning rate is 0.05 with a cosine decay schedule; the weight decay is 3×10⁻⁵ and the momentum is 0.9. For ImageNet, we utilize the SGD optimizer to train the model for 100 epochs with a batch size of 512, and the learning rate is 0.2 with a cosine decay schedule; the weight decay is 1×10⁻⁴ and the momentum is 0.9. |
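
Since the paper does not name a specific framework or version, the following is a minimal sketch, assuming PyTorch, of how the reported optimizer and schedule settings could be reproduced. The helper names (`build_cifar_optimizer`, `build_imagenet_optimizer`) and the placeholder model are hypothetical; only the hyperparameters (SGD, cosine decay, learning rates 0.05/0.2, weight decays 3×10⁻⁵/1×10⁻⁴, momentum 0.9, 500/100 epochs) come from the reported setup.

```python
# Hedged sketch, not the authors' released code: the paper states SGD + cosine
# decay but no framework; PyTorch is assumed here and the model is a placeholder.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR


def build_cifar_optimizer(model: nn.Module, epochs: int = 500):
    # CIFAR setting reported in the paper: lr 0.05, weight decay 3e-5,
    # momentum 0.9, cosine decay over 500 epochs (batch size 64).
    opt = optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=3e-5)
    sched = CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched


def build_imagenet_optimizer(model: nn.Module, epochs: int = 100):
    # ImageNet setting reported in the paper: lr 0.2, weight decay 1e-4,
    # momentum 0.9, cosine decay over 100 epochs (batch size 512).
    opt = optim.SGD(model.parameters(), lr=0.2, momentum=0.9, weight_decay=1e-4)
    sched = CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched


if __name__ == "__main__":
    # Placeholder model purely to make the sketch runnable end to end.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))
    optimizer, scheduler = build_cifar_optimizer(model)
    for epoch in range(2):  # the paper trains 500 (CIFAR) / 100 (ImageNet) epochs
        optimizer.step()    # stands in for one epoch of training updates
        scheduler.step()    # cosine learning-rate decay, stepped once per epoch
        print(epoch, scheduler.get_last_lr())
```

The actual stimulative-training loop (sampling sub-networks and applying the KL-based loss) is in the authors' repository linked above; this sketch only covers the optimizer and schedule hyperparameters quoted in the table.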