Towards Theoretically Inspired Neural Initialization Optimization

Authors: Yibo Yang, Hong Wang, Haobo Yuan, Zhouchen Lin

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that for a variety of deep architectures including ResNet [19], DenseNet [21], and Wide ResNet [56], our method achieves better classification results on CIFAR-10/100 [27] than prior heuristic [18] and learning-based [8, 60] initialization methods. We can also initialize ResNet-50 [19] on ImageNet [9] for better performance. Moreover, our method is able to help the recently proposed Swin-Transformer [32] achieve stable training and competitive results on ImageNet even without warmup [17], which is crucial for the successful training of Transformer architectures [31, 52].
Researcher Affiliation | Collaboration | Yibo Yang (1), Hong Wang (2), Haobo Yuan (3), Zhouchen Lin (2,4,5); (1) JD Explore Academy, Beijing, China; (2) Key Lab. of Machine Perception (MoE), School of Intelligence Science and Technology, Peking University; (3) Institute of Artificial Intelligence and School of Computer Science, Wuhan University; (4) Institute for Artificial Intelligence, Peking University; (5) Pazhou Laboratory, Guangzhou, China
Pseudocode | Yes | Algorithm 1: GradCosine (GC) and gradient norm (GN) ... Algorithm 2: Batch GradCosine (B-GC) and batch gradient norm (B-GN) ... Algorithm 3: Neural Initialization Optimization. (A hedged sketch of the GC/GN computation appears after this table.)
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplementary material.
Open Datasets | Yes | We validate our method on three widely used datasets including CIFAR-10/100 [27] and ImageNet [9]. ... [27] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. 2009. ... [9] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248-255, 2009.
Dataset Splits | No | The paper mentions training models and evaluating on a test set, and refers to 'Detailed settings for different architectures and datasets are described in Appendix B.' However, it does not explicitly state the specific train/validation/test splits (e.g., percentages or counts) in the main text, nor does it explicitly mention a separate validation set.
Hardware Specification | Yes | Train time is tested on an NVIDIA A100 server with a batch size of 256 across 8 GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., Python version, PyTorch version, CUDA version).
Experiment Setup | Yes | After initialization, we train these models for 500 epochs with the same training setting. Each model is trained four times with different seeds. ... using ResNet-50 for 100 epochs with a batch size of 256. ... Detailed training and initialization settings are described in Appendix B. (A minimal sketch of this multi-seed protocol follows the GC/GN sketch below.)
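For orientation, the following is a minimal sketch of how the GradCosine (GC) and gradient norm (GN) quantities named in Algorithm 1 might be computed. It assumes GC is the average pairwise cosine similarity of per-sample gradients at initialization and GN is the norm of the sample-averaged gradient; the function names and this reading are ours, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def per_sample_grads(model, loss_fn, xs, ys):
    """Flattened gradient of the loss on each individual sample.

    Assumes the model accepts batch size 1 (e.g., eval mode or no batch norm).
    """
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for x, y in zip(xs, ys):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        g = torch.autograd.grad(loss, params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    return torch.stack(grads)                        # shape (N, num_params)

def grad_cosine_and_norm(model, loss_fn, xs, ys):
    """GC: mean pairwise cosine similarity of per-sample gradients.
    GN: norm of the sample-averaged gradient."""
    G = per_sample_grads(model, loss_fn, xs, ys)
    G_unit = F.normalize(G, dim=1)                   # unit-norm per-sample gradients
    n = G.shape[0]
    gc = ((G_unit @ G_unit.t()).sum() - n) / (n * (n - 1))  # exclude the diagonal
    gn = G.mean(dim=0).norm()
    return gc, gn
```

Per the row above, Algorithms 2 and 3 extend these quantities to mini-batches and use them to optimize the initialization; consult the paper and its supplementary code for the actual procedure.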
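The multi-seed protocol quoted in the Experiment Setup row (each model trained four times with different seeds under the same training setting) could be driven by a harness along these lines; `build_model`, `initialize`, `train`, and `evaluate` are hypothetical placeholders, and all real hyperparameters are described in the paper's Appendix B.

```python
import torch

def run_protocol(build_model, initialize, train, evaluate,
                 seeds=(0, 1, 2, 3), epochs=500):
    """Train one architecture several times with different seeds and
    report mean/std accuracy, mirroring the quoted setup."""
    accuracies = []
    for seed in seeds:                    # four runs with different seeds
        torch.manual_seed(seed)
        model = build_model()
        initialize(model)                 # e.g., the paper's initialization method
        train(model, epochs=epochs)       # identical training setting for every run
        accuracies.append(evaluate(model))
    accs = torch.tensor(accuracies)
    return accs.mean().item(), accs.std().item()
```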