When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work
Authors: Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Thorough numerical results show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets. ... In this section, we provide empirical validation for our theory. Specifically, we compare the performance of two training regimes ... To evaluate our theory in terms of training error, we conduct experiments on a synthetic dataset (shown in Section 5.1) and random-labeled CIFAR-10 [31] (shown in Appendix H.6). We further observe the strong generalization power of Algorithm 2, even though it is not yet revealed in our theory. Our training regime brings higher or competitive test accuracy on (Restricted) ImageNet [58] (shown in Section 5.2), MNIST [33], CIFAR-10, CIFAR-100 [31] (shown in Appendix H). |
| Researcher Affiliation | Academia | Jiawei Zhang (Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China; jiaweizhang2@link.cuhk.edu.cn); Yushun Zhang (Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China; yushunzhang@link.cuhk.edu.cn); Mingyi Hong (University of Minnesota Twin Cities, MN, USA; mhong@umn.edu); Ruoyu Sun (University of Illinois at Urbana-Champaign, IL, USA; ruoyus@illinois.edu); Zhi-Quan Luo (Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China; luozq@cuhk.edu.cn) |
| Pseudocode | Yes | Algorithm 1: The mirrored LeCun's initialization ... We outline the proposed training regime in Algorithm 2 in Appendix H.1. (A hedged sketch of the mirroring idea appears after this table.) |
| Open Source Code | No | The paper mentions a PyTorch implementation but does not provide an explicit statement about releasing the source code or a link to a repository. |
| Open Datasets | Yes | random-labeled CIFAR-10 [31] ... (Restricted) ImageNet [58] ... MNIST [33], CIFAR-10, CIFAR-100 [31] |
| Dataset Splits | Yes | For MNIST and CIFAR experiments, we use the standard training and test splits. ... For R-ImageNet experiments, we use the same training/validation/test splits from [68]. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify its version or other software dependencies with version numbers. |
| Experiment Setup | Yes | Detailed experimental setup is explained in Appendix H.2. ... in each block in Figure 4, a grid search of step-size is performed to ensure the convergence of algorithms. ... we observe that $v$ never touches the boundary of $B_{\zeta,\kappa}(v)$ when $\kappa = 1$, $\zeta = 0.001$. ... (Appendix H.2, Hyperparameters) For the synthetic dataset, all models are trained for 200 epochs using SGD with batch size 512 and learning rate 0.01. For MNIST, CIFAR-10, and CIFAR-100, we use batch size 1024 and initial learning rate 0.1, trained for 50 epochs. (A minimal training-loop sketch reflecting these hyperparameters follows the table.) |
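
The Pseudocode row points to Algorithm 1 (the mirrored LeCun's initialization) and Algorithm 2 (the proposed training regime in Appendix H.1). The paper's exact algorithms live in its appendix; the sketch below is only an illustrative guess at the mirroring idea for a two-layer ReLU network, namely pairing each LeCun-initialized neuron with a copy whose output weight is negated so the network outputs zero at initialization. The class `TwoLayerNet`, the helper `mirrored_lecun_init_`, and the pairing layout are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn


class TwoLayerNet(nn.Module):
    """Two-layer ReLU net; names and structure are illustrative assumptions."""

    def __init__(self, d_in, m):
        super().__init__()
        assert m % 2 == 0, "mirroring pairs neurons, so the width should be even"
        self.hidden = nn.Linear(d_in, m)
        self.out = nn.Linear(m, 1, bias=False)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))


def mirrored_lecun_init_(net: TwoLayerNet):
    """Sketch of a 'mirrored' LeCun-style init (an assumption, not Algorithm 1 verbatim):
    draw LeCun-scaled weights for the first half of the neurons, copy them to the
    second half, and negate the corresponding output weights so the two halves
    cancel and the network outputs zero at initialization."""
    d_in = net.hidden.in_features
    half = net.hidden.out_features // 2
    with torch.no_grad():
        # LeCun initialization: standard deviation 1 / sqrt(fan_in)
        w = torch.randn(half, d_in) / d_in ** 0.5
        b = torch.zeros(half)
        v = torch.randn(1, half) / half ** 0.5
        net.hidden.weight.copy_(torch.cat([w, w], dim=0))   # duplicated input weights
        net.hidden.bias.copy_(torch.cat([b, b], dim=0))
        net.out.weight.copy_(torch.cat([v, -v], dim=1))     # negated output weights
```

After calling `mirrored_lecun_init_(net)`, the two half-width branches cancel exactly, so `net(x)` returns zeros at initialization; whether this matches Algorithm 1 step for step should be verified against the paper's appendix.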
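
The Experiment Setup row quotes concrete hyperparameters from Appendix H.2 (synthetic data: SGD, batch size 512, learning rate 0.01, 200 epochs; MNIST/CIFAR: batch size 1024, initial learning rate 0.1, 50 epochs). The minimal PyTorch loop below wires in those numbers; the model, dataset, loss, and any learning-rate schedule or step-size grid search are placeholders, and the projected-gradient step of Algorithm 2 is not reproduced here.

```python
import torch
from torch.utils.data import DataLoader


def train(model, dataset, *, lr, batch_size, epochs, device="cpu"):
    """Minimal SGD loop with the hyperparameters quoted in the Experiment Setup row.
    The loss, model, and dataset are placeholders; the paper additionally grid-searches
    step sizes and uses a projected-gradient regime (Algorithm 2) not shown here."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()


# Settings quoted from Appendix H.2:
#   synthetic dataset:            lr=0.01, batch_size=512,  epochs=200
#   MNIST / CIFAR-10 / CIFAR-100: lr=0.1,  batch_size=1024, epochs=50  (initial lr)
```
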