When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work
Authors: Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Thorough numerical results show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets. ... In this section, we provide empirical validation for our theory. Specifically, we compare the performance of two training regimes ... To evaluate our theory in terms of training error, we conduct experiments on a synthetic dataset (shown in Section 5.1) and random-labeled CIFAR-10 [31] (shown in Appendix H.6). We further observe the strong generalization power of Algorithm 2, even though it is not yet revealed in our theory. Our training regime brings higher or competitive test accuracy on (Restricted) ImageNet [58] (shown in Section 5.2), MNIST [33], CIFAR-10, CIFAR-100 [31] (shown in Appendix H). |
| Researcher Affiliation | Academia | Jiawei Zhang (Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China; jiaweizhang2@link.cuhk.edu.cn); Yushun Zhang (Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China; yushunzhang@link.cuhk.edu.cn); Mingyi Hong (University of Minnesota Twin Cities, MN, USA; mhong@umn.edu); Ruoyu Sun (University of Illinois at Urbana-Champaign, IL, USA; ruoyus@illinois.edu); Zhi-Quan Luo (Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China; luozq@cuhk.edu.cn) |
| Pseudocode | Yes | Algorithm 1: The mirrored LeCun's initialization ... We outline the proposed training regime in Algorithm 2 in Appendix H.1. (A hedged sketch of the mirroring idea appears after this table.) |
| Open Source Code | No | The paper mentions a PyTorch implementation but does not provide an explicit statement about releasing the source code or a link to a repository. |
| Open Datasets | Yes | random-labeled CIFAR-10 [31] ... (Restricted) ImageNet [58] ... MNIST [33], CIFAR-10, CIFAR-100 [31] |
| Dataset Splits | Yes | For MNIST and CIFAR experiments, we use the standard training and test splits. ... For R-ImageNet experiments, we use the same training/validation/test splits from [68]. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify its version or other software dependencies with version numbers. |
| Experiment Setup | Yes | Detailed experimental setup is explained in Appendix H.2. ... in each block in Figure 4, a grid search of step-size is performed to ensure the convergence of algorithms. ... we observe that $v$ never touches the boundary of $B_{\zeta,\kappa}(v)$ when $\kappa = 1$, $\zeta = 0.001$. ... (Appendix H.2, Hyperparameters) For the synthetic dataset, all models are trained for 200 epochs using SGD with batch size 512 and learning rate 0.01. For MNIST, CIFAR-10, and CIFAR-100, we use batch size 1024 and initial learning rate 0.1, trained for 50 epochs. (A minimal training-loop sketch reflecting these hyperparameters follows the table.) |
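
The Pseudocode row points to Algorithm 1 (the mirrored LeCun's initialization) and Algorithm 2 (the proposed training regime in Appendix H.1). The paper's exact algorithms live in its appendix; the sketch below is only an illustrative guess at the mirroring idea for a two-layer ReLU network, namely pairing each LeCun-initialized neuron with a copy whose output weight is negated so the network outputs zero at initialization. The class `TwoLayerNet`, the helper `mirrored_lecun_init_`, and the pairing layout are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn


class TwoLayerNet(nn.Module):
    """Two-layer ReLU net; names and structure are illustrative assumptions."""

    def __init__(self, d_in, m):
        super().__init__()
        assert m % 2 == 0, "mirroring pairs neurons, so the width should be even"
        self.hidden = nn.Linear(d_in, m)
        self.out = nn.Linear(m, 1, bias=False)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))


def mirrored_lecun_init_(net: TwoLayerNet):
    """Sketch of a 'mirrored' LeCun-style init (an assumption, not Algorithm 1 verbatim):
    draw LeCun-scaled weights for the first half of the neurons, copy them to the
    second half, and negate the corresponding output weights so the two halves
    cancel and the network outputs zero at initialization."""
    d_in = net.hidden.in_features
    half = net.hidden.out_features // 2
    with torch.no_grad():
        # LeCun initialization: standard deviation 1 / sqrt(fan_in)
        w = torch.randn(half, d_in) / d_in ** 0.5
        b = torch.zeros(half)
        v = torch.randn(1, half) / half ** 0.5
        net.hidden.weight.copy_(torch.cat([w, w], dim=0))   # duplicated input weights
        net.hidden.bias.copy_(torch.cat([b, b], dim=0))
        net.out.weight.copy_(torch.cat([v, -v], dim=1))     # negated output weights
```

After calling `mirrored_lecun_init_(net)`, the two half-width branches cancel exactly, so `net(x)` returns zeros at initialization; whether this matches Algorithm 1 step for step should be verified against the paper's appendix.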
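
The Experiment Setup row quotes concrete hyperparameters from Appendix H.2 (synthetic data: SGD, batch size 512, learning rate 0.01, 200 epochs; MNIST/CIFAR: batch size 1024, initial learning rate 0.1, 50 epochs). The minimal PyTorch loop below wires in those numbers; the model, dataset, loss, and any learning-rate schedule or step-size grid search are placeholders, and the projected-gradient step of Algorithm 2 is not reproduced here.

```python
import torch
from torch.utils.data import DataLoader


def train(model, dataset, *, lr, batch_size, epochs, device="cpu"):
    """Minimal SGD loop with the hyperparameters quoted in the Experiment Setup row.
    The loss, model, and dataset are placeholders; the paper additionally grid-searches
    step sizes and uses a projected-gradient regime (Algorithm 2) not shown here."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()


# Settings quoted from Appendix H.2:
#   synthetic dataset:            lr=0.01, batch_size=512,  epochs=200
#   MNIST / CIFAR-10 / CIFAR-100: lr=0.1,  batch_size=1024, epochs=50  (initial lr)
```
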