Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width

Authors: Hanxu Zhou, Qixuan Zhou, Zhenyuan Jin, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "With carefully designed experiments and a large computation cost, for both synthetic datasets and real datasets, we find that the dynamics of each layer also could be divided into a linear regime and a condensed regime, separated by a critical regime."
Researcher Affiliation | Academia | Hanxu Zhou (1), Qixuan Zhou (1), Zhenyuan Jin (1), Tao Luo (1,2), Yaoyu Zhang (1,3), Zhi-Qin John Xu (1); (1) School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC and Qing Yuan Research Institute, Shanghai Jiao Tong University; (2) CMA-Shanghai, Shanghai Artificial Intelligence Laboratory; (3) Shanghai Center for Brain Science and Brain-Inspired Technology
Pseudocode | No | The paper contains no sections or figures labeled "Pseudocode" or "Algorithm", and no structured code-like blocks.
Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In supplementary."
Open Datasets | Yes | "The input dimension d is determined by the training data, i.e., d = 1 for synthetic data and d = 28 × 28 for MNIST."
Dataset Splits | No | The paper does not specify train/validation/test splits; it mentions synthetic data and MNIST, but describes only the synthetic set (4 training points) and training with full-batch gradient descent.
Hardware Specification | No | The paper acknowledges the "HPC of School of Mathematical Sciences and the Student Innovation Center, and the Siyuan-1 cluster supported by the Center for High Performance Computing at Shanghai Jiao Tong University", but does not specify GPU models, CPU models, or other hardware details.
Software Dependencies | No | The paper does not provide any specific software dependencies, libraries, or their version numbers used in the experiments.
Experiment Setup | Yes | "Throughout this section, we use three-layer fully-connected neural networks with size d-m-m-dout. The input dimension d is determined by the training data, i.e., d = 1 for synthetic data and d = 28 × 28 for MNIST. The output dimension is dout = 1 for synthetic data and dout = 10 for MNIST. The number of hidden neurons m is specified in each experiment. All parameters are initialized by a Gaussian distribution N(0, var), where var depends on β1, β2 and β3. The total data size is n. The training method is gradient descent with full batch, learning rate lr and MSE loss."
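
The setup described above translates directly into code. The following is a minimal sketch, not the authors' implementation: PyTorch is an assumed framework (the paper lists no software dependencies), the tanh activation, the target function, the learning rate, the step count, and the m^(-β) form of the initialization variance are all illustrative assumptions; the paper defines the actual dependence of var on β1, β2, β3.

```python
# Sketch of the paper's setup: a three-layer fully-connected network of
# size d-m-m-dout, Gaussian initialization N(0, var), full-batch gradient
# descent with MSE loss on n = 4 synthetic points with d = dout = 1.
import torch
import torch.nn as nn

d, m, d_out = 1, 100, 1        # synthetic-data dimensions from the paper
lr, n_steps = 1e-3, 1000       # illustrative values, not from the paper

def init_var(beta):
    # Placeholder: the paper specifies how var depends on beta1-beta3;
    # an m**(-beta) scaling is assumed here for illustration only.
    return m ** (-beta)

class ThreeLayerNet(nn.Module):
    def __init__(self, betas=(1.0, 1.0, 1.0)):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(d, m), nn.Linear(m, m), nn.Linear(m, d_out)])
        for layer, beta in zip(self.layers, betas):
            std = init_var(beta) ** 0.5          # N(0, var) per layer
            nn.init.normal_(layer.weight, 0.0, std)
            nn.init.normal_(layer.bias, 0.0, std)

    def forward(self, x):
        x = torch.tanh(self.layers[0](x))        # activation is an assumption
        x = torch.tanh(self.layers[1](x))
        return self.layers[2](x)

# Synthetic training data: n = 4 points, d = 1 (target is illustrative).
x = torch.linspace(-1.0, 1.0, 4).unsqueeze(1)
y = torch.sin(torch.pi * x)

net = ThreeLayerNet()
opt = torch.optim.SGD(net.parameters(), lr=lr)   # full-batch GD
for _ in range(n_steps):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
```

For the MNIST experiments the same skeleton applies with d = 784, dout = 10, and the paper's per-experiment choice of m; sweeping (β1, β2, β3) over a grid and recording each layer's dynamics is what produces the phase diagram the paper reports.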