Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Phase Diagram for Two-layer ReLU Neural Networks at Infinite-width Limit
Authors: Tao Luo, Zhi-Qin John Xu, Zheng Ma, Yaoyu Zhang
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through both experimental and theoretical approaches, we identify three regimes in the phase diagram, i.e., the linear regime, critical regime, and condensed regime, based on the relative change of input weights as the width approaches infinity, which tends to $0$, $O(1)$, and $+\infty$, respectively. In the linear regime, the NN training dynamics is approximately linear, similar to a random feature model with an exponential loss decay. In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations. The critical regime serves as the boundary between the above two regimes and exhibits an intermediate nonlinear behavior, with the mean-field model as a typical example. Overall, our phase diagram for the two-layer ReLU NN serves as a map for future studies and is a first step towards a more systematic investigation of the training behavior and the implicit regularization of NNs of different structures. |
| Researcher Affiliation | Academia | Tao Luo (a), Zhi-Qin John Xu (a), Zheng Ma (a), Yaoyu Zhang (a,b). (a) School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC and Qing Yuan Research Institute, Shanghai Jiao Tong University, Shanghai, 200240, China; (b) Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, 200031, China |
| Pseudocode | No | The paper includes schematic diagrams (Figure 8 and Figure 9) that outline the steps of proofs, but these are not structured pseudocode or algorithm blocks for a method or procedure. |
| Open Source Code | Yes | Code can be found at https://github.com/xuzhiqin1990/phasediagram_twolayerNN |
| Open Datasets | Yes | To experimentally distinguish the linear and nonlinear regimes, we need to estimate $\sup_{t\in[0,+\infty)} R_D(\theta_w(t))$, which empirically can be approximated by $R_D(\theta_w^*)$ (where $\theta_w^* := \theta_w(\infty)$) without loss of generality. Next, because we can never run experiments at $m=\infty$, we alternatively quantify the growth of $R_D(\theta_w^*)$ as $m\to\infty$. By Fig. 3 (a-c), they approximately obey a power-law relation. Therefore we define $S_w = \lim_{m\to\infty} \frac{\log R_D(\theta_w^*)}{\log m}$ (Eq. 19), which is empirically obtained by estimating the slope in the log-log plot as in Fig. 3. As shown in Fig. 3 (d), NNs with the same pair of $\gamma$ and $\gamma'$, but different $\alpha$, $\beta_1$, and $\beta_2$, have very similar $S_w$, which validates the effectiveness of the normalized model. In the following experiments, we only show the result of one combination of $\alpha$, $\beta_1$, and $\beta_2$ for each pair of $\gamma$ and $\gamma'$. Then, we visualize the phase diagram by experimentally scanning $S_w$ over the phase space. The result for the same 1-d problem as in Fig. 2 is presented in Fig. 4. In the red zone, where $S_w < 0$, $R_D(\theta_w^*)\to 0$ as $m\to\infty$, indicating a linear regime. In contrast, in the blue zone, where $S_w > 0$, $R_D(\theta_w^*)\to\infty$ as $m\to\infty$, indicating highly nonlinear behavior. Their boundary is experimentally identified through interpolation, indicated by stars in Fig. 4, where $R_D(\theta_w^*)\sim O(1)$. They are close to the boundary identified through the scaling analysis, indicated by the auxiliary lines, justifying its criticality. Similarly, we use two-layer ReLU NNs to fit the MNIST data set with mean squared loss. In our experiments, the input is a 784-dimensional vector and the output is the one-dimensional label (0-9) of the input image. As shown in Fig. 5, the phase diagram obtained with the synthetic data also applies to such a real high-dimensional data set. |
| Dataset Splits | No | For the synthetic data, the paper mentions a 'simple 1-d problem of 4 training points' but does not specify how these points were split (e.g., training, validation, test). For the MNIST dataset, it mentions using 'MNIST data set' but does not provide details about the specific training, test, or validation splits used, or reference standard splits explicitly. |
| Hardware Specification | No | The paper mentions 'HPC of School of Mathematical Sciences and the Student Innovation Center at Shanghai Jiao Tong University' in the acknowledgments. This refers to a general computing resource but does not provide specific hardware details such as GPU/CPU models, memory, or other specifications. |
| Software Dependencies | No | The paper does not explicitly state any specific software dependencies or library versions used for implementing the experiments. |
| Experiment Setup | Yes | The parameters are initialized by $a_k^0 \sim N(0,\beta_1^2)$, $w_k^0 \sim N(0,\beta_2^2 I_d)$. The bias term $b_k$ can be incorporated by expanding $x$ and $w_k$ to $(x^\top, 1)^\top$ and $(w_k^\top, b_k)^\top$. At the infinite-width limit $m\to\infty$, given $\beta_1,\beta_2 \sim O(1)$, for $\alpha \sim \sqrt{m}$ the gradient flow of the NN can be approximated by the linear dynamics of the neural tangent kernel (NTK) (Jacot et al., 2018; Arora et al., 2019; Zhang et al., 2020), whereas for $\alpha \sim m$ the gradient flow of the NN exhibits highly nonlinear mean-field dynamics (Mei et al., 2018; Rotskoff and Vanden-Eijnden, 2018; Chizat and Bach, 2018; Sirignano and Spiliopoulos, 2020). ... The result for the same 1-d problem as in Fig. 2 is presented in Fig. 4. In the red zone, where $S_w < 0$, $R_D(\theta_w^*)\to 0$ as $m\to\infty$, indicating a linear regime. In contrast, in the blue zone, where $S_w > 0$, $R_D(\theta_w^*)\to\infty$ as $m\to\infty$, indicating highly nonlinear behavior. Their boundary is experimentally identified through interpolation, indicated by stars in Fig. 4, where $R_D(\theta_w^*)\sim O(1)$. They are close to the boundary identified through the scaling analysis, indicated by the auxiliary lines, justifying its criticality. Similarly, we use two-layer ReLU NNs to fit the MNIST data set with mean squared loss. In our experiments, the input is a 784-dimensional vector and the output is the one-dimensional label (0-9) of the input image. As shown in Fig. 5, the phase diagram obtained with the synthetic data also applies to such a real high-dimensional data set. |
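The setup quoted above combines two concrete ingredients: Gaussian initialization of a two-layer ReLU network $f(x) = \frac{1}{\alpha}\sum_{k=1}^m a_k\,\mathrm{ReLU}(w_k^\top x)$, and the estimate of $S_w$ (Eq. 19) as the slope of $\log R_D(\theta_w^*)$ against $\log m$. A minimal NumPy sketch of both, assuming illustrative function names (this is not the authors' released code, and the $R_D$ values here are a synthetic toy power law, not trained-network measurements):

```python
import numpy as np

def init_two_layer_relu(m, d, beta1=1.0, beta2=1.0, seed=0):
    """Initialize a_k ~ N(0, beta1^2) and w_k ~ N(0, beta2^2 I_d)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, beta1, size=m)       # output weights, shape (m,)
    W = rng.normal(0.0, beta2, size=(m, d))  # input weights, shape (m, d)
    return a, W

def forward(x, a, W, alpha):
    """Two-layer ReLU net: f(x) = (1/alpha) * sum_k a_k ReLU(w_k . x)."""
    pre = W @ x
    return float(np.dot(a, np.maximum(pre, 0.0)) / alpha)

def estimate_Sw(widths, RD_values):
    """Estimate S_w as the least-squares slope of log R_D(theta_w*)
    versus log m, i.e. the slope in the log-log plot (Eq. 19)."""
    slope, _intercept = np.polyfit(np.log(widths), np.log(RD_values), 1)
    return slope

# Toy check: if R_D(theta_w*) ~ m^{-1/2} (a linear-regime-like decay,
# S_w < 0), the fitted log-log slope recovers -0.5.
widths = np.array([2**k for k in range(6, 14)])
RD = 3.0 * widths ** -0.5
print(round(estimate_Sw(widths, RD), 3))  # -0.5
```

In the paper's scan, each `RD` entry would instead be measured from a trained network of width $m$; the sign of the fitted slope then classifies the point in $(\gamma, \gamma')$ space as linear ($S_w < 0$) or condensed/nonlinear ($S_w > 0$), with $S_w \approx 0$ marking the critical boundary.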