Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Phase Diagram for Two-layer ReLU Neural Networks at Infinite-width Limit
Authors: Tao Luo, Zhi-Qin John Xu, Zheng Ma, Yaoyu Zhang
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through both experimental and theoretical approaches, we identify three regimes in the phase diagram, i.e., the linear regime, critical regime, and condensed regime, based on the relative change of input weights as the width approaches infinity, which tends to $0$, $O(1)$, and $+\infty$, respectively. In the linear regime, the NN training dynamics is approximately linear, similar to a random feature model with an exponential loss decay. In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations. The critical regime serves as the boundary between the above two regimes and exhibits an intermediate nonlinear behavior, with the mean-field model as a typical example. Overall, our phase diagram for the two-layer ReLU NN serves as a map for future studies and is a first step towards a more systematic investigation of the training behavior and the implicit regularization of NNs of different structures. |
| Researcher Affiliation | Academia | Tao Luo (a), Zhi-Qin John Xu (a), Zheng Ma (a), Yaoyu Zhang (a,b). (a) School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC and Qing Yuan Research Institute, Shanghai Jiao Tong University, Shanghai, 200240, China; (b) Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, 200031, China |
| Pseudocode | No | The paper includes schematic diagrams (Figure 8 and Figure 9) that outline the steps of proofs, but these are not structured pseudocode or algorithm blocks for a method or procedure. |
| Open Source Code | Yes | Code can be found at https://github.com/xuzhiqin1990/phasediagram_twolayerNN |
| Open Datasets | Yes | To experimentally distinguish the linear and nonlinear regimes, we need to estimate $\sup_{t\in[0,+\infty)} R_D(\theta_w(t))$, which empirically can be approximated by $R_D(\theta_w^*)$ (where $\theta_w^* := \theta_w(\infty)$) without loss of generality. Next, because we can never run experiments at $m=\infty$, we alternatively quantify the growth of $R_D(\theta_w^*)$ as $m\to\infty$. By Fig. 3 (a-c), they approximately obey a power-law relation. Therefore we define $S_w = \lim_{m\to\infty} \frac{\log R_D(\theta_w^*)}{\log m}$ (Eq. 19), which is empirically obtained by estimating the slope in the log-log plot as in Fig. 3. As shown in Fig. 3 (d), NNs with the same pair of $\gamma$ and $\gamma'$, but different $\alpha$, $\beta_1$, and $\beta_2$, have very similar $S_w$, which validates the effectiveness of the normalized model. In the following experiments, we only show the result of one combination of $\alpha$, $\beta_1$, and $\beta_2$ for each pair of $\gamma$ and $\gamma'$. Then, we visualize the phase diagram by experimentally scanning $S_w$ over the phase space. The result for the same 1-d problem as in Fig. 2 is presented in Fig. 4. In the red zone, where $S_w < 0$, $R_D(\theta_w^*)\to 0$ as $m\to\infty$, indicating a linear regime. In contrast, in the blue zone, where $S_w > 0$, $R_D(\theta_w^*)\to\infty$ as $m\to\infty$, indicating highly nonlinear behavior. Their boundary is experimentally identified through interpolation, indicated by stars in Fig. 4, where $R_D(\theta_w^*)\sim O(1)$. They are close to the boundary identified through the scaling analysis, indicated by the auxiliary lines, justifying its criticality. Similarly, we use two-layer ReLU NNs to fit the MNIST data set with mean squared loss. In our experiments, the input is a 784-dimensional vector and the output is the one-dimensional label (0-9) of the input image. As shown in Fig. 5, the phase diagram obtained with the synthetic data also applies to such a real high-dimensional data set. |
| Dataset Splits | No | For the synthetic data, the paper mentions a 'simple 1-d problem of 4 training points' but does not specify how these points were split (e.g., training, validation, test). For the MNIST dataset, it mentions using 'MNIST data set' but does not provide details about the specific training, test, or validation splits used, or reference standard splits explicitly. |
| Hardware Specification | No | The paper mentions 'HPC of School of Mathematical Sciences and the Student Innovation Center at Shanghai Jiao Tong University' in the acknowledgments. This refers to a general computing resource but does not provide specific hardware details such as GPU/CPU models, memory, or other specifications. |
| Software Dependencies | No | The paper does not explicitly state any specific software dependencies or library versions used for implementing the experiments. |
| Experiment Setup | Yes | The parameters are initialized by $a_k^0 \sim N(0,\beta_1^2)$, $w_k^0 \sim N(0,\beta_2^2 I_d)$. The bias term $b_k$ can be incorporated by expanding $x$ and $w_k$ to $(x^\top, 1)^\top$ and $(w_k^\top, b_k)^\top$. At the infinite-width limit $m\to\infty$, given $\beta_1,\beta_2 \sim O(1)$, for $\alpha \sim \sqrt{m}$ the gradient flow of the NN can be approximated by the linear dynamics of the neural tangent kernel (NTK) (Jacot et al., 2018; Arora et al., 2019; Zhang et al., 2020), whereas for $\alpha \sim m$ the gradient flow of the NN exhibits highly nonlinear mean-field dynamics (Mei et al., 2018; Rotskoff and Vanden-Eijnden, 2018; Chizat and Bach, 2018; Sirignano and Spiliopoulos, 2020). ... The result for the same 1-d problem as in Fig. 2 is presented in Fig. 4. In the red zone, where $S_w < 0$, $R_D(\theta_w^*)\to 0$ as $m\to\infty$, indicating a linear regime. In contrast, in the blue zone, where $S_w > 0$, $R_D(\theta_w^*)\to\infty$ as $m\to\infty$, indicating highly nonlinear behavior. Their boundary is experimentally identified through interpolation, indicated by stars in Fig. 4, where $R_D(\theta_w^*)\sim O(1)$. They are close to the boundary identified through the scaling analysis, indicated by the auxiliary lines, justifying its criticality. Similarly, we use two-layer ReLU NNs to fit the MNIST data set with mean squared loss. In our experiments, the input is a 784-dimensional vector and the output is the one-dimensional label (0-9) of the input image. As shown in Fig. 5, the phase diagram obtained with the synthetic data also applies to such a real high-dimensional data set. |
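The setup quoted above combines two concrete ingredients: Gaussian initialization of a two-layer ReLU network $f(x) = \frac{1}{\alpha}\sum_{k=1}^m a_k\,\mathrm{ReLU}(w_k^\top x)$, and the estimate of $S_w$ (Eq. 19) as the slope of $\log R_D(\theta_w^*)$ against $\log m$. A minimal NumPy sketch of both, assuming illustrative function names (this is not the authors' released code, and the $R_D$ values here are a synthetic toy power law, not trained-network measurements):

```python
import numpy as np

def init_two_layer_relu(m, d, beta1=1.0, beta2=1.0, seed=0):
    """Initialize a_k ~ N(0, beta1^2) and w_k ~ N(0, beta2^2 I_d)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, beta1, size=m)       # output weights, shape (m,)
    W = rng.normal(0.0, beta2, size=(m, d))  # input weights, shape (m, d)
    return a, W

def forward(x, a, W, alpha):
    """Two-layer ReLU net: f(x) = (1/alpha) * sum_k a_k ReLU(w_k . x)."""
    pre = W @ x
    return float(np.dot(a, np.maximum(pre, 0.0)) / alpha)

def estimate_Sw(widths, RD_values):
    """Estimate S_w as the least-squares slope of log R_D(theta_w*)
    versus log m, i.e. the slope in the log-log plot (Eq. 19)."""
    slope, _intercept = np.polyfit(np.log(widths), np.log(RD_values), 1)
    return slope

# Toy check: if R_D(theta_w*) ~ m^{-1/2} (a linear-regime-like decay,
# S_w < 0), the fitted log-log slope recovers -0.5.
widths = np.array([2**k for k in range(6, 14)])
RD = 3.0 * widths ** -0.5
print(round(estimate_Sw(widths, RD), 3))  # -0.5
```

In the paper's scan, each `RD` entry would instead be measured from a trained network of width $m$; the sign of the fitted slope then classifies the point in $(\gamma, \gamma')$ space as linear ($S_w < 0$) or condensed/nonlinear ($S_w > 0$), with $S_w \approx 0$ marking the critical boundary.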