Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks

Authors: Andrea Montanari, Pierfrancesco Urbani

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We conduct a theoretical analysis that is described by the abstract and that answers the questions detailed in the introduction. We study the learning dynamics of large two-layer neural networks via dynamical mean field theory, a well established technique of nonequilibrium statistical physics. Our paper is theoretical in nature and simulations are fairly standard and only play a support role.
Researcher Affiliation Academia Andrea Montanari: Department of Statistics and Department of Mathematics, Stanford University. Pierfrancesco Urbani: Université Paris-Saclay, CNRS, CEA, Institut de Physique Théorique, 91191, Gif-Sur-Yvette, France.
Pseudocode No The paper describes mathematical derivations and analytical solutions, and includes code snippets for simulation setup in Appendix I, but does not present any clearly labeled pseudocode or algorithm blocks.
Open Source Code No 5. Open access to data and code Answer: [NA] Justification: Our paper is theoretical in nature and simulations are fairly standard and only play a support role.
Open Datasets No We study the dynamics of model (1.1) under the simplest data distribution in which genuine non-linear learning is required to efficiently learn a good prediction rule, the so-called k-index model. Namely, we assume $x_i \sim N(0, I_d)$ and $y_i$ that depends on a low-dimensional projection $U^T x_i$: $y_i = \varphi(U^T x_i) + \varepsilon_i$, $\varepsilon_i \sim N(0, \tau^2)$.
Dataset Splits No Training data comprises $n$ points in $d$ dimensions distributed according to a single index model. We assume $n$, $m$, $d$ all large with $n/(md) = \alpha$ (here $\alpha = 0.3$). We generate data according to the pure noise model $y_i = \varepsilon_i$ (Fig. 2), $y_i = \varphi(w^T x_i) + \varepsilon_i$ (Fig. 4), $i \le n$. For simulations in Figures 2 and 4 we use batch size $b = 100$ and step size $\eta = 0.1$. Each symbol reports the average of $N_{\rm sim} = 10$ simulations.
Hardware Specification No 8. Experiments compute resources Answer: [NA]. Justification: There are no extensive or complex experiments we have performed. The paper is theoretical in nature and aims at understanding simple yet paradigmatic models.
Software Dependencies No
    class Net(nn.Module):
        def __init__(self, a, m, d):
            super().__init__()
            self.m = m
            self.lin1 = nn.Linear(d, m, bias=False)
            self.lin1.weight.data = (1/np.sqrt(d))*torch.randn((m, d))
            self.lin2 = nn.Linear(m, 1, bias=False)
            self.lin2.weight.data[0, :] = a
            self.act = Myact()
            self.project()

        def forward(self, x):
            x1 = self.act(self.lin1(x))
            return self.lin2(x1)/self.m

        def project(self, epsilon):
            row_norms = torch.norm(self.lin1.weight.data, dim=1, keepdim=True)
            row_norms = torch.clamp(row_norms, min=epsilon)
            self.lin1.weight.data = self.lin1.weight.data/row_norms

    optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0., weight_decay=0.)
    lambda_step = lambda epoch: 1
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_step)
In the simulations of Figures 2 and 4 we use batch size $b = 100$ and step size $\eta = 0.1$.
Experiment Setup Yes In the simulations of Figures 2 and 4 we use batch size $b = 100$ and step size $\eta = 0.1$. Each symbol reports the average of $N_{\rm sim} = 10$ simulations. In Figure 2 in the main text we initialize $a(0) = 1$, and let $a(t)$ evolve with GF alongside the first layer weights. We observe that the theory describes well the empirical results, despite the Gaussian [...] We set $h(q) = \beta\varphi(q) = (9/10)q + q^3/6$, $\tau = 0.3$ and $\alpha = 0.3$, under mean field initialization. Here we use mean field initialization, $h(z) = (9/10)z + (1/6)z^3$, $\alpha = 0.4$ and $\tau = 0.6$.
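
As a rough illustration of the setup quoted in the table above, the following pure-Python sketch draws single-index data and evaluates a two-layer network whose first-layer rows are rescaled to unit norm. It is not the authors' code. The helper names (`h`, `sample_single_index`, `project_rows`, `forward`), the `tanh` activation standing in for the undefined `Myact`, the default `epsilon` (the quoted excerpt calls `project()` without one), the teacher direction `w`, and the small sizes `m = 4`, `d = 5` are all assumptions made for illustration.

```python
import math
import random


def h(z):
    """Target nonlinearity quoted in the setup: h(z) = (9/10) z + z^3 / 6."""
    return 0.9 * z + z ** 3 / 6.0


def sample_single_index(n, d, tau, rng, w=None):
    """Draw n pairs with x_i ~ N(0, I_d), y_i = h(<w, x_i>) + eps_i, eps_i ~ N(0, tau^2)."""
    if w is None:
        # Assumption: a unit-norm teacher direction along the first coordinate.
        w = [1.0] + [0.0] * (d - 1)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        signal = h(sum(wj * xj for wj, xj in zip(w, x)))
        xs.append(x)
        ys.append(signal + rng.gauss(0.0, tau))
    return xs, ys


def project_rows(W, epsilon=1e-12):
    """Rescale every row of W to unit Euclidean norm; norms are clamped below by epsilon."""
    projected = []
    for row in W:
        norm = max(math.sqrt(sum(v * v for v in row)), epsilon)
        projected.append([v / norm for v in row])
    return projected


def forward(W, a, x, act=math.tanh):
    """Two-layer network f(x) = (1/m) * sum_j a_j * act(<w_j, x>)."""
    m = len(W)
    hidden = [act(sum(wv * xv for wv, xv in zip(row, x))) for row in W]
    return sum(aj * hj for aj, hj in zip(a, hidden)) / m


# Illustrative sizes (assumptions); alpha and tau follow one of the quoted settings.
m, d, alpha, tau = 4, 5, 0.3, 0.3
n = round(alpha * m * d)  # sample size chosen so that n / (m d) = alpha

rng = random.Random(0)
xs, ys = sample_single_index(n, d, tau, rng)

# First layer: Gaussian rows scaled by 1/sqrt(d), then projected onto the unit sphere.
W = project_rows(
    [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)] for _ in range(m)]
)
a = [1.0] * m  # second-layer weights initialised to 1, mirroring a(0) = 1

# Mean squared training error of the untrained network on the sampled data.
train_mse = sum((forward(W, a, x) - y) ** 2 for x, y in zip(xs, ys)) / n
```

The sketch stops at initialization. Training would add mini-batch SGD steps (batch size 100 and step size 0.1 in the quoted setup), and the results would be averaged over repeated runs; whether and when the projection is reapplied during training is not stated in the excerpt.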