Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Commutative Scaling of Width and Depth in Deep Neural Networks
Authors: Soufiane Hayou
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations are provided to illustrate the theoretical results. In this section, we validate our theoretical results with simulations on large width and depth residual neural networks of the form Eq. (5) with different choices of the sequence α. |
| Researcher Affiliation | Academia | Soufiane Hayou, Simons Institute, UC Berkeley |
| Pseudocode | No | The paper describes the methods using mathematical formulations and textual descriptions, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide any links to code repositories. |
| Open Datasets | No | In Fig. 1, we compare the empirical covariance q_{l,n} with the theoretical prediction q_t from Theorem 2 for n ∈ {2³, 2⁸, 2¹⁴} and L ∈ {2¹, 2³, 2⁸}. We chose the maximum depth to be much smaller than the maximum width to take into account the difference in the width and depth convergence rates: n^(-1/2) versus L^(-1) in this case. The empirical L² error between q_{L,n} and q_1 (from Theorem 2) is also reported. As the width increases, we observe an excellent match with the theory. The role of the depth is less noticeable, but for instance, with width n = 2¹⁴, we can see that the L² error is smaller with depth L = 256 as compared to depth L = 2. The theoretical prediction q_t is approximated with a PDE solver (RK45 method, Fehlberg (1968)) for t ∈ [0, 1] with a discretization step Δt = 1e-6. The blue curve represents the average covariance q_{l,n}(a, b) for the ResNet of Eq. (5) with n ∈ {2³, 2⁸, 2¹⁴}, L ∈ {2¹, 2³, 2⁸}, d = 30, where a and b are sampled randomly from N(0, I_d) and normalized to have ‖a‖ = ‖b‖ = 1. The average is calculated based on N = 100 simulations. |
| Dataset Splits | No | The paper states that inputs 'a and b are sampled randomly from N(0, I_d) and normalized to have ‖a‖ = ‖b‖ = 1' and 'The average is calculated based on N = 100 simulations'. Since the data is generated randomly for each simulation, there are no explicit dataset splits (training, validation, test) mentioned. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running its simulations or experiments. |
| Software Dependencies | No | 'The theoretical prediction q_t is approximated with a PDE solver (RK45 method, Fehlberg (1968)) for t ∈ [0, 1] with a discretization step Δt = 1e-6.' This mentions a numerical method but does not specify the software package or version used to implement it, nor any other software dependencies with version numbers. |
| Experiment Setup | Yes | In Fig. 1, we compare the empirical covariance q_{l,n} with the theoretical prediction q_t from Theorem 2 for n ∈ {2³, 2⁸, 2¹⁴} and L ∈ {2¹, 2³, 2⁸}. We chose the maximum depth to be much smaller than the maximum width to take into account the difference in the width and depth convergence rates (n^(-1/2) versus L^(-1)). The network dimensions are n ∈ {2³, 2⁸, 2¹⁴}, L ∈ {2¹, 2³, 2⁸}, and d = 30; a and b are sampled randomly from N(0, I_d) and normalized to have ‖a‖ = ‖b‖ = 1. The average is calculated based on N = 100 simulations. The theoretical prediction q_t is approximated with a PDE solver (RK45 method, Fehlberg (1968)) for t ∈ [0, 1] with a discretization step Δt = 1e-6. |
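The Monte Carlo side of the setup quoted above can be sketched in a few lines of NumPy. This is not the paper's code: the exact residual form of Eq. (5) and the sequence α are not reproduced in this report, so the sketch assumes a standard residual update with scaling α = 1/√L, ReLU activations, i.i.d. Gaussian weights, and a Gaussian input embedding; all names (`empirical_covariance`, `W_in`) are illustrative.

```python
import numpy as np

def empirical_covariance(n, L, d=30, num_sims=100, seed=0):
    """Monte Carlo estimate of the average last-layer covariance q_{L,n}(a, b)
    for a width-n, depth-L residual network.

    Assumed architecture (NOT taken from the paper): a Gaussian input
    embedding followed by residual blocks h <- h + (1/sqrt(L)) W ReLU(h),
    with fresh weights W ~ N(0, 1/n) per layer, shared across both inputs.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_sims):
        # Sample inputs from N(0, I_d) and normalize to unit norm,
        # matching the setup quoted in the table above.
        a = rng.standard_normal(d); a /= np.linalg.norm(a)
        b = rng.standard_normal(d); b /= np.linalg.norm(b)
        # Assumed Gaussian embedding of the d-dim inputs into width n.
        W_in = rng.standard_normal((n, d)) / np.sqrt(d)
        ha, hb = W_in @ a, W_in @ b
        for _ in range(L):
            # Same weights applied to both inputs (one network, two inputs).
            W = rng.standard_normal((n, n)) / np.sqrt(n)
            ha = ha + (1.0 / np.sqrt(L)) * W @ np.maximum(ha, 0.0)
            hb = hb + (1.0 / np.sqrt(L)) * W @ np.maximum(hb, 0.0)
        total += (ha @ hb) / n  # empirical covariance at the final layer
    return total / num_sims

# One grid point from the reported setup: n = 2^8, L = 2^3, N = 100 runs.
q_hat = empirical_covariance(n=2**8, L=2**3)
```

The theoretical curve q_t would be obtained separately by integrating the ODE/PDE from Theorem 2 (not reproduced in this report) over t ∈ [0, 1] and comparing it against estimates like `q_hat` across the (n, L) grid.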