Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Commutative Scaling of Width and Depth in Deep Neural Networks
Authors: Soufiane Hayou
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations are provided to illustrate the theoretical results. In this section, we validate our theoretical results with simulations on large width and depth residual neural networks of the form Eq. (5) with different choices of the sequence α. |
| Researcher Affiliation | Academia | Soufiane Hayou, Simons Institute, UC Berkeley |
| Pseudocode | No | The paper describes the methods using mathematical formulations and textual descriptions, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide any links to code repositories. |
| Open Datasets | No | In Fig. 1, we compare the empirical covariance q_{l,n} with the theoretical prediction q_t from Theorem 2 for n ∈ {2³, 2⁸, 2¹⁴} and L ∈ {2¹, 2³, 2⁸}. We chose the maximum depth to be much smaller than the maximum width to take into account the difference in the width and depth convergence rates: n^(-1/2) versus L^(-1) in this case. The empirical L² error between q_{L,n} and q_1 (from Theorem 2) is also reported. As the width increases, we observe an excellent match with the theory. The role of the depth is less noticeable, but for instance, with width n = 2¹⁴, we can see that the L² error is smaller with depth L = 256 as compared to depth L = 2. The theoretical prediction q_t is approximated with a PDE solver (RK45 method, Fehlberg (1968)) for t ∈ [0, 1] with a discretization step Δt = 1e-6. The blue curve represents the average covariance q_{l,n}(a, b) for the ResNet of Eq. (5) with n ∈ {2³, 2⁸, 2¹⁴}, L ∈ {2¹, 2³, 2⁸}, d = 30, where a and b are sampled randomly from N(0, I_d) and normalized to have ‖a‖ = ‖b‖ = 1. The average is calculated based on N = 100 simulations. |
| Dataset Splits | No | The paper states that inputs 'a and b are sampled randomly from N(0, I_d) and normalized to have ‖a‖ = ‖b‖ = 1' and 'The average is calculated based on N = 100 simulations'. Since the data is generated randomly for each simulation, there are no explicit dataset splits (training, validation, test) mentioned. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running its simulations or experiments. |
| Software Dependencies | No | 'The theoretical prediction q_t is approximated with a PDE solver (RK45 method, Fehlberg (1968)) for t ∈ [0, 1] with a discretization step Δt = 1e-6.' This mentions a numerical method but does not specify the software package or version used to implement it, nor any other software dependencies with version numbers. |
| Experiment Setup | Yes | In Fig. 1, we compare the empirical covariance q_{l,n} with the theoretical prediction q_t from Theorem 2 for n ∈ {2³, 2⁸, 2¹⁴} and L ∈ {2¹, 2³, 2⁸}. We chose the maximum depth to be much smaller than the maximum width to take into account the difference in the width and depth convergence rates (n^(-1/2) versus L^(-1)). The network dimensions are n ∈ {2³, 2⁸, 2¹⁴}, L ∈ {2¹, 2³, 2⁸}, and d = 30; a and b are sampled randomly from N(0, I_d) and normalized to have ‖a‖ = ‖b‖ = 1. The average is calculated based on N = 100 simulations. The theoretical prediction q_t is approximated with a PDE solver (RK45 method, Fehlberg (1968)) for t ∈ [0, 1] with a discretization step Δt = 1e-6. |
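The Monte Carlo side of the setup quoted above can be sketched in a few lines of NumPy. This is not the paper's code: the exact residual form of Eq. (5) and the sequence α are not reproduced in this report, so the sketch assumes a standard residual update with scaling α = 1/√L, ReLU activations, i.i.d. Gaussian weights, and a Gaussian input embedding; all names (`empirical_covariance`, `W_in`) are illustrative.

```python
import numpy as np

def empirical_covariance(n, L, d=30, num_sims=100, seed=0):
    """Monte Carlo estimate of the average last-layer covariance q_{L,n}(a, b)
    for a width-n, depth-L residual network.

    Assumed architecture (NOT taken from the paper): a Gaussian input
    embedding followed by residual blocks h <- h + (1/sqrt(L)) W ReLU(h),
    with fresh weights W ~ N(0, 1/n) per layer, shared across both inputs.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_sims):
        # Sample inputs from N(0, I_d) and normalize to unit norm,
        # matching the setup quoted in the table above.
        a = rng.standard_normal(d); a /= np.linalg.norm(a)
        b = rng.standard_normal(d); b /= np.linalg.norm(b)
        # Assumed Gaussian embedding of the d-dim inputs into width n.
        W_in = rng.standard_normal((n, d)) / np.sqrt(d)
        ha, hb = W_in @ a, W_in @ b
        for _ in range(L):
            # Same weights applied to both inputs (one network, two inputs).
            W = rng.standard_normal((n, n)) / np.sqrt(n)
            ha = ha + (1.0 / np.sqrt(L)) * W @ np.maximum(ha, 0.0)
            hb = hb + (1.0 / np.sqrt(L)) * W @ np.maximum(hb, 0.0)
        total += (ha @ hb) / n  # empirical covariance at the final layer
    return total / num_sims

# One grid point from the reported setup: n = 2^8, L = 2^3, N = 100 runs.
q_hat = empirical_covariance(n=2**8, L=2**3)
```

The theoretical curve q_t would be obtained separately by integrating the ODE/PDE from Theorem 2 (not reproduced in this report) over t ∈ [0, 1] and comparing it against estimates like `q_hat` across the (n, L) grid.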