Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

When Are Bias-Free ReLU Networks Effectively Linear Networks?

Authors: Yedi Zhang, Andrew M. Saxe, Peter E. Latham

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate Theorem 8 and the plausibility of Assumption 7 with numerical simulations in Figure 3. In Figure 3b, the initialization is small random Gaussian weights and thus does not satisfy Assumption 7, yet Theorem 8 holds with small errors (less than 0.3%). Furthermore, we provide a theoretical proof that Theorem 8 holds with L2 regularization, and empirical evidence that some parts of Theorem 8 hold with large initialization and a moderately large learning rate, in Appendices C.4 to C.6.
Researcher Affiliation | Academia | Yedi Zhang (EMAIL), Gatsby Computational Neuroscience Unit, University College London; Andrew Saxe (EMAIL), Gatsby Computational Neuroscience Unit & Sainsbury Wellcome Centre, University College London; Peter E. Latham (EMAIL), Gatsby Computational Neuroscience Unit, University College London
Pseudocode | No | The paper provides mathematical derivations, equations, and proofs, but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or a direct link to any source code repository for the methodology described. It only includes a link to its OpenReview page.
Open Datasets | Yes | The input is 20-dimensional, x ∈ R^20. We sample 1000 i.i.d. vectors x_n ~ N(0, I) and include both x_n and -x_n in the dataset, resulting in 2000 data points. The output is generated as y = w·x + sin(4 w·x), where the elements of w are randomly sampled from a uniform distribution U[-0.5, 0.5]. Figures 4 and 8: we use the same hyperparameters as Boursier et al. (2022). The network width is 60. The initialization scale is w_init = 10^-6. The learning rate is 0.001 for square loss and 0.004 for logistic loss. The orthogonal input dataset contains two data points, i.e., [-0.5, 1] and [2, 1]. The XOR input dataset contains four data points, i.e., [0, 1], [2, 0], [0, 3], and [-4, 0].
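The symmetric synthetic dataset described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction from the quoted description, not the authors' code; the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 20    # input dimension: x in R^20
n = 1000  # i.i.d. Gaussian samples, symmetrized to 2000 points

# Sample x_n ~ N(0, I) and include both x_n and -x_n.
X_half = rng.standard_normal((n, d))
X = np.concatenate([X_half, -X_half], axis=0)  # shape (2000, 20)

# Target y = w.x + sin(4 w.x), with each element of w ~ U[-0.5, 0.5].
w = rng.uniform(-0.5, 0.5, size=d)
z = X @ w
y = z + np.sin(4 * z)  # shape (2000,)
```

Including each point together with its negation makes the empirical input distribution exactly symmetric, which is the property the paper's symmetric-data analysis relies on.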
Dataset Splits | No | The paper describes generating synthetic datasets for its experiments, specifying the number of data points or the actual points used (e.g., 2000 data points for Figure 3, two data points for orthogonal input, four for XOR). However, it does not explicitly mention or specify any training, validation, or test splits for these datasets.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide any specific software dependencies, libraries, or tools with version numbers used for the experiments.
Experiment Setup | Yes | Figure 1: the networks have width 100; the initialization scale is w_init = 10^-2; the learning rate is 0.2; the two-layer networks are trained for 10000 epochs and the three-layer networks for 80000 epochs. Figure 3: the networks have width 500; the initialization scale is w_init = 10^-8; the learning rate is 0.004. Figures 4 and 8: we use the same hyperparameters as Boursier et al. (2022); the network width is 60; the initialization scale is w_init = 10^-6; the learning rate is 0.001 for square loss and 0.004 for logistic loss. Figure 5: the networks have width 100; the initialization scale is w_init = 10^-2; the learning rate is 0.1; the networks are trained for 20000 epochs. Figure 6: the network width is 100; the initialization scale is w_init = 10^-3; the learning rate is 0.025.