Analysis of one-hidden-layer neural networks via the resolvent method
Authors: Vanessa Piccolo, Dominik Schröder
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that the Stieltjes transform of the limiting spectral distribution approximately satisfies a quartic self-consistent equation, which is exactly the equation obtained by Pennington and Worah [22] and Benigni and Péché [6] with the moment method. We extend the previous results to the case of additive bias Y = f(WX + B) with B being an independent rank-one Gaussian random matrix, closer modelling the neural network infrastructures encountered in practice. Our key finding is that in the case of additive bias it is impossible to choose an activation function preserving the layer-to-layer singular value distribution, in sharp contrast to the bias-free case where a simple integral constraint is sufficient to achieve isospectrality. The numerical experiments were conducted for the parameters n1 = 3000, ϕ = σx = σw = 1, ψ = 5 (left) or ψ = 2 (right), and σb = 0 (top) or σb = 0.25 (bottom). In Fig. 2 we test this result experimentally and choose the activation function f(x) = c1\|x\| − c2 with c1, c2 such that (2) is satisfied and θ1(f) = 1. We find that in the bias-free case (left), irrespective of the network depth, the eigenvalues of the covariance matrix Y^(l)(Y^(l))^* converge to their theoretical limit from Theorem 2.1, exactly as in [22, Fig. 1]. |
| Researcher Affiliation | Academia | Vanessa Piccolo, ETH Zurich (current affiliation: ENS Lyon), vanessa.piccolo@ens-lyon.fr; Dominik Schröder, Institute for Theoretical Studies, ETH Zurich, dschroeder@ethz.ch |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | No | The paper specifies the use of 'random data matrix X' and 'random weight matrix W' with 'i.i.d. random variables' but does not mention or provide access information for any publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) needed to reproduce data partitioning. |
| Hardware Specification | No | The paper mentions that 'numerical experiments were conducted' but provides no specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running these experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | The numerical experiments were conducted for the parameters n1 = 3000, ϕ = σx = σw = 1, ψ = 5 (left) or ψ = 2 (right), and σb = 0 (top) or σb = 0.25 (bottom). |
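
Since no code is released, the following is a minimal sketch of how the quoted setup could be simulated for a single layer. It is not the authors' implementation. Several things are assumptions that this summary does not state: the convention ϕ = n0/m and ψ = n0/n1 (as in Pennington and Worah), the scaling W_ij ~ N(0, σw²/n0) and X_ij ~ N(0, σx²), the reading of the rank-one bias B as one Gaussian value per row repeated across all columns, and the normalisation M = Y Y^*/m. The helper names `centered_abs` and `simulate_spectrum` are hypothetical, and the default activation is only an illustrative centred |x|; it does not reproduce the paper's constants c1, c2 or the condition θ1(f) = 1.

```python
import numpy as np


def centered_abs(scale):
    """Hypothetical activation f(x) = |x| - E|Z| with Z ~ N(0, scale^2).

    Illustrative only: this does not use the paper's constants c1, c2.
    """
    shift = scale * np.sqrt(2.0 / np.pi)  # E|Z| for a centred Gaussian
    return lambda x: np.abs(x) - shift


def simulate_spectrum(n1=300, phi=1.0, psi=2.0, sigma_x=1.0, sigma_w=1.0,
                      sigma_b=0.0, f=None, seed=0):
    """Eigenvalues of M = Y Y^T / m for Y = f(W X + B), sorted ascending."""
    rng = np.random.default_rng(seed)
    n0 = int(round(psi * n1))  # assumption: psi = n0 / n1
    m = int(round(n0 / phi))   # assumption: phi = n0 / m
    if f is None:
        # Entries of W X + B have marginal variance sigma_w^2 sigma_x^2 + sigma_b^2.
        f = centered_abs(np.sqrt(sigma_w**2 * sigma_x**2 + sigma_b**2))
    X = sigma_x * rng.standard_normal((n0, m))
    W = (sigma_w / np.sqrt(n0)) * rng.standard_normal((n1, n0))
    # Assumed rank-one bias: one Gaussian value per row, broadcast over columns.
    b = sigma_b * rng.standard_normal((n1, 1))
    Y = f(W @ X + b)
    M = (Y @ Y.T) / m
    return np.linalg.eigvalsh(M)


if __name__ == "__main__":
    # Small n1 for a quick run; the quoted experiments use n1 = 3000.
    eigs = simulate_spectrum(n1=300, psi=2.0, sigma_b=0.25)
    print(eigs.min(), eigs.max())
```

A histogram of the returned eigenvalues gives an empirical spectral density that could be compared against the theoretical limit. Running with `n1=3000` and `psi=5.0` matches the quoted parameters but requires multiplying a 3000 × 15000 matrix by a 15000 × 15000 matrix under the assumed conventions.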