Analysis of one-hidden-layer neural networks via the resolvent method

Authors: Vanessa Piccolo, Dominik Schröder

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We prove that the Stieltjes transform of the limiting spectral distribution approximately satisfies a quartic self-consistent equation, which is exactly the equation obtained by Pennington and Worah [22] and Benigni and Péché [6] with the moment method. We extend the previous results to the case of additive bias Y = f(WX + B) with B being an independent rank-one Gaussian random matrix, closer modelling the neural network infrastructures encountered in practice. Our key finding is that in the case of additive bias it is impossible to choose an activation function preserving the layer-to-layer singular value distribution, in sharp contrast to the bias-free case where a simple integral constraint is sufficient to achieve isospectrality. The numerical experiments were conducted for the parameters n1 = 3000, ϕ = σx = σw = 1, ψ = 5 (left) or ψ = 2 (right), and σb = 0 (top) or σb = 0.25 (bottom). In Fig. 2 we test this result experimentally and choose the activation function f(x) = c1|x| − c2 with c1, c2 such that (2) is satisfied and θ1(f) = 1. We find that in the bias-free case (left), irrespective of the network depth, the eigenvalues of the covariance matrix Y^(l)(Y^(l))^* converge to their theoretical limit from Theorem 2.1, exactly as in [22, Fig. 1].
Researcher Affiliation Academia Vanessa Piccolo, ETH Zurich (current affiliation: ENS Lyon), vanessa.piccolo@ens-lyon.fr; Dominik Schröder, Institute for Theoretical Studies, ETH Zurich, dschroeder@ethz.ch
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets No The paper specifies the use of 'random data matrix X' and 'random weight matrix W' with 'i.i.d. random variables' but does not mention or provide access information for any publicly available or open dataset.
Dataset Splits No The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) needed to reproduce data partitioning.
Hardware Specification No The paper mentions that 'numerical experiments were conducted' but provides no specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running these experiments.
Software Dependencies No The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup Yes The numerical experiments were conducted for the parameters n1 = 3000, ϕ = σx = σw = 1, ψ = 5 (left) or ψ = 2 (right), and σb = 0 (top) or σb = 0.25 (bottom).
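
The experiment setup quoted above can be illustrated with a small simulation. The following is a minimal sketch and not the authors' code, since the paper releases none. It assumes the Pennington and Worah conventions, namely X of size n0 × m with N(0, σx²) entries, W of size n1 × n0 with N(0, σw²/n0) entries, ϕ = n0/m, ψ = n0/n1, and the normalisation M = YY^*/m. It also assumes that the rank-one bias B repeats a single Gaussian column vector with N(0, σb²) entries across all m columns. The activation is taken to be f(x) = c1|x| − c2, with constants chosen so that f is centred and has unit second moment under a Gaussian of standard deviation σw σx, which is how this sketch reads condition (2) and θ1(f) = 1. The function names and the reduced default n1 = 300 (instead of 3000) are illustrative choices.

```python
import numpy as np


def abs_activation_constants(s=1.0):
    """Constants of f(x) = c1*|x| - c2 such that, for Z ~ N(0, 1),
    E[f(s*Z)] = 0 and E[f(s*Z)^2] = 1 (assumed reading of the constraints)."""
    # E|s*Z| = s*sqrt(2/pi) and Var|s*Z| = s^2 * (1 - 2/pi)
    c1 = 1.0 / (s * np.sqrt(1.0 - 2.0 / np.pi))
    c2 = c1 * s * np.sqrt(2.0 / np.pi)
    return c1, c2


def feature_spectrum(n1=300, phi=1.0, psi=2.0,
                     sigma_x=1.0, sigma_w=1.0, sigma_b=0.0, seed=0):
    """Eigenvalues of M = Y Y^T / m for Y = f(W X + B) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n0 = int(round(psi * n1))   # assumed convention: psi = n0 / n1
    m = int(round(n0 / phi))    # assumed convention: phi = n0 / m

    X = sigma_x * rng.standard_normal((n0, m))
    W = sigma_w / np.sqrt(n0) * rng.standard_normal((n1, n0))
    # Assumed form of the rank-one bias: one Gaussian column repeated m times
    # (the (n1, 1) array is broadcast across the columns of W @ X).
    b = sigma_b * rng.standard_normal((n1, 1))

    c1, c2 = abs_activation_constants(sigma_w * sigma_x)
    Y = c1 * np.abs(W @ X + b) - c2
    M = Y @ Y.T / m
    return np.linalg.eigvalsh(M)  # real eigenvalues in ascending order


if __name__ == "__main__":
    for sigma_b in (0.0, 0.25):
        eigs = feature_spectrum(sigma_b=sigma_b)
        print(f"sigma_b={sigma_b}: min={eigs.min():.3f}, "
              f"mean={eigs.mean():.3f}, max={eigs.max():.3f}")
```

A histogram of the returned eigenvalues can then be compared with the limiting density from Theorem 2.1. Reproducing the reported setting n1 = 3000 with ψ = 5 would require matrices of size 15000 under these conventions, which is why the defaults are smaller.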