Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Large-width asymptotics and training dynamics of $\alpha$-Stable ReLU neural networks

Authors: Stefano Favaro, Sandra Fortini, Stefano Peluchetti

TMLR 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "To demonstrate numerically Theorem 2.1, we sample random neural networks according to (3) for various values of width $m$ and stability index $\alpha$. We evaluate these networks on a fine uniform grid of points in $[0, 1]^2$. Figure 2 displays the results, which show that the function samples remain well-behaved as $m$ grows larger." |
| Researcher Affiliation | Collaboration | Stefano Favaro (Department of Economics and Statistics, University of Torino and Collegio Carlo Alberto); Sandra Fortini (Department of Decision Sciences, Bocconi University); Stefano Peluchetti (Cogent Labs, Tokyo) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks; the methods are described through mathematical formulations and proofs. |
| Open Source Code | No | The paper makes no explicit statement about releasing source code and includes no link to a code repository. The OpenReview link refers to the paper's review forum, not a code base. |
| Open Datasets | No | The paper uses a "fine uniform grid of points in $[0, 1]^2$" for its numerical illustrations, which is synthetically generated data rather than a publicly available dataset requiring access information. |
| Dataset Splits | No | The paper performs numerical illustrations on a "fine uniform grid of points in $[0, 1]^2$" but does not describe any training, validation, or test splits, as it is not a machine learning experiment with conventional data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for its numerical illustrations or experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers for its experiments or numerical illustrations. |
| Experiment Setup | Yes | "By assuming the learning rate $\eta_m = (\log m)^{2/\alpha}$, we show that: i) if $m \to +\infty$ then $(\log m)^{2/\alpha} H_m(W(0), X; \alpha)$ converges weakly to an $(\alpha/2)$-Stable (almost surely) positive definite random matrix $H_\infty(X, X; \alpha)$; and ii) for every $\delta > 0$, gradient descent achieves zero training error at a linear rate, for $m$ sufficiently large, with probability $1 - \delta$." |
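
For concreteness, the numerical illustration quoted in the Research Type row above can be approximated with a short script. The sketch below is a hypothetical reconstruction, not the authors' code: it assumes a single-hidden-layer ReLU network of width $m$ with i.i.d. symmetric $\alpha$-stable parameters and an $m^{-1/\alpha}$ readout scaling (the paper's exact parameterization in its equation (3) may differ), and it draws the stable variates with scipy.stats.levy_stable.

```python
import numpy as np
from scipy.stats import levy_stable

def sample_stable_relu_net(m, alpha, d=2, seed=None):
    """Draw one random width-m ReLU network with i.i.d. symmetric
    alpha-stable parameters; return it as a callable mapping (n, d) -> (n,)."""
    rng = np.random.default_rng(seed)
    # Hidden weights/biases and readout weights are symmetric alpha-stable
    # (beta = 0); the readout is rescaled by m**(-1/alpha) so the sum over
    # hidden units stays alpha-stable as m grows (assumed scaling).
    W = levy_stable.rvs(alpha, 0.0, size=(d, m), random_state=rng)
    b = levy_stable.rvs(alpha, 0.0, size=m, random_state=rng)
    a = levy_stable.rvs(alpha, 0.0, size=m, random_state=rng)

    def f(X):
        hidden = np.maximum(X @ W + b, 0.0)      # ReLU hidden layer
        return hidden @ a / m ** (1.0 / alpha)   # scaled linear readout

    return f

# Evaluate network samples on a uniform grid over [0, 1]^2, as in the
# paper's Figure 2, for increasing widths m and two stability indices.
ticks = np.linspace(0.0, 1.0, 25)
X = np.stack(np.meshgrid(ticks, ticks), axis=-1).reshape(-1, 2)

for alpha in (1.2, 1.8):
    for m in (100, 1000, 10000):
        f = sample_stable_relu_net(m, alpha, seed=0)
        vals = f(X)
        print(f"alpha={alpha}, m={m:6d}: sample values in "
              f"[{vals.min():.2f}, {vals.max():.2f}]")
```

In this sketch the $m^{-1/\alpha}$ factor plays the role that $m^{-1/2}$ plays under Gaussian initialization; the printed value ranges give an informal check that the sampled functions remain well-behaved as the width increases.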