Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Large-width asymptotics and training dynamics of $\alpha$-Stable ReLU neural networks
Authors: Stefano Favaro, Sandra Fortini, Stefano Peluchetti
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate numerically Theorem 2.1, we sample random neural networks according to (3) for various values of width $m$ and stability index $\alpha$. We evaluate these networks on a fine uniform grid of points in $[0, 1]^2$. Figure 2 displays the results, which show that the function samples remain well-behaved as $m$ grows larger. |
| Researcher Affiliation | Collaboration | Stefano Favaro (Department of Economics and Statistics, University of Torino and Collegio Carlo Alberto); Sandra Fortini (Department of Decision Sciences, Bocconi University); Stefano Peluchetti (Cogent Labs, Tokyo) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. The methods are described through mathematical formulations and proofs. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository. The OpenReview link refers to the paper's review forum, not a code base. |
| Open Datasets | No | The paper uses a 'fine uniform grid of points in $[0, 1]^2$' for its numerical illustrations, which is a synthetic data generation method, not a publicly available dataset requiring access information. |
| Dataset Splits | No | The paper performs numerical illustrations on a 'fine uniform grid of points in $[0, 1]^2$' but does not describe any training, test, or validation dataset splits, as it is not a machine learning experiment with conventional data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run its numerical illustrations or experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for the experiments or numerical illustrations. |
| Experiment Setup | Yes | By assuming the learning rate $\eta_m = (\log m)^{2/\alpha}$, we show that: i) if $m \to +\infty$ then $(\log m)^{2/\alpha} H_m(W(0), X; \alpha)$ converges weakly to an $(\alpha/2)$-Stable (almost surely) positive definite random matrix $H(X, X; \alpha)$; ii) for every $\delta > 0$ the gradient descent achieves zero training error at linear rate, for $m$ sufficiently large, with probability $1 - \delta$. |
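The numerical illustration quoted above (sampling random $\alpha$-Stable ReLU networks of width $m$ and evaluating them on a uniform grid in $[0, 1]^2$) can be sketched as follows. This is a minimal sketch, not the paper's code: it assumes a one-hidden-layer ReLU network with i.i.d. symmetric $\alpha$-stable weights and an $m^{-1/\alpha}$ output scaling, and uses the Chambers-Mallows-Stuck sampler; the width, grid resolution, and function names are illustrative choices.

```python
import numpy as np

def sample_symmetric_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable
    random variables (assumes alpha != 1)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos(u * (1 - alpha)) / w) ** ((1 - alpha) / alpha))

def stable_relu_network(x, m, alpha, rng):
    """One sample of a one-hidden-layer ReLU network with i.i.d.
    symmetric alpha-stable weights and m**(-1/alpha) output scaling."""
    d = x.shape[1]
    W1 = sample_symmetric_stable(alpha, (d, m), rng)   # input-to-hidden weights
    b1 = sample_symmetric_stable(alpha, (m,), rng)     # hidden biases
    W2 = sample_symmetric_stable(alpha, (m,), rng)     # hidden-to-output weights
    h = np.maximum(x @ W1 + b1, 0.0)                   # ReLU activations
    return m ** (-1.0 / alpha) * (h @ W2)

# Evaluate one network sample on a uniform grid over [0, 1]^2.
rng = np.random.default_rng(0)
g = np.linspace(0.0, 1.0, 50)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)  # shape (2500, 2)
values = stable_relu_network(grid, m=1000, alpha=1.5, rng=rng)
```

Repeating the last call for several widths $m$ and stability indices $\alpha$ reproduces the kind of comparison the quoted passage describes: the sampled functions stay finite and well-behaved as $m$ grows.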