Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Bayes-optimal Learning of Deep Random Networks of Extensive-width
Authors: Hugo Cui, Florent Krzakala, Lenka Zdeborova
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further show numerically that when the number of samples grows faster than the dimension, ridge & kernel methods become suboptimal, while neural networks achieve test error close to zero from quadratically many samples. We provide a numerical exploration of the regime where the number of samples n tends to infinity faster than linearly with the input dimension d... Fig. 1 shows the Bayes MSE, eq. (12)... This is contrasted to the MSE achieved by an expressive neural network (NN)... Fig. 4 contrasts the MSE of an Adam-optimized neural network, optimally regularized ridge regression, and optimally regularized arcosine kernel regression... |
| Researcher Affiliation | Academia | (1) Statistical Physics Of Computation lab, Institute of Physics, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; (2) Information Learning and Physics lab, Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland. |
| Pseudocode | No | The paper contains no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | A repository with the code employed in the present work can be found here. |
| Open Datasets | No | We consider the problem of learning from a train set $\mathcal{D} = \{x^\mu, y^\mu\}_{\mu=1}^{n}$, with $n$ independently sampled Gaussian covariates $x^\mu \in \mathbb{R}^d \sim \mathcal{N}(0, \Sigma)$. The paper uses synthetically generated data rather than a publicly available dataset with a specific name or access information. |
| Dataset Splits | No | The paper does not explicitly describe training/test/validation splits or cross-validation setup. It mentions 'train set' and 'test sample' but no further breakdown or methodology for splitting. |
| Hardware Specification | No | The paper performs numerical simulations but does not provide any specific hardware details such as CPU or GPU models, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions 'full-batch gradient descent' and 'Adam' optimizer but does not specify any software libraries or their version numbers (e.g., PyTorch, TensorFlow, scikit-learn). |
| Experiment Setup | Yes | Green dots represent simulations for a one (top) and two (bottom) hidden layers neural network of width 1500, optimized with full-batch GD, learning rate $\eta = 8 \cdot 10^{-3}$ and weight decay $\lambda = 0.1$. Green dots show the test error of a three layers fully connected network trained end-to-end with full-batch Adam, learning rate 0.003 and weight decay 0.01, after 2000 epochs. Purple dots indicate the MSE of a 2 layers fully connected neural network of width k = 30 trained end-to-end using Adam (purple), batch size n/3 and learning rate $\eta = 3 \cdot 10^{-3}$, over 2000 epochs. |
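
The Open Datasets row notes that the data are synthetic: Gaussian covariates drawn from N(0, Σ) and labeled by a random network. The following is a minimal sketch of what such a setup could look like, not the authors' code. The function names, the one-hidden-layer tanh teacher, the identity covariance, and all sizes are assumptions chosen for illustration; only the Gaussian covariates and the ridge baseline are mentioned in the rows above.

```python
import numpy as np


def sample_covariates(n, d, cov, rng):
    """Draw n covariates x ~ N(0, cov) in R^d, one per row."""
    return rng.multivariate_normal(np.zeros(d), cov, size=n)


def random_teacher(d, width, rng):
    """Hypothetical random target: one hidden tanh layer with Gaussian weights.

    The architecture and activation are assumptions for this sketch.
    """
    W = rng.standard_normal((width, d))
    a = rng.standard_normal(width)

    def target(X):
        hidden = np.tanh(X @ W.T / np.sqrt(d))
        return hidden @ a / np.sqrt(width)

    return target


def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def mean_squared_error(w, X, y):
    """Mean squared error of the linear predictor X @ w against labels y."""
    return float(np.mean((X @ w - y) ** 2))


def run_demo(n=400, n_test=1000, d=20, width=30, lam=0.1, seed=0):
    """Sample train/test sets from the same teacher and score a ridge baseline."""
    rng = np.random.default_rng(seed)
    cov = np.eye(d)  # assumed covariance; any positive semi-definite matrix works
    teacher = random_teacher(d, width, rng)
    X_train = sample_covariates(n, d, cov, rng)
    X_test = sample_covariates(n_test, d, cov, rng)
    y_train, y_test = teacher(X_train), teacher(X_test)
    w = ridge_fit(X_train, y_train, lam)
    return X_train, y_train, X_test, y_test, mean_squared_error(w, X_test, y_test)


if __name__ == "__main__":
    *_, mse = run_demo()
    print(f"ridge test MSE: {mse:.4f}")
```

Because train and test points are drawn fresh from the same distribution, no split of a fixed dataset is involved, which may be why the Dataset Splits row reports no explicit split methodology. Fixing the seed, as `run_demo` does, is one way to make such a simulation repeatable.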