Optimization and Adaptive Generalization of Three-Layer Neural Networks
Authors: Khashayar Gatmiry, Stefanie Jegelka, Jonathan Kelner
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | While there has been substantial recent work studying generalization of neural networks, the ability of deep networks to automate the process of feature extraction still evades a thorough mathematical understanding. As a step toward this goal, we analyze learning and generalization of a three-layer neural network with ReLU activations in a regime that goes beyond the linear approximation of the network and is hence not captured by the common Neural Tangent Kernel. We show that despite nonconvexity of the empirical loss, a variant of SGD converges in polynomially many iterations to a good solution that generalizes. In particular, our generalization bounds are adaptive: they automatically optimize over a family of kernels that includes the Neural Tangent Kernel to provide the tightest bound. |
| Researcher Affiliation | Academia | Khashayar Gatmiry MIT gatmiry@mit.edu Stefanie Jegelka MIT stefje@mit.edu Jonathan Kelner MIT kelner@mit.edu |
| Pseudocode | Yes | Algorithm 1 PSGD (Projected Stochastic Gradient Descent); a hedged sketch of projected SGD appears after the table. |
| Open Source Code | No | The paper does not provide any statement about releasing code for the described methodology, nor does it include any links to source code repositories. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical analysis rather than empirical evaluation on specific datasets. It defines training loss generally but does not mention or provide access information for any specific dataset used for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, therefore it does not provide details about dataset splits (training, validation, or test) for reproducibility. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or mention specific hardware components (like GPU or CPU models, or cloud computing instances) used for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis. It does not describe any experimental setup or list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm analysis. It does not describe an experimental setup with specific hyperparameters, model initialization, or training schedules typical of empirical studies. |
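
To make the Pseudocode entry concrete, here is a minimal sketch of projected stochastic gradient descent on a three-layer ReLU network. This is not the paper's Algorithm 1: the network width, step size, squared loss, single-sample updates, and the choice of Frobenius-norm balls as the projection set are all illustrative assumptions made for this sketch.

```python
# Minimal PSGD sketch: single-sample SGD on a three-layer ReLU network,
# with each weight matrix projected back onto a Frobenius-norm ball after
# every step. All hyperparameters and the constraint set are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def project_fro(W, radius):
    """Project a weight array onto the Frobenius-norm ball of the given radius."""
    norm = np.linalg.norm(W)
    return W if norm <= radius else W * (radius / norm)

def forward(x, W1, W2, w3):
    """Three-layer ReLU network: x -> relu(W1 x) -> relu(W2 h1) -> w3^T h2."""
    h1 = relu(W1 @ x)
    h2 = relu(W2 @ h1)
    return w3 @ h2, (h1, h2)

def psgd(X, y, width=64, steps=2000, lr=1e-2, radii=(5.0, 5.0, 5.0)):
    d = X.shape[1]
    W1 = rng.normal(scale=1 / np.sqrt(d), size=(width, d))
    W2 = rng.normal(scale=1 / np.sqrt(width), size=(width, width))
    w3 = rng.normal(scale=1 / np.sqrt(width), size=width)
    for _ in range(steps):
        i = rng.integers(len(X))            # one stochastic sample per step
        x, target = X[i], y[i]
        pred, (h1, h2) = forward(x, W1, W2, w3)
        err = pred - target                 # gradient of 0.5 * (pred - target)^2
        # Backpropagate through the two ReLU layers.
        g3 = err * h2
        d2 = err * w3 * (h2 > 0)
        g2 = np.outer(d2, h1)
        d1 = (W2.T @ d2) * (h1 > 0)
        g1 = np.outer(d1, x)
        # Gradient step, then projection back onto the constraint set.
        W1 = project_fro(W1 - lr * g1, radii[0])
        W2 = project_fro(W2 - lr * g2, radii[1])
        w3 = project_fro(w3 - lr * g3, radii[2])
    return W1, W2, w3

# Toy usage on synthetic data (purely illustrative).
X = rng.normal(size=(200, 10))
y = np.sin(X[:, 0])
W1, W2, w3 = psgd(X, y)
```

The projection step is what distinguishes PSGD from plain SGD: it keeps the iterates inside a bounded constraint set, which is the kind of norm control that generalization analyses of this sort typically rely on.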