Optimization and Adaptive Generalization of Three-Layer Neural Networks
Authors: Khashayar Gatmiry, Stefanie Jegelka, Jonathan Kelner
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | While there has been substantial recent work studying generalization of neural networks, the ability of deep networks to automate the process of feature extraction still evades a thorough mathematical understanding. As a step toward this goal, we analyze learning and generalization of a three-layer neural network with ReLU activations in a regime that goes beyond the linear approximation of the network and is hence not captured by the common Neural Tangent Kernel. We show that despite nonconvexity of the empirical loss, a variant of SGD converges in polynomially many iterations to a good solution that generalizes. In particular, our generalization bounds are adaptive: they automatically optimize over a family of kernels that includes the Neural Tangent Kernel to provide the tightest bound. |
| Researcher Affiliation | Academia | Khashayar Gatmiry MIT gatmiry@mit.edu Stefanie Jegelka MIT stefje@mit.edu Jonathan Kelner MIT kelner@mit.edu |
| Pseudocode | Yes | Algorithm 1 PSGD (Projected Stochastic Gradient Descent); a hedged sketch of projected SGD appears after the table. |
| Open Source Code | No | The paper does not provide any statement about releasing code for the described methodology, nor does it include any links to source code repositories. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical analysis rather than empirical evaluation on specific datasets. It defines training loss generally but does not mention or provide access information for any specific dataset used for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, therefore it does not provide details about dataset splits (training, validation, or test) for reproducibility. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or mention specific hardware components (like GPU or CPU models, or cloud computing instances) used for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis. It does not describe any experimental setup or list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm analysis. It does not describe an experimental setup with specific hyperparameters, model initialization, or training schedules typical of empirical studies. |
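
To make the Pseudocode entry concrete, here is a minimal sketch of projected stochastic gradient descent on a three-layer ReLU network. This is not the paper's Algorithm 1: the network width, step size, squared loss, single-sample updates, and the choice of Frobenius-norm balls as the projection set are all illustrative assumptions made for this sketch.

```python
# Minimal PSGD sketch: single-sample SGD on a three-layer ReLU network,
# with each weight matrix projected back onto a Frobenius-norm ball after
# every step. All hyperparameters and the constraint set are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def project_fro(W, radius):
    """Project a weight array onto the Frobenius-norm ball of the given radius."""
    norm = np.linalg.norm(W)
    return W if norm <= radius else W * (radius / norm)

def forward(x, W1, W2, w3):
    """Three-layer ReLU network: x -> relu(W1 x) -> relu(W2 h1) -> w3^T h2."""
    h1 = relu(W1 @ x)
    h2 = relu(W2 @ h1)
    return w3 @ h2, (h1, h2)

def psgd(X, y, width=64, steps=2000, lr=1e-2, radii=(5.0, 5.0, 5.0)):
    d = X.shape[1]
    W1 = rng.normal(scale=1 / np.sqrt(d), size=(width, d))
    W2 = rng.normal(scale=1 / np.sqrt(width), size=(width, width))
    w3 = rng.normal(scale=1 / np.sqrt(width), size=width)
    for _ in range(steps):
        i = rng.integers(len(X))            # one stochastic sample per step
        x, target = X[i], y[i]
        pred, (h1, h2) = forward(x, W1, W2, w3)
        err = pred - target                 # gradient of 0.5 * (pred - target)^2
        # Backpropagate through the two ReLU layers.
        g3 = err * h2
        d2 = err * w3 * (h2 > 0)
        g2 = np.outer(d2, h1)
        d1 = (W2.T @ d2) * (h1 > 0)
        g1 = np.outer(d1, x)
        # Gradient step, then projection back onto the constraint set.
        W1 = project_fro(W1 - lr * g1, radii[0])
        W2 = project_fro(W2 - lr * g2, radii[1])
        w3 = project_fro(w3 - lr * g3, radii[2])
    return W1, W2, w3

# Toy usage on synthetic data (purely illustrative).
X = rng.normal(size=(200, 10))
y = np.sin(X[:, 0])
W1, W2, w3 = psgd(X, y)
```

The projection step is what distinguishes PSGD from plain SGD: it keeps the iterates inside a bounded constraint set, which is the kind of norm control that generalization analyses of this sort typically rely on.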