What Can ResNet Learn Efficiently, Going Beyond Kernels?
Authors: Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove neural networks can efficiently learn a notable class of functions, including those defined by three-layer residual networks with smooth activations, without any distributional assumption. At the same time, we prove there are simple functions in this class such that with the same number of training examples, the test error obtained by neural networks can be much smaller than any kernel method, including neural tangent kernels (NTK). |
| Researcher Affiliation | Collaboration | Zeyuan Allen-Zhu, Microsoft Research AI, zeyuan@csail.mit.edu; Yuanzhi Li, Carnegie Mellon University, yuanzhil@andrew.cmu.edu |
| Pseudocode | No | The paper describes the SGD updates in paragraph form with equations, but does not present them as a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not include an explicit statement about releasing source code or a direct link to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not use a specific public dataset for its experiments; it refers to sampling from 'some unknown distribution D' for theoretical analysis and mentions CIFAR-10 as an example where neural networks perform well, without using it for empirical evaluation within the paper. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental dataset splits (training, validation, test). |
| Hardware Specification | No | The paper does not specify any hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We consider the vanilla SGD algorithm. Starting from W_0 = 0, V_0 = 0, in each iteration t = 0, 1, ..., T - 1, it receives a random sample (x_t, y_t) ~ D and performs SGD updates... Table 1: Three-layer ResNet parameter choices. |
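
The SGD procedure quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the paper's exact learner network or its Table 1 parameter choices. The architecture below (a fixed random output layer `A`, fixed random biases `b1` and `b2`, ReLU activations, squared loss), the sampler `toy_sample`, and all dimensions and step sizes are assumptions chosen for illustration. Only the zero initialization of the trainable matrices `W` and `V` and the one-fresh-sample-per-iteration update come from the quoted text.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def forward(W, V, A, b1, b2, x):
    """Assumed three-layer residual form: out = A (h1 + relu(V h1 + b2))."""
    z1 = W @ x + b1
    h1 = relu(z1)
    z2 = V @ h1 + b2
    h2 = relu(z2)
    out = A @ (h1 + h2)  # skip connection adds h1 to the second hidden layer
    return out, (z1, h1, z2)


def sgd_step(W, V, A, b1, b2, x, y, lr):
    """One SGD update on the squared loss 0.5 * ||out - y||^2 for one sample."""
    out, (z1, h1, z2) = forward(W, V, A, b1, b2, x)
    g_out = out - y                 # gradient w.r.t. the output
    g_sum = A.T @ g_out             # gradient w.r.t. (h1 + h2)
    g_z2 = g_sum * (z2 > 0)         # back through the second ReLU
    g_h1 = g_sum + V.T @ g_z2       # skip path plus path through V
    g_z1 = g_h1 * (z1 > 0)          # back through the first ReLU
    W_new = W - lr * np.outer(g_z1, x)
    V_new = V - lr * np.outer(g_z2, h1)
    loss = 0.5 * float(g_out @ g_out)
    return W_new, V_new, loss


def train(sample, d, m, k, T, lr, seed=0):
    """Vanilla SGD from W_0 = 0, V_0 = 0, drawing one fresh sample per step."""
    rng = np.random.default_rng(seed)
    # Fixed (untrained) pieces; their distributions are assumptions.
    A = rng.standard_normal((k, m)) / np.sqrt(m)
    b1 = rng.standard_normal(m)
    b2 = rng.standard_normal(m)
    W = np.zeros((m, d))
    V = np.zeros((m, m))
    losses = []
    for _ in range(T):
        x, y = sample(rng)          # (x_t, y_t) ~ D
        W, V, loss = sgd_step(W, V, A, b1, b2, x, y, lr)
        losses.append(loss)
    return W, V, losses


def toy_sample(rng, d=4):
    """Hypothetical data distribution D: unit-norm input, smooth scalar target."""
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    y = np.array([np.sin(x[0]) + 0.1 * x[1] ** 2])
    return x, y


if __name__ == "__main__":
    W, V, losses = train(toy_sample, d=4, m=8, k=1, T=200, lr=0.01)
    print("mean loss, first 20 steps:", np.mean(losses[:20]))
    print("mean loss, last 20 steps: ", np.mean(losses[-20:]))
```

The fixed random biases matter in this sketch: with `W` and `V` both zero, they are what makes some ReLU units active at initialization so that the first gradients are not identically zero.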