What Can ResNet Learn Efficiently, Going Beyond Kernels?

Authors: Zeyuan Allen-Zhu, Yuanzhi Li

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove neural networks can efficiently learn a notable class of functions, including those defined by three-layer residual networks with smooth activations, without any distributional assumption. At the same time, we prove there are simple functions in this class such that, with the same number of training examples, the test error obtained by neural networks can be much smaller than that of any kernel method, including neural tangent kernels (NTK).
Researcher Affiliation | Collaboration | Zeyuan Allen-Zhu (Microsoft Research AI, zeyuan@csail.mit.edu); Yuanzhi Li (Carnegie Mellon University, yuanzhil@andrew.cmu.edu)
Pseudocode | No | The paper describes the SGD updates in paragraph form with equations, but does not present them as a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not state that source code is released and provides no link to a code repository for the described methodology.
Open Datasets | No | The paper is theoretical and uses no public dataset for experiments; it refers to sampling from "some unknown distribution D" in the analysis, and mentions CIFAR-10 only as a motivating example of neural-network performance, not for empirical evaluation.
Dataset Splits | No | The paper is theoretical and does not describe experimental dataset splits (training, validation, test).
Hardware Specification | No | The paper does not specify any hardware used for experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers.
Experiment Setup | Yes | "We consider the vanilla SGD algorithm. Starting from W0 = 0, V0 = 0, in each iteration t = 0, 1, ..., T − 1, it receives a random sample (x_t, y_t) ∼ D and performs SGD updates..." Table 1: Three-layer ResNet parameter choices.
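The vanilla SGD procedure quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, assuming a toy three-layer residual model r = h + tanh(V h) with a fixed random output layer, a squared loss, a Gaussian input distribution, and a synthetic linear target; the network shape, activation, learning rate, and step count are assumptions for illustration, not the paper's exact parameter choices (those appear in its Table 1).

```python
import numpy as np

def sgd_train(d=4, m=16, steps=2000, eta=0.05, seed=0):
    """Vanilla SGD starting from W0 = 0, V0 = 0 on a toy residual model.

    At each step t, one fresh sample (x_t, y_t) ~ D is drawn and a single
    gradient step is taken. Returns the per-step squared losses.
    """
    rng = np.random.default_rng(seed)
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed output weights
    W = np.zeros((m, d))  # W0 = 0, as in the quoted setup
    V = np.zeros((m, m))  # V0 = 0
    losses = []
    for _ in range(steps):
        x = rng.normal(size=d)   # x_t ~ D (assumed Gaussian here)
        y = 0.5 * x[0]           # synthetic target (assumption)
        # Forward pass through the residual block: r = h + tanh(V h).
        h = np.tanh(W @ x)
        u = np.tanh(V @ h)
        f = a @ (h + u)
        err = f - y
        losses.append(0.5 * err**2)
        # Manual gradients of the squared loss w.r.t. V and W.
        g_u = a * (1.0 - u**2)                # backprop through tanh(V h)
        g_h = (a + V.T @ g_u) * (1.0 - h**2)  # skip path + through V, tanh
        V -= eta * err * np.outer(g_u, h)
        W -= eta * err * np.outer(g_h, x)
    return losses
```

On this synthetic task the loss drops quickly even though both weight matrices start at exactly zero, since the fixed output layer breaks the symmetry.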