Are deep ResNets provably better than linear predictors?

Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The first example shows that there exists a family of datasets on which the squared error loss attained by a fully-connected neural network is at best that of the linear least squares model, whereas a ResNet attains a strictly better loss than the linear model. This highlights that the guarantee on the risk value of local minima is indeed special to residual networks. ... Consider the following dataset with six data points, where ρ > 0 is a fixed constant: X = [0 1 2 3 4 5], Y = [−ρ 1−ρ 2+ρ 3−ρ 4+ρ 5+ρ]. ... Using the optimal w and c, a straightforward calculation gives R2(θ2) = ρ²(12ρ² + 82ρ + 215)/(21ρ² + 156ρ + 420), and it is strictly smaller than 8ρ²/15 for ρ ∈ (0, 3(1 + √21)/4). ... From Section 3.2 ("Representations by residual block outputs do not improve monotonically"): Consider a dataset X = [1 2.5 3] and Y = [1 3 2].
Researcher Affiliation | Academia | Chulhee Yun, MIT, Cambridge, MA 02139, chulheey@mit.edu; Suvrit Sra, MIT, Cambridge, MA 02139, suvrit@mit.edu; Ali Jadbabaie, MIT, Cambridge, MA 02139, jadbabai@mit.edu
Pseudocode | No | The paper describes mathematical formulations and network architectures but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links to source code or explicitly state that code for the methodology is being released.
Open Datasets | No | The paper defines small, synthetic datasets for its motivating examples (e.g., 'Consider the following dataset with six data points...', 'Consider a dataset X = [1 2.5 3] and Y = [1 3 2]'), but these are custom-defined for illustrative purposes within the paper; they are not publicly available in the sense of being accessible via a link or DOI, nor are they established benchmarks.
Dataset Splits | No | The paper presents theoretical analysis with illustrative examples on small, custom-defined datasets; it does not specify the training, validation, or test splits typically used in machine learning experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for any computations or simulations.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used.
Experiment Setup | Yes | For the motivating examples, the paper explicitly sets specific parameter values for the networks being analyzed: 'Choose v = 0.5ρ, u = 1, and b = 3.' and 'v1 = 1, u1 = 1, b1 = 2, v2 = 4, u2 = 1, b2 = 3.5, w = 1, c = 0.' These values define the specific configuration of the models used in their demonstrations.
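The two closed-form quantities quoted in the Research Type row can be sanity-checked numerically. A minimal sketch in NumPy, under two assumptions not stated explicitly in the excerpt: the risk is the mean squared error over the data points, and the partially garbled Y vector has entries [−ρ, 1−ρ, 2+ρ, 3−ρ, 4+ρ, 5+ρ] (the signs under which the quoted 8ρ²/15 least-squares figure comes out):

```python
import numpy as np

def linear_lstsq_mse(rho):
    """MSE of the best affine fit on the six-point dataset quoted above
    (assumption: the paper's risk is the mean squared error)."""
    X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([-rho, 1 - rho, 2 + rho, 3 - rho, 4 + rho, 5 + rho])
    A = np.column_stack([X, np.ones_like(X)])      # design matrix [x, 1]
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # optimal slope and intercept
    return float(np.mean((Y - A @ coef) ** 2))

def resnet_risk(rho):
    """Closed-form ResNet risk R2(theta2) quoted in the Research Type row."""
    return rho**2 * (12 * rho**2 + 82 * rho + 215) / (21 * rho**2 + 156 * rho + 420)

for rho in [0.1, 0.5, 1.0, 2.0]:
    linear = 8 * rho**2 / 15
    assert abs(linear_lstsq_mse(rho) - linear) < 1e-9  # matches 8*rho^2/15
    assert resnet_risk(rho) < linear                   # ResNet strictly better
```

For ρ in roughly (0, 4.19) the quoted ResNet risk stays strictly below 8ρ²/15, consistent with the claim that the ResNet attains a strictly better loss than the linear model on this family of datasets.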