Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Are ResNets Provably Better than Linear Predictors?
Authors: Ohad Shamir
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The abstract states: "In this paper, we rigorously prove that arbitrarily deep, nonlinear residual units indeed exhibit this behavior, in the sense that the optimization landscape contains no local minima with value above what can be obtained with a linear predictor (namely a 1-layer network). Notably, we show this under minimal or no assumptions on the precise network architecture, data distribution, or loss function used. We also provide a quantitative analysis of approximate stationary points for this problem. Finally, we show that with a certain tweak to the architecture, training the network with standard stochastic gradient descent achieves an objective value close or better than any linear predictor." |
| Researcher Affiliation | Academia | Ohad Shamir, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel. EMAIL |
| Pseudocode | No | The paper describes the stochastic gradient descent (SGD) algorithm in text, but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper is theoretical and does not mention releasing any source code for its methodology or findings. |
| Open Datasets | No | The paper mentions training with respect to "some data distribution (e.g. an average over some training set {xi, yi})" as part of its theoretical setup, but it does not refer to or provide access information for any specific publicly available dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments that would involve train/validation/test data splits. |
| Hardware Specification | No | The paper is theoretical and does not mention any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies or versions. |
| Experiment Setup | No | The paper is theoretical and does not describe specific experimental setups or hyperparameters. |
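The claim summarized in the "Research Type" row can be illustrated numerically. The following is an illustrative sketch only, not an experiment from the paper (which, as the table notes, is purely theoretical and contains no code): it trains a single residual unit of the generic form h(x) = wᵀ(x + V tanh(Wx)) with plain SGD on synthetic regression data, then compares the final squared-loss objective to the closed-form best linear predictor. The architecture, data, and hyperparameters here are all hypothetical choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical setup, not from the paper).
n, d = 200, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def sq_loss(pred, target):
    return 0.5 * np.mean((pred - target) ** 2)

# Best linear predictor via closed-form least squares.
w_lin, *_ = np.linalg.lstsq(X, y, rcond=None)
linear_loss = sq_loss(X @ w_lin, y)

# One residual unit h(x) = w^T (x + V tanh(W x)), trained with plain SGD.
W = 0.1 * rng.standard_normal((d, d))
V = 0.1 * rng.standard_normal((d, d))
w = np.zeros(d)

lr = 0.01
for step in range(5000):
    i = rng.integers(n)
    x, t = X[i], y[i]
    z = np.tanh(W @ x)          # hidden nonlinearity
    h = x + V @ z               # residual (skip) connection
    err = w @ h - t
    # Gradients of the per-example loss 0.5 * err^2.
    grad_w = err * h
    grad_V = err * np.outer(w, z)
    grad_W = err * np.outer((V.T @ w) * (1 - z ** 2), x)
    w -= lr * grad_w
    V -= lr * grad_V
    W -= lr * grad_W

resnet_loss = sq_loss((X + np.tanh(X @ W.T) @ V.T) @ w, y)
print(f"linear: {linear_loss:.4f}  residual unit: {resnet_loss:.4f}")
```

The paper's result concerns the optimization landscape (no local minima above the best linear predictor's value) and an SGD guarantee for a tweaked architecture; a single run like this merely illustrates the comparison being made, and says nothing about the theorem itself.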