Implicit Bias of Gradient Descent on Linear Convolutional Networks
Authors: Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nati Srebro
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show that gradient descent on full-width linear convolutional networks of depth L converges to a linear predictor related to the ℓ2/L bridge penalty in the frequency domain. This is in contrast to fully connected linear networks, where, regardless of depth, gradient descent converges to the ℓ2 maximum-margin solution. Finally, in this paper we focus on characterizing which global minimum gradient descent converges to on over-parameterized linear models, while assuming that, for an appropriate choice of step sizes, the gradient descent iterates asymptotically minimize the optimization objective. (An illustrative sketch of this frequency-domain bias follows the table.) |
| Researcher Affiliation | Collaboration | Suriya Gunasekar, TTI at Chicago, USA, suriya@ttic.edu; Jason D. Lee, USC Los Angeles, USA, jasonlee@marshall.usc.edu; Daniel Soudry, Technion, Israel, daniel.soudry@gmail.com; Nathan Srebro, TTI at Chicago, USA, nati@ttic.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code or provide a link to a code repository for the described methodology. |
| Open Datasets | No | The paper discusses a 'separable linear classification dataset {(x_n, y_n) : n = 1, 2, ..., N}' as a general setting for its theoretical analysis, but does not mention or provide access information for a specific, publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments; it therefore does not provide details of training, validation, or test dataset splits. |
| Hardware Specification | No | The paper focuses on theoretical analysis and does not describe any experiments that would require hardware, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any implementation details that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experimental setup, including hyperparameters or system-level training settings. |
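Paraphrasing the result quoted in the Research Type row: for a depth-L full-width linear convolutional network trained by gradient descent on the exponential loss over linearly separable data, the paper characterizes the limit direction of the end-to-end linear predictor β via an ℓ2/L-bridge-penalized maximum-margin problem in the frequency domain,

$$\min_{\beta}\ \big\|\widehat{\beta}\big\|_{2/L} \quad \text{s.t.} \quad y_n \langle x_n, \beta \rangle \ge 1 \ \ \forall n,$$

where $\widehat{\beta}$ is the discrete Fourier transform of β (a global minimizer for L = 2, where the penalty is ℓ1; a first-order stationary point for L ≥ 3). Since no code accompanies the paper, the snippet below is a minimal illustrative sketch of this bias, not the authors' implementation. It assumes PyTorch; the dimensions, depth, learning rate, step count, and planted-predictor data generator are arbitrary choices, and composing the layers as plain circular convolutions (rather than the paper's cross-correlations followed by a final fully connected layer) preserves the same frequency-domain product structure up to conjugation.

```python
import torch

torch.manual_seed(0)
D, N, L = 16, 30, 2                       # input dim, samples, depth (all arbitrary)

# Linearly separable data: labels from a planted linear predictor.
X = torch.randn(N, D)
beta_star = torch.randn(D)
y = torch.sign(X @ beta_star)

# One full-width circular-convolution filter per layer, small init.
W = [(0.1 * torch.randn(D)).requires_grad_(True) for _ in range(L)]

def circ_conv(a, b):
    # Circular convolution via FFT; the DFT diagonalizes every conv layer.
    return torch.fft.ifft(torch.fft.fft(a) * torch.fft.fft(b)).real

def effective_beta(W):
    # Collapse the linear layers into one end-to-end linear predictor beta.
    beta = W[0]
    for w in W[1:]:
        beta = circ_conv(beta, w)
    return beta

opt = torch.optim.SGD(W, lr=1e-2)
for step in range(20000):
    opt.zero_grad()
    margins = y * (X @ effective_beta(W))
    loss = torch.exp(-margins).mean()     # exponential loss, as in the paper
    loss.backward()
    opt.step()

beta = effective_beta(W).detach()
print("final loss:", torch.exp(-y * (X @ beta)).mean().item())
# For L = 2 the implied penalty is l1 in frequency, so |DFT(beta)| should be
# biased toward a few dominant frequencies.
print("|DFT(beta)|:", torch.fft.fft(beta).abs().round(decimals=3))
```

Per the paper, replacing the convolutional filters in this sketch with dense weight matrices should instead recover the ℓ2 maximum-margin direction regardless of depth, which is one way to probe the contrast the abstract draws.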