SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data
Authors: Alon Brutzkus, Amir Globerson, Eran Malach, Shai Shalev-Shwartz
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 1 we demonstrate this empirically for a linearly separable data set (from a subset of MNIST) learned using over-parameterized networks. |
| Researcher Affiliation | Academia | Alon Brutzkus & Amir Globerson, The Blavatnik School of Computer Science, Tel Aviv University, Israel (alonbrutzkus@mail.tau.ac.il, amir.globerson@gmail.com); Eran Malach & Shai Shalev-Shwartz, School of Computer Science, The Hebrew University, Israel (eran.malach@mail.huji.ac.il, shais@cs.huji.ac.il) |
| Pseudocode | No | The paper describes the SGD update rule mathematically (Eq. 3) but does not provide structured pseudocode or an algorithm block; a hedged sketch of such an update appears after this table. |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | The linearly separable data set consists of 4000 MNIST images with digits 3 and 5, each of dimension 784. |
| Dataset Splits | Yes | The size of the training set is 3000 and the remaining 1000 points form the test set. (This split is reconstructed in the sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | The setting of Section 5 is implemented (e.g., SGD with a batch size of 1, only the first layer is trained, Leaky ReLU activations) and SGD is initialized according to the initialization defined in Eq. 6. (See the training sketch after the table.) |
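
The dataset rows above fully determine the data pipeline. Below is a minimal sketch of reconstructing it, assuming torchvision's MNIST loader, 0–1 pixel normalization, and a dataset-order choice of the 4000 images; the paper does not specify how those images are selected.

```python
import numpy as np
from torchvision import datasets

# Load MNIST and keep only the digits 3 and 5, as quoted above.
mnist = datasets.MNIST(root="./data", train=True, download=True)
images = mnist.data.numpy().reshape(-1, 784) / 255.0  # each image has dimension 784
labels = mnist.targets.numpy()

mask = (labels == 3) | (labels == 5)
X = images[mask][:4000]                            # 4000 images in total
y = np.where(labels[mask][:4000] == 3, 1.0, -1.0)  # assumed +/-1 label encoding

# Training set of size 3000; the remaining 1000 points form the test set.
X_train, y_train = X[:3000], y[:3000]
X_test, y_test = X[3000:], y[3000:]
```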
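The experiment-setup row can be sketched in the same spirit, continuing from the data snippet above: a two-layer Leaky ReLU network whose second layer is fixed, with only the first layer trained by SGD on single examples. The hinge loss, the ±1 fixed second-layer weights, the hidden width 2k, the slope ALPHA, the learning rate, and the step count are assumptions consistent with the paper's Section 5 setting, and the small Gaussian initialization is only a stand-in for the paper's Eq. 6.

```python
import numpy as np

ALPHA = 0.1  # Leaky ReLU slope; an assumed value, not stated in the table above

def leaky_relu(z):
    return np.where(z > 0, z, ALPHA * z)

rng = np.random.default_rng(0)
d, k = 784, 100                                # input dim 784; width 2k is assumed
W = rng.normal(0.0, 0.01, size=(2 * k, d))     # stand-in for the paper's Eq. 6 init
v = np.concatenate([np.ones(k), -np.ones(k)])  # fixed second layer; only W is trained

def net(x, W):
    return v @ leaky_relu(W @ x)

eta = 0.01                                     # learning rate, assumed
for _ in range(30000):                         # SGD with a batch size of 1
    i = rng.integers(len(X_train))
    x, yi = X_train[i], y_train[i]
    if yi * net(x, W) < 1:                     # hinge-loss subgradient step
        grad_act = np.where(W @ x > 0, 1.0, ALPHA)  # Leaky ReLU derivative
        W += eta * yi * (v * grad_act)[:, None] * x[None, :]

acc = np.mean(np.sign([net(x, W) for x in X_test]) == y_test)
print(f"test accuracy: {acc:.3f}")
```

Freezing `v` and updating only `W` mirrors the "only the first layer is trained" condition quoted in the last row of the table.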