Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
Authors: Alon Brutzkus, Amir Globerson
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper empirically demonstrates both the easy and hard cases: 'We provide an empirical demonstration of this in Section 6 where gradient descent is shown to succeed in the Gaussian case and fail for a different distribution.' A minimal sketch of the Gaussian-case experiment appears after this table. |
| Researcher Affiliation | Academia | Tel Aviv University, Blavatnik School of Computer Science. Correspondence to: Alon Brutzkus <alonbrutzkus@mail.tau.ac.il>, Amir Globerson <gamir@cs.tau.ac.il>. |
| Pseudocode | No | The paper describes mathematical derivations and algorithms using equations and narrative text, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology described in this paper is publicly available. |
| Open Datasets | No | The paper describes the generation of synthetic training data for its empirical illustration ('To generate the hard case, we begin with a set splitting problem...', 'we use a Gaussian distribution G as defined earlier and generate a training set...'), but it does not specify a publicly available dataset with concrete access information such as a link, DOI, or formal citation. A hypothetical data-generation sketch for the Gaussian case appears after this table. |
| Dataset Splits | No | The paper's empirical section illustrates concepts but does not provide specific details on training, validation, or test dataset splits such as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run its experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'AdaGrad (Duchi et al., 2011)' for optimization, but it does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | No | The paper mentions using a 'random normal initializer' and choosing the 'best performing learning rate schedule' for AdaGrad, but it does not provide specific hyperparameter values or detailed training configurations for its experiments. An AdaGrad update sketch appears after this table. |
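
Since the paper releases neither code nor data (see the 'Open Source Code' and 'Open Datasets' rows), the Gaussian-case training set can only be reconstructed by assumption. The sketch below draws inputs from a standard Gaussian and labels them with a planted filter under the no-overlap architecture f(x; w) = (1/k) · Σ_i ReLU(w · x_i) studied in the paper; the sizes `n`, `k`, `m` and the name `w_star` are hypothetical, not values reported in the paper.

```python
import numpy as np

def generate_gaussian_dataset(n, k, m, w_star, rng):
    """Draw n inputs from N(0, I) and label them with a planted filter
    w_star under the assumed no-overlap ConvNet
    f(x; w) = (1/k) * sum_i ReLU(w . x_i),
    where each input splits into k non-overlapping patches of size m."""
    patches = rng.standard_normal((n, k, m))             # i.i.d. Gaussian patches
    y = np.maximum(patches @ w_star, 0.0).mean(axis=1)   # ReLU + average pooling
    return patches, y

rng = np.random.default_rng(0)
k, m = 8, 16                         # hypothetical patch count / filter size
w_star = rng.standard_normal(m)      # hypothetical planted ground-truth filter
patches, y = generate_gaussian_dataset(1000, k, m, w_star, rng)
```

The paper's hard case is instead built from a set splitting problem; the quotes above do not specify that construction precisely enough to reproduce it here.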
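
The 'Research Type' row quotes the paper's claim that gradient descent succeeds in the Gaussian case. Below is a minimal sketch of that experiment under the same assumed architecture, with hand-derived gradients of the squared loss and plain gradient descent (the paper's experiments used AdaGrad; see the next sketch). The learning rate, step count, and initialization scale are placeholders, since the paper does not report them.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def loss_and_grad(patches, y, w):
    """Squared loss of the assumed no-overlap ConvNet and its gradient.

    f(x; w) = (1/k) * sum_i ReLU(w . x_i)
    loss    = 0.5 * mean_n (f(x_n; w) - y_n)^2
    """
    pre = patches @ w                      # (n, k) patch pre-activations
    err = relu(pre).mean(axis=1) - y       # (n,) prediction errors
    mask = (pre > 0).astype(float)         # ReLU subgradient indicator
    # d f / d w = (1/k) * sum_i 1[w . x_i > 0] * x_i, averaged over samples
    grad = (err[:, None, None] * mask[:, :, None] * patches).mean(axis=(0, 1))
    return 0.5 * (err ** 2).mean(), grad

rng = np.random.default_rng(1)
n, k, m = 1000, 8, 16                      # hypothetical sizes
w_star = rng.standard_normal(m)            # planted filter to recover
patches = rng.standard_normal((n, k, m))   # Gaussian inputs: the easy case
y = relu(patches @ w_star).mean(axis=1)

w = 0.1 * rng.standard_normal(m)           # random normal initialization
for _ in range(2000):
    loss, grad = loss_and_grad(patches, y, w)
    w -= 0.5 * grad                        # plain gradient descent
print(f"final loss {loss:.2e}, ||w - w_star|| = {np.linalg.norm(w - w_star):.2e}")
```

On Gaussian inputs the loss is driven toward zero and `w` recovers `w_star` from a random initialization, consistent with what the paper proves and demonstrates in Section 6.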
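
The 'Software Dependencies' and 'Experiment Setup' rows note that the paper optimizes with AdaGrad (Duchi et al., 2011) from a random normal initializer but gives no hyperparameters. For reference, here is a minimal AdaGrad step as defined in that reference; the learning rate, epsilon, initialization scale, and toy objective are placeholder assumptions.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update (Duchi et al., 2011): each coordinate's step
    size shrinks with its accumulated squared gradients."""
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy usage on the quadratic objective 0.5 * ||w - target||^2.
rng = np.random.default_rng(2)
w = 0.1 * rng.standard_normal(16)   # "random normal initializer" (scale assumed)
accum = np.zeros_like(w)
target = np.ones(16)
for _ in range(500):
    w, accum = adagrad_step(w, w - target, accum)   # grad of the quadratic
```

Because the per-coordinate accumulators replace a single global step size, the 'best performing learning rate schedule' the paper mentions reduces to choosing `lr`, which is exactly the value the paper leaves unreported.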