Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs

Authors: Alon Brutzkus, Amir Globerson

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states: "We provide an empirical demonstration of this in Section 6 where gradient descent is shown to succeed in the Gaussian case and fail for a different distribution" and "Here we empirically demonstrate both the easy and hard cases."
Researcher Affiliation | Academia | "Tel Aviv University, Blavatnik School of Computer Science. Correspondence to: Alon Brutzkus <alonbrutzkus@mail.tau.ac.il>, Amir Globerson <gamir@cs.tau.ac.il>."
Pseudocode | No | The paper presents its derivations and algorithms through equations and narrative text, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that source code for its methodology is publicly available.
Open Datasets | No | The paper describes generating synthetic training data for its empirical illustration ('To generate the hard case, we begin with a set splitting problem...', 'we use a Gaussian distribution G as defined earlier and generate a training set...'), but it does not reference a publicly available dataset with concrete access information such as a link, DOI, or formal citation (see the data-generation sketch below the table).
Dataset Splits | No | The paper's empirical section illustrates the easy and hard cases but does not report training, validation, or test splits, such as percentages or sample counts.
Hardware Specification | No | The paper does not report the hardware used for its experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using 'AdaGrad (Duchi et al., 2011)' for optimization, but it does not give version numbers for this or any other software dependency.
Experiment Setup | No | The paper mentions a 'random normal initializer' and choosing the 'best performing learning rate schedule' for AdaGrad, but it does not report specific hyperparameter values or detailed training configurations (see the training-loop sketch below the table).
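For readers attempting a reproduction, the sketch below shows one way the synthetic Gaussian training data quoted in the Open Datasets row could be generated. It assumes the paper's no-overlap setting, in which an input is split into non-overlapping patches and labels come from a planted filter passed through ReLU units and average pooling; the function name generate_gaussian_dataset, the sample count, the number of patches, and the patch dimension are illustrative choices rather than values taken from the paper.

    import numpy as np

    def generate_gaussian_dataset(n_samples=1000, n_patches=8, patch_dim=5, seed=0):
        """Draw i.i.d. standard-Gaussian inputs and label them with a planted
        no-overlap ConvNet: f(x; w) = mean_i ReLU(w . x_i).
        All sizes here are illustrative; the paper does not report them."""
        rng = np.random.default_rng(seed)
        # Planted ground-truth filter w* (normalized for convenience).
        w_star = rng.standard_normal(patch_dim)
        w_star /= np.linalg.norm(w_star)
        # Each sample consists of n_patches non-overlapping Gaussian patches.
        X = rng.standard_normal((n_samples, n_patches, patch_dim))
        # Targets from the planted filter: ReLU units followed by average pooling.
        y = np.maximum(X @ w_star, 0.0).mean(axis=1)
        return X, y, w_star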
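Along the same lines, since the Experiment Setup row notes that AdaGrad and a random normal initializer are named without hyperparameters, the following sketch shows how those choices might be wired together. AdaGrad is written out directly (per Duchi et al., 2011: accumulate squared gradients and divide the step by their square root); the learning rate, step count, epsilon, and the squared-loss gradient for the assumed no-overlap ConvNet are illustrative assumptions, not settings reported by the authors.

    import numpy as np

    def train_adagrad(X, y, lr=0.1, n_steps=2000, eps=1e-8, seed=1):
        """Fit the single filter w of the assumed no-overlap ConvNet by minimizing
        squared loss with AdaGrad from a random normal initialization.
        Learning rate, step count, and eps are illustrative defaults."""
        rng = np.random.default_rng(seed)
        n, k, d = X.shape
        w = rng.standard_normal(d)      # "random normal initializer"
        grad_sq_sum = np.zeros(d)       # AdaGrad accumulator of squared gradients
        for _ in range(n_steps):
            pre = X @ w                                  # (n, k) pre-activations
            pred = np.maximum(pre, 0.0).mean(axis=1)     # ReLU + average pooling
            err = pred - y
            # Gradient of 0.5 * mean((pred - y)^2) with respect to w.
            relu_grad = (pre > 0).astype(float) / k
            grad = np.einsum('n,nk,nkd->d', err, relu_grad, X) / n
            # AdaGrad step: per-coordinate rate lr / sqrt(accumulated squares).
            grad_sq_sum += grad ** 2
            w -= lr * grad / (np.sqrt(grad_sq_sum) + eps)
        return w

Applied to the dataset sketch above, the recovered filter w can be compared against the planted w_star to mirror the easy (Gaussian) case the paper describes.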