Failures of Gradient-Based Deep Learning

Authors: Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We describe four types of simple problems, for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties. We illustrate the failures through practical experiments, and provide theoretical insights explaining their source, and how they might be remedied.
Researcher Affiliation | Academia | School of Computer Science and Engineering, The Hebrew University; Weizmann Institute of Science.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code for running all our experiments is available online: https://github.com/shakedshammah/failures_of_DL (see command lines in Appendix D).
Open Datasets | No | The paper describes generating synthetic datasets for its experiments (e.g., learning random parities, or a sampling procedure for images), but does not provide access information (link, DOI, or citation) for a publicly available or open dataset.
Dataset Splits | No | The paper mentions training iterations and a held-out test set, but does not provide specific details on train/validation/test splits (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies | No | The paper mentions software components such as ReLU activations and a LeNet-like architecture, but does not name specific software with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | The final piece of the experimental setting is the choice of a loss function. For our experiments, we use the hinge loss, and a simple network architecture of one fully connected layer of width 10d > 3d with ReLU activations, and a fully connected output layer with linear activation and a single unit. (A hedged code sketch of this setup follows the table.)
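For concreteness, the quoted setup can be sketched in a few lines. The following PyTorch code is a hedged reconstruction, not the authors' released implementation (which lives at https://github.com/shakedshammah/failures_of_DL): the input dimension, sample count, optimizer, learning rate, batch size, and step count are illustrative assumptions, while the random-parity target, the width-10d ReLU layer, the single linear output unit, and the hinge loss follow the paper's description.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d = 30            # input dimension (illustrative assumption)
n_train = 10_000  # synthetic sample count (illustrative assumption)

# "Learning random parities": inputs uniform over {-1, +1}^d, label is the
# product of the coordinates in a randomly chosen subset.
X = torch.randint(0, 2, (n_train, d)).float() * 2 - 1
subset = torch.rand(d) < 0.5
subset[0] = True                      # guard against an empty subset
y = X[:, subset].prod(dim=1)          # parity label in {-1, +1}

# Architecture as quoted: one fully connected ReLU layer of width 10d,
# then a fully connected linear output layer with a single unit.
model = nn.Sequential(
    nn.Linear(d, 10 * d),
    nn.ReLU(),
    nn.Linear(10 * d, 1),
)

# Optimizer, learning rate, batch size, and step count are assumptions,
# not settings reported by the paper.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for step in range(1000):
    idx = torch.randint(0, n_train, (128,))
    out = model(X[idx]).squeeze(1)
    loss = torch.clamp(1.0 - y[idx] * out, min=0).mean()  # hinge loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The hinge loss clamp(1 - y·f(x), 0) matches the ±1 labels produced by the parity target; the paper's exact command lines are in Appendix D of the paper and the linked repository.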