Failures of Gradient-Based Deep Learning
Authors: Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We describe four types of simple problems, for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties. We illustrate the failures through practical experiments, and provide theoretical insights explaining their source, and how they might be remedied. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, The Hebrew University; Weizmann Institute of Science. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for running all our experiments is available online at https://github.com/shakedshammah/failures_of_DL. See command lines in Appendix D. |
| Open Datasets | No | The paper describes generating synthetic datasets for its experiments (e.g., 'learning random parities' or a 'sampling procedure' for images), but does not provide access information (link, DOI, or citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions training iterations and using a held-out test set, but does not provide specific details on the train/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'ReLU activations' or a 'LeNet-like' architecture, but does not provide specific software names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | The final piece of the experimental setting is the choice of a loss function. For our experiments, we use the hinge loss, and a simple network architecture of one fully connected layer of width 10d > 3d with ReLU activations, and a fully connected output layer with linear activation and a single unit (see the illustrative sketches after the table). |
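
The "Open Datasets" row above notes that the data is generated synthetically, e.g., for the random-parities experiment the inputs are uniform over {-1, +1}^d and the label is the parity of a fixed random subset of coordinates. The following is a minimal sketch of such a sampling procedure; the function name, signature, and seed are illustrative and not taken from the authors' code.

```python
import numpy as np

def sample_parity_data(n, d, subset, rng=None):
    """Sample n examples of the random-parity task: x is uniform over
    {-1, +1}^d and the label is the product (parity) of the coordinates
    indexed by `subset`. Hypothetical helper, for illustration only."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.choice([-1.0, 1.0], size=(n, d))
    y = np.prod(X[:, subset], axis=1)  # labels are +1 / -1
    return X, y

# Example: dimension d = 30, parity over a randomly chosen subset of coordinates
d = 30
rng = np.random.default_rng(0)
subset = rng.choice(d, size=d // 2, replace=False)
X_train, y_train = sample_parity_data(10_000, d, subset, rng)
```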
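
The "Experiment Setup" row quotes the architecture and loss: one fully connected ReLU layer of width 10d, a single linear output unit, and the hinge loss. Below is a minimal PyTorch sketch of that setup under stated assumptions; it is not the authors' implementation, and the optimizer and learning rate are assumptions not specified in the quoted text.

```python
import torch
from torch import nn

d = 30                 # input dimension (illustrative)
width = 10 * d         # hidden width 10d > 3d, as in the quoted setup

# One fully connected ReLU layer, then a fully connected linear output with a single unit.
model = nn.Sequential(
    nn.Linear(d, width),
    nn.ReLU(),
    nn.Linear(width, 1),
)

def hinge_loss(scores, labels):
    # labels are +1 / -1; standard hinge loss max(0, 1 - y * score)
    return torch.clamp(1.0 - labels * scores.squeeze(-1), min=0.0).mean()

# Optimizer choice and learning rate are assumptions for this sketch.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One illustrative gradient step on a synthetic +1/-1 batch.
X_batch = torch.randint(0, 2, (128, d)).float() * 2 - 1
y_batch = torch.prod(X_batch[:, : d // 2], dim=1)  # parity over a fixed subset, for illustration
loss = hinge_loss(model(X_batch), y_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```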