Failures of Gradient-Based Deep Learning
Authors: Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We describe four types of simple problems, for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties. We illustrate the failures through practical experiments, and provide theoretical insights explaining their source, and how they might be remedied. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, The Hebrew University; Weizmann Institute of Science. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for running all our experiments is available online at https://github.com/shakedshammah/failures_of_DL. See command lines in Appendix D. |
| Open Datasets | No | The paper describes generating synthetic datasets for its experiments (e.g., 'learning random parities' or a 'sampling procedure' for images), but does not provide access information (link, DOI, or citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions training iterations and using a held-out test set, but does not provide specific details on the train/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'ReLU activations' or a 'LeNet-like' architecture, but does not provide specific software names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | The final piece of the experimental setting is the choice of a loss function. For our experiments, we use the hinge loss, and a simple network architecture of one fully connected layer of width 10d > 3d with ReLU activations, and a fully connected output layer with linear activation and a single unit (see the illustrative sketches after the table). |
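
The "Open Datasets" row above notes that the data is generated synthetically, e.g., for the random-parities experiment the inputs are uniform over {-1, +1}^d and the label is the parity of a fixed random subset of coordinates. The following is a minimal sketch of such a sampling procedure; the function name, signature, and seed are illustrative and not taken from the authors' code.

```python
import numpy as np

def sample_parity_data(n, d, subset, rng=None):
    """Sample n examples of the random-parity task: x is uniform over
    {-1, +1}^d and the label is the product (parity) of the coordinates
    indexed by `subset`. Hypothetical helper, for illustration only."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.choice([-1.0, 1.0], size=(n, d))
    y = np.prod(X[:, subset], axis=1)  # labels are +1 / -1
    return X, y

# Example: dimension d = 30, parity over a randomly chosen subset of coordinates
d = 30
rng = np.random.default_rng(0)
subset = rng.choice(d, size=d // 2, replace=False)
X_train, y_train = sample_parity_data(10_000, d, subset, rng)
```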
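
The "Experiment Setup" row quotes the architecture and loss: one fully connected ReLU layer of width 10d, a single linear output unit, and the hinge loss. Below is a minimal PyTorch sketch of that setup under stated assumptions; it is not the authors' implementation, and the optimizer and learning rate are assumptions not specified in the quoted text.

```python
import torch
from torch import nn

d = 30                 # input dimension (illustrative)
width = 10 * d         # hidden width 10d > 3d, as in the quoted setup

# One fully connected ReLU layer, then a fully connected linear output with a single unit.
model = nn.Sequential(
    nn.Linear(d, width),
    nn.ReLU(),
    nn.Linear(width, 1),
)

def hinge_loss(scores, labels):
    # labels are +1 / -1; standard hinge loss max(0, 1 - y * score)
    return torch.clamp(1.0 - labels * scores.squeeze(-1), min=0.0).mean()

# Optimizer choice and learning rate are assumptions for this sketch.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One illustrative gradient step on a synthetic +1/-1 batch.
X_batch = torch.randint(0, 2, (128, d)).float() * 2 - 1
y_batch = torch.prod(X_batch[:, : d // 2], dim=1)  # parity over a fixed subset, for illustration
loss = hinge_loss(model(X_batch), y_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```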