Escaping Saddles with Stochastic Gradients

Authors: Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we provide experimental evidence suggesting the validity of this condition for training neural networks. In particular we show that, while the variance of uniform noise along eigenvectors corresponding to the most negative eigenvalue decreases as O(1/d), stochastic gradients have a significant component along this direction independent of the width and depth of the neural net. When looking at the entire eigenspectrum, we find that this variance increases with the magnitude of the associated eigenvalues. Hereby, we contribute to a better understanding of the success of training deep networks with SGD and its extensions.
Researcher Affiliation | Academia | Hadi Daneshmand *1, Jonas Kohler *1, Aurelien Lucchi 1, Thomas Hofmann 1. 1 ETH Zurich, Switzerland. Correspondence to: Hadi Daneshmand <hadi.daneshmand@inf.ethz.ch>.
Pseudocode | Yes | Algorithm 1 CNC-PGD; Algorithm 2 CNC-SGD
Open Source Code | No | The paper does not provide any specific links or explicit statements about the availability of its source code.
Open Datasets | Yes | All of these experiments are conducted using feed-forward networks on the well-known MNIST classification task (n = 70 000).
Dataset Splits | No | The paper mentions using the MNIST dataset but does not explicitly specify the training, validation, or test splits (e.g., percentages or exact counts) or reference a standard split with a citation for reproducibility.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU specifications) used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments.
Experiment Setup | Yes | The parameters we use for the half-space problem are as follows: learning rate η = 0.05 for SGD, η = 0.005 for GD, r = 0.1 for perturbed methods. For the neural network experiments, we use a constant learning rate of 0.01 for SGD.
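The Research Type row quotes the paper's central empirical claim: isotropic (uniform) noise has only an O(1/d) squared component along any fixed direction, such as the eigenvector of the most negative Hessian eigenvalue, whereas stochastic gradients do not suffer this dimension dependence. A minimal numpy sketch of the O(1/d) scaling for isotropic perturbations; the dimensions, sample count, and the choice of the reference direction as a coordinate axis are arbitrary illustrations, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (10, 100, 1000, 5000):
    # Fixed unit direction standing in for the eigenvector of the most
    # negative Hessian eigenvalue (arbitrary choice for illustration).
    v = np.zeros(d)
    v[0] = 1.0

    # Isotropic unit-norm perturbations, as used by classical perturbed GD.
    xi = rng.standard_normal((2000, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)

    # The mean squared component along v concentrates around 1/d.
    proj2 = (xi @ v) ** 2
    print(f"d={d:5d}  E[(xi^T v)^2] ~ {proj2.mean():.5f}   1/d = {1/d:.5f}")
```

The printout shows the squared projection shrinking in lockstep with 1/d, which is the dimension-dependent price paid by isotropic perturbations that the paper's CNC condition is meant to avoid.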
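The Pseudocode row lists Algorithm 1 (CNC-PGD) and Algorithm 2 (CNC-SGD). The paper's idea is to replace the isotropic perturbation of classical perturbed gradient descent with a single stochastic gradient step, relying on the CNC condition to supply negative-curvature signal. Below is a hedged sketch of that structure only; the function names, gradient-norm threshold, and iteration budget are illustrative assumptions, not the authors' exact Algorithm 1. The default η and r echo the half-space values quoted in the Experiment Setup row.

```python
import numpy as np

def cnc_pgd_sketch(full_grad, stoch_grad, x0, eta=0.005, r=0.1,
                   grad_tol=1e-3, T=10_000):
    """Sketch of a CNC-PGD-style loop (illustrative, not the paper's exact Algorithm 1).

    full_grad(x)  -> full-batch gradient at x
    stoch_grad(x) -> gradient of a single random sample / mini-batch at x
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(T):
        g = full_grad(x)
        if np.linalg.norm(g) > grad_tol:
            # Ordinary gradient descent while the gradient is informative.
            x = x - eta * g
        else:
            # Near a first-order stationary point: perturb with ONE stochastic
            # gradient step (step size r) instead of adding isotropic noise.
            x = x - r * stoch_grad(x)
    return x
```

The design point is that the second branch injects negative-curvature signal without any explicit Hessian or eigenvector computation; whether the threshold test uses the full gradient or a mini-batch estimate is left open in this sketch.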
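The Experiment Setup row reports only learning rates. Collected below as a configuration sketch of the quoted values; batch sizes, epoch counts, and network sizes are not stated in the extract and are deliberately omitted rather than guessed.

```python
# Hyperparameters quoted from the paper, grouped by experiment.
HALF_SPACE = {
    "sgd_learning_rate": 0.05,   # eta for SGD
    "gd_learning_rate": 0.005,   # eta for GD
    "perturbation_radius": 0.1,  # r for perturbed methods
}

NEURAL_NET_MNIST = {
    "sgd_learning_rate": 0.01,   # constant learning rate for SGD
}
```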