First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Authors: Thanh Huy Nguyen, Umut Simsekli, Mert Gurbuzbalaban, Gaël Richard

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate our results with simulations on a synthetic model and neural networks. ... To illustrate our results, we first conduct experiments on a synthetic problem ... Neural networks. In our second set of experiments, we consider the real data setting used in [6]: a multi-layer fully connected neural network with ReLU activations on the MNIST dataset.
Researcher Affiliation | Academia | 1: LTCI, Télécom Paris, Institut Polytechnique de Paris, France; 2: Department of Statistics, University of Oxford, UK; 3: Dept. of Management Science and Information Systems, Rutgers Business School, NJ, USA
Pseudocode | No | The paper describes the discrete dynamics in equation (7) as $W_{k+1} = W_k - \eta \nabla f(W_k) + \varepsilon \sigma \eta^{1/2} \xi_k + \varepsilon \eta^{1/\alpha} \zeta_k$, which is the Euler discretization of the SDE. However, it is presented as a mathematical equation within the text, not as a formally labeled pseudocode or algorithm block (see the simulation sketch after the table).
Open Source Code | Yes | We adapted the code provided in [6] and we provide our version in https://github.com/umutsimsekli/sgd_first_exit_time.
Open Datasets | Yes | In our second set of experiments, we consider the real data setting used in [6]: a multi-layer fully connected neural network with ReLU activations on the MNIST dataset.
Dataset Splits | No | The paper uses the MNIST dataset and describes aspects of the experimental setup, such as initializing networks and training until a certain accuracy, but it does not provide specific details on how the dataset was split into training, validation, and test sets (e.g., percentages or sample counts). It mentions that they "trained the networks with SGD until a vicinity of a local minimum is reached with at least 90% accuracy" but gives no explicit split information.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud instance types.
Software Dependencies | No | The paper states that the authors "adapted the code provided in [6]" and that they provide their version, but it does not specify any software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch, or specific library versions).
Experiment Setup | Yes | In our second set of experiments, we consider the real data setting used in [6]: a multi-layer fully connected neural network with ReLU activations on the MNIST dataset. ... we monitored the first exit time by varying η, the number of layers (depth), and the number of neurons per layer (width). ... we set the mini-batch size b = 10 and we did not add explicit Gaussian or Lévy noise. (A setup sketch, under assumed dependencies, follows the table.)
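
The Pseudocode row above quotes the discrete dynamics of equation (7). As a rough illustration only, the following NumPy/SciPy sketch simulates that recursion and records a first exit time from a ball around the initial point; it is not the authors' code. The quadratic objective in `grad_f`, the exit `radius`, and all parameter values are illustrative assumptions, and `scipy.stats.levy_stable` is used for the symmetric α-stable noise term.

```python
# Minimal sketch of Eq. (7):
#   W_{k+1} = W_k - eta * grad f(W_k) + eps * sigma * eta^{1/2} * xi_k + eps * eta^{1/alpha} * zeta_k
# where xi_k is standard Gaussian noise and zeta_k is symmetric alpha-stable noise.
# The quadratic objective and all parameter values below are illustrative assumptions.

import numpy as np
from scipy.stats import levy_stable

def grad_f(w):
    # Gradient of a toy quadratic f(w) = 0.5 * ||w||^2 (placeholder for the true loss gradient).
    return w

def first_exit_time(w0, eta=0.01, eps=0.1, sigma=1.0, alpha=1.8,
                    radius=1.0, max_iters=100_000, seed=0):
    """Simulate Eq. (7); return the first k at which ||W_k - W_0|| exceeds `radius`."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    d = w.size
    for k in range(1, max_iters + 1):
        xi = rng.standard_normal(d)                                   # Gaussian component
        zeta = levy_stable.rvs(alpha, 0.0, size=d, random_state=rng)  # symmetric alpha-stable component
        w = w - eta * grad_f(w) + eps * sigma * np.sqrt(eta) * xi + eps * eta ** (1.0 / alpha) * zeta
        if np.linalg.norm(w - w0) > radius:
            return k
    return None  # no exit observed within max_iters

print(first_exit_time(np.zeros(10)))
```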
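
The Experiment Setup row describes a fully connected ReLU network on MNIST trained with plain SGD, mini-batch size b = 10, with η, depth, and width varied. The sketch below assumes PyTorch and torchvision (dependencies the paper does not state) and shows one possible instantiation of that setup; the concrete values of `eta`, `depth`, and `width` are placeholders, not the paper's settings.

```python
# Minimal sketch (assumed PyTorch/torchvision, not the authors' adapted code from [6]) of the
# setup described above: a fully connected ReLU network on MNIST, plain SGD, batch size b = 10,
# with no explicit Gaussian or Levy noise added. Values of eta, depth, and width are placeholders.

import torch
import torch.nn as nn
from torchvision import datasets, transforms

def make_mlp(depth=3, width=128):
    """Fully connected network with ReLU activations for 28x28 MNIST inputs."""
    layers = [nn.Flatten(), nn.Linear(28 * 28, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 10))
    return nn.Sequential(*layers)

eta, depth, width = 0.1, 3, 128  # step size, number of layers, neurons per layer (the varied quantities)
model = make_mlp(depth, width)
optimizer = torch.optim.SGD(model.parameters(), lr=eta)
criterion = nn.CrossEntropyLoss()

train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=10, shuffle=True)  # b = 10

for x, y in loader:  # one pass; the paper trains until at least 90% accuracy near a local minimum
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```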