Beyond Convexity: Stochastic Quasi-Convex Optimization

Authors: Elad Hazan, Kfir Levy, Shai Shalev-Shwartz

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We report experimental results supporting our theoretical guarantees and demonstrate an accelerated convergence attained by SNGD." Additionally, under the 'Experiments' section: "At first we were interested in comparing the performance of SNGD to MSGD (Minibatch Stochastic Gradient Descent), and to a stochastic variant of Nesterov's accelerated gradient method [19]... The comparison appears in Figures 2(a), 2(b)."
Researcher Affiliation | Academia | Elad Hazan, Princeton University, ehazan@cs.princeton.edu; Kfir Y. Levy, Technion, kfiryl@tx.technion.ac.il; Shai Shalev-Shwartz, The Hebrew University, shais@cs.huji.ac.il
Pseudocode | Yes | Algorithm 1 (Normalized Gradient Descent, NGD) and Algorithm 2 (Stochastic Normalized Gradient Descent, SNGD); a minimal sketch of the SNGD update appears after this table.
Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets | Yes | 'As a test case, we train a Neural Network with a single hidden layer of 100 units over the MNIST data set.'
Dataset Splits | No | The paper states, 'As a test case, we train a Neural Network with a single hidden layer of 100 units over the MNIST data set.' However, it does not provide specific details on how the dataset was split into training, validation, or test sets.
Hardware Specification | No | The paper describes the experimental setup and results in Section 6 but does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper describes the use of a 'ReLU activation function' and 'square loss' but does not provide specific software names or version numbers (e.g., Python, TensorFlow, PyTorch, scikit-learn versions) used in the implementation.
Experiment Setup | Yes | "As a test case, we train a Neural Network with a single hidden layer of 100 units over the MNIST data set. We use a ReLU activation function, and minimize the square loss. We employ a regularization over weights with a parameter of λ = 5·10^-4. For MSGD and Nesterov's method we used a step size rule of the form η_t = η_0(1 + γt)^(-3/4), with η_0 = 0.01 and γ = 10^-4. For SNGD we used the constant step size of 0.1. In Nesterov's method we used a momentum of 0.95. All methods employed a minibatch size of 100."
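
For reference, the SNGD procedure named in the Pseudocode row reduces to a simple update: average the stochastic gradients over a minibatch, normalize the result, and step a fixed distance in that direction. The NumPy sketch below is a minimal illustration of that description, not the authors' code; the stochastic gradient oracle `grad_minibatch` is a hypothetical placeholder, while the constant step size of 0.1 and minibatch size of 100 are taken from the Experiment Setup row.

```python
import numpy as np

def sngd(grad_minibatch, x0, step_size=0.1, batch_size=100, num_iters=1000):
    """Minimal sketch of Stochastic Normalized Gradient Descent (SNGD).

    grad_minibatch(x, batch_size) should return the stochastic gradient
    averaged over a freshly sampled minibatch at the point x (hypothetical
    oracle, not part of the paper's pseudocode).
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(num_iters):
        g = grad_minibatch(x, batch_size)   # averaged minibatch gradient
        norm = np.linalg.norm(g)
        if norm > 0:                        # guard against a zero gradient
            x = x - step_size * g / norm    # step along the *direction* only
        iterates.append(x.copy())
    return iterates
```

Normalization is the defining design choice: only the direction of the minibatch gradient is used, which, per the paper's analysis, is what lets the method keep making progress on the plateaus of (locally) quasi-convex objectives where plain SGD slows down.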
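
The Experiment Setup row pins down the hyperparameters but, as the Software Dependencies row notes, not the software stack. The fragment below is a hedged PyTorch-style rendering of the described architecture and settings; the framework choice, the one-hot target encoding implied by the square loss, and the variable names are assumptions made for illustration, not details reported by the authors.

```python
import torch.nn as nn

# Single hidden layer of 100 ReLU units over MNIST (784 inputs, 10 outputs),
# trained with the square loss, as described in the Experiment Setup row.
model = nn.Sequential(nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10))
criterion = nn.MSELoss()      # square loss (assumes one-hot encoded targets)

weight_decay = 5e-4           # regularization parameter lambda = 5*10^-4
batch_size = 100              # shared by all methods

# SNGD: constant step size applied along the normalized minibatch gradient.
step_size_sngd = 0.1

# MSGD and Nesterov's method: decaying step size eta_t = eta_0 * (1 + gamma*t)^(-3/4).
eta_0, gamma = 0.01, 1e-4
momentum_nesterov = 0.95

def eta(t):
    """Step size at iteration t for MSGD / Nesterov's method."""
    return eta_0 * (1.0 + gamma * t) ** (-0.75)
```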