Beyond Convexity: Stochastic Quasi-Convex Optimization

Authors: Elad Hazan, Kfir Levy, Shai Shalev-Shwartz

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We report experimental results supporting our theoretical guarantees and demonstrate an accelerated convergence attained by SNGD." Additionally, under the 'Experiments' section: "At first we were interested in comparing the performance of SNGD to MSGD (Minibatch Stochastic Gradient Descent), and to a stochastic variant of Nesterov's accelerated gradient method [19]... The comparison appears in Figures 2(a), 2(b)."
Researcher Affiliation | Academia | Elad Hazan, Princeton University, ehazan@cs.princeton.edu; Kfir Y. Levy, Technion, kfiryl@tx.technion.ac.il; Shai Shalev-Shwartz, The Hebrew University, shais@cs.huji.ac.il
Pseudocode | Yes | Algorithm 1 (Normalized Gradient Descent, NGD) and Algorithm 2 (Stochastic Normalized Gradient Descent, SNGD); a minimal sketch of the SNGD update appears after this table.
Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets | Yes | 'As a test case, we train a Neural Network with a single hidden layer of 100 units over the MNIST data set.'
Dataset Splits | No | The paper states, 'As a test case, we train a Neural Network with a single hidden layer of 100 units over the MNIST data set.' However, it does not provide specific details on how the dataset was split into training, validation, or test sets.
Hardware Specification | No | The paper describes the experimental setup and results in Section 6 but does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper describes the use of a 'ReLU activation function' and 'square loss' but does not provide specific software names or version numbers (e.g., Python, TensorFlow, PyTorch, scikit-learn versions) used in the implementation.
Experiment Setup | Yes | "As a test case, we train a Neural Network with a single hidden layer of 100 units over the MNIST data set. We use a ReLU activation function, and minimize the square loss. We employ a regularization over weights with a parameter of λ = 5·10^-4. For MSGD and Nesterov's method we used a step size rule of the form η_t = η_0(1 + γt)^(-3/4), with η_0 = 0.01 and γ = 10^-4. For SNGD we used the constant step size of 0.1. In Nesterov's method we used a momentum of 0.95. All methods employed a minibatch size of 100."
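
For reference, the SNGD procedure named in the Pseudocode row reduces to a simple update: average the stochastic gradients over a minibatch, normalize the result, and step a fixed distance in that direction. The NumPy sketch below is a minimal illustration of that description, not the authors' code; the stochastic gradient oracle `grad_minibatch` is a hypothetical placeholder, while the constant step size of 0.1 and minibatch size of 100 are taken from the Experiment Setup row.

```python
import numpy as np

def sngd(grad_minibatch, x0, step_size=0.1, batch_size=100, num_iters=1000):
    """Minimal sketch of Stochastic Normalized Gradient Descent (SNGD).

    grad_minibatch(x, batch_size) should return the stochastic gradient
    averaged over a freshly sampled minibatch at the point x (hypothetical
    oracle, not part of the paper's pseudocode).
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(num_iters):
        g = grad_minibatch(x, batch_size)   # averaged minibatch gradient
        norm = np.linalg.norm(g)
        if norm > 0:                        # guard against a zero gradient
            x = x - step_size * g / norm    # step along the *direction* only
        iterates.append(x.copy())
    return iterates
```

Normalization is the defining design choice: only the direction of the minibatch gradient is used, which, per the paper's analysis, is what lets the method keep making progress on the plateaus of (locally) quasi-convex objectives where plain SGD slows down.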
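
The Experiment Setup row pins down the hyperparameters but, as the Software Dependencies row notes, not the software stack. The fragment below is a hedged PyTorch-style rendering of the described architecture and settings; the framework choice, the one-hot target encoding implied by the square loss, and the variable names are assumptions made for illustration, not details reported by the authors.

```python
import torch.nn as nn

# Single hidden layer of 100 ReLU units over MNIST (784 inputs, 10 outputs),
# trained with the square loss, as described in the Experiment Setup row.
model = nn.Sequential(nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10))
criterion = nn.MSELoss()      # square loss (assumes one-hot encoded targets)

weight_decay = 5e-4           # regularization parameter lambda = 5*10^-4
batch_size = 100              # shared by all methods

# SNGD: constant step size applied along the normalized minibatch gradient.
step_size_sngd = 0.1

# MSGD and Nesterov's method: decaying step size eta_t = eta_0 * (1 + gamma*t)^(-3/4).
eta_0, gamma = 0.01, 1e-4
momentum_nesterov = 0.95

def eta(t):
    """Step size at iteration t for MSGD / Nesterov's method."""
    return eta_0 * (1.0 + gamma * t) ** (-0.75)
```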