Beyond Convexity: Stochastic Quasi-Convex Optimization
Authors: Elad Hazan, Kfir Levy, Shai Shalev-Shwartz
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report experimental results supporting our theoretical guarantees and demonstrate an accelerated convergence attained by SNGD. Additionally, under the 'Experiments' section: At first we were interested in comparing the performance of SNGD to MSGD (Minibatch Stochastic Gradient Descent), and to a stochastic variant of Nesterov's accelerated gradient method [19]... The comparison appears in Figures 2(a), 2(b). |
| Researcher Affiliation | Academia | Elad Hazan Princeton University ehazan@cs.princeton.edu, Kfir Y. Levy Technion kfiryl@tx.technion.ac.il, Shai Shalev-Shwartz The Hebrew University shais@cs.huji.ac.il |
| Pseudocode | Yes | Algorithm 1 Normalized Gradient Descent (NGD) and Algorithm 2 Stochastic Normalized Gradient Descent (SNGD); a minimal SNGD sketch appears below this table. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | As a test case, we train a Neural Network with a single hidden layer of 100 units over the MNIST data set. |
| Dataset Splits | No | The paper states, 'As a test case, we train a Neural Network with a single hidden layer of 100 units over the MNIST data set.' However, it does not provide specific details on how the dataset was split into training, validation, or test sets. |
| Hardware Specification | No | The paper describes the experimental setup and results in Section 6 but does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper describes the use of a 'ReLU activation function' and 'square loss' but does not provide specific software names or version numbers (e.g., Python, TensorFlow, PyTorch, scikit-learn versions) used in the implementation. |
| Experiment Setup | Yes | As a test case, we train a Neural Network with a single hidden layer of 100 units over the MNIST data set. We use a ReLU activation function, and minimize the square loss. We employ a regularization over weights with a parameter of λ = 5·10^-4. For MSGD and Nesterov's method we used a step size rule of the form η_t = η_0(1 + γt)^(-3/4), with η_0 = 0.01 and γ = 10^-4. For SNGD we used the constant step size of 0.1. In Nesterov's method we used a momentum of 0.95. All methods employed a minibatch size of 100. (These hyperparameters are restated in a configuration sketch below the table.) |
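
The 'Pseudocode' row refers to Algorithm 2 (SNGD). As a reading aid, here is a minimal Python sketch of a stochastic normalized-gradient loop, assuming the update x_{t+1} = x_t − η·g_t/‖g_t‖ on a minibatch gradient g_t and returning the iterate with the lowest observed minibatch loss; `grad_fn`, `loss_fn`, and the sampling scheme are hypothetical stand-ins, not the authors' code.

```python
import numpy as np

def sngd(grad_fn, loss_fn, x0, data, T=1000, eta=0.1, batch_size=100, seed=0):
    """Sketch of a stochastic normalized-gradient descent loop.

    grad_fn(x, batch) and loss_fn(x, batch) are hypothetical callables returning
    the minibatch gradient and minibatch loss; `data` is an indexable numpy array.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    best_x, best_loss = x.copy(), np.inf
    n = len(data)
    for _ in range(T):
        batch = data[rng.choice(n, size=batch_size, replace=False)]
        g = grad_fn(x, batch)
        norm = np.linalg.norm(g)
        if norm == 0.0:
            break                      # minibatch gradient vanished; stop early
        x = x - eta * g / norm         # normalized step: only the gradient direction is used
        loss = loss_fn(x, batch)
        if loss < best_loss:           # keep the best iterate seen so far
            best_x, best_loss = x.copy(), loss
    return best_x
```

The constant step size η = 0.1 and minibatch size 100 match the values reported in the 'Experiment Setup' row.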
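
The 'Experiment Setup' row also fixes the MSGD/Nesterov step-size schedule. The snippet below simply restates those hyperparameters as a hypothetical configuration dictionary plus a helper for η_t = η_0(1 + γt)^(-3/4); the names are illustrative and not taken from the paper.

```python
# Hypothetical configuration names; values follow the 'Experiment Setup' row.
experiment_config = {
    "hidden_units": 100,        # single hidden layer
    "activation": "relu",
    "loss": "square",
    "weight_decay": 5e-4,       # regularization parameter λ
    "minibatch_size": 100,
    "sngd_step_size": 0.1,      # constant step size for SNGD
    "nesterov_momentum": 0.95,
    "eta0": 0.01,
    "gamma": 1e-4,
}

def msgd_step_size(t, eta0=0.01, gamma=1e-4):
    """Decaying step size η_t = η_0 * (1 + γ·t)^(-3/4), as stated for MSGD and Nesterov's method."""
    return eta0 * (1.0 + gamma * t) ** (-0.75)
```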