On Graduated Optimization for Stochastic Non-Convex Problems

Authors: Elad Hazan, Kfir Yehuda Levy, Shai Shalev-Shwartz

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments support the theoretical guarantees, substantiating an accelerated convergence in training the NN. Moreover, we demonstrate a non-convex phenomenon that exists in natural data and is captured by the σ-nice property. (Section 7, Experiments.) As a test case, we train a NN with a single hidden layer of 30 units over the MNIST data set.
Researcher Affiliation | Academia | Elad Hazan (EHAZAN@CS.PRINCETON.EDU), Princeton University; Kfir Y. Levy (KFIRYL@TX.TECHNION.AC.IL), Technion - Israel Institute of Technology; Shai Shalev-Shwartz (SHAIS@CS.HUJI.AC.IL), The Hebrew University of Jerusalem, Israel
Pseudocode | Yes | Figure 1: smoothed gradient oracle given gradient feedback. Figure 2: smoothed gradient oracle given value feedback. Algorithm 1: GradOptG. Algorithm 2: Suffix-SGD. Algorithm 3: GradOptV. (A schematic sketch of the smoothed gradient oracle and the graduated-optimization loop follows the table.)
Open Source Code | No | The paper does not provide any explicit statement about making the source code available or include links to a code repository.
Open Datasets | Yes | As a test case, we train a NN with a single hidden layer of 30 units over the MNIST data set. We adopt the experimental setup of (Dauphin et al., 2014) and train over a down-scaled version of the data, i.e., the original 28×28 images of MNIST were down-sampled to the size of 10×10.
Dataset Splits | No | The paper mentions using the MNIST dataset for training and evaluation but does not explicitly provide specific training/validation/test dataset splits, percentages, or absolute sample counts for each split.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud computing instance types used for running experiments. It only mentions general training parameters like "using a batch size of 100".
Software Dependencies | No | The paper mentions using a "ReLU activation function" and minimizing "square loss", but it does not specify any software dependencies with version numbers (e.g., specific programming languages, libraries, or frameworks like Python, PyTorch, TensorFlow, etc., with their versions).
Experiment Setup | Yes | We train a NN with a single hidden layer of 30 units over the MNIST data set. We adopt the experimental setup of (Dauphin et al., 2014) and train over a down-scaled version of the data, i.e., the original 28×28 images of MNIST were down-sampled to the size of 10×10. We use a ReLU activation function, and minimize the square loss. We started by running MSGD (Minibatch Stochastic Gradient Descent) on the problem, using a batch size of 100 and a step size rule of η_t = η_0(1 + γt)^(-3/4), where η_0 = 0.01, γ = 10^(-4). This choice of step size rule was the most effective among a grid of rules that we examined. (A minimal training-script sketch follows the table.)
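
The Pseudocode row refers to smoothed gradient oracles and the graduated-optimization algorithms (GradOptG with gradient feedback, Suffix-SGD as the inner solver, GradOptV with value feedback). Below is a minimal Python/NumPy sketch of the gradient-feedback case only, not the authors' code: perturbing the query point uniformly inside a ball of radius δ yields a stochastic gradient of the δ-smoothed objective, and an outer loop halves δ between stages while warm-starting from the previous solution. The 1/t inner step size, the fixed number of inner iterations, and the omission of Suffix-SGD's suffix averaging and the shrinking search balls are simplifications of ours.

import numpy as np

def sample_unit_ball(dim, rng):
    """Draw a point uniformly from the unit Euclidean ball."""
    u = rng.standard_normal(dim)
    u /= np.linalg.norm(u)
    r = rng.random() ** (1.0 / dim)  # radius density proportional to r^(dim-1)
    return r * u

def smoothed_grad_oracle(stoch_grad, x, delta, rng):
    """Smoothed gradient oracle given gradient feedback (cf. Figure 1):
    return a stochastic gradient of f at a point perturbed uniformly
    inside a ball of radius delta around x, i.e. an unbiased estimate
    of the gradient of the delta-smoothed objective."""
    u = sample_unit_ball(x.shape[0], rng)
    return stoch_grad(x + delta * u)

def graduated_optimization(stoch_grad, x0, delta0, rounds, inner_steps, rng):
    """Schematic graduated-optimization loop in the spirit of Algorithm 1:
    minimize ever-less-smoothed versions of the objective, halving the
    smoothing radius each round and warm-starting from the previous point.
    The inner solver here is plain SGD rather than Suffix-SGD."""
    x, delta = x0.copy(), delta0
    for _ in range(rounds):
        for t in range(1, inner_steps + 1):
            g = smoothed_grad_oracle(stoch_grad, x, delta, rng)
            x -= (1.0 / t) * g            # simple 1/t step size for the sketch
        delta /= 2.0                       # sharpen the smoothing
    return x

# Toy usage on a noisy gradient of f(x) = sum_i (x_i^2 + sin(5 x_i)) (illustrative only):
rng = np.random.default_rng(0)
noisy_grad = lambda x: 2 * x + 5 * np.cos(5 * x) + 0.1 * rng.standard_normal(x.shape)
x_hat = graduated_optimization(noisy_grad, x0=np.ones(5), delta0=1.0,
                               rounds=6, inner_steps=200, rng=rng)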
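
The Experiment Setup row can likewise be mirrored as a short training script. The sketch below assumes PyTorch/torchvision, which the paper does not name; the number of epochs, the interpolation used when down-sampling to 10×10, and the one-hot encoding of labels for the square loss are assumptions of ours. Only the MSGD baseline described in that row is shown; the paper's graduated variant would wrap this loop with the smoothing schedule sketched above.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Down-scale the original 28x28 MNIST images to 10x10 and flatten to 100-dim vectors.
transform = transforms.Compose([
    transforms.Resize((10, 10)),               # interpolation method is an assumption
    transforms.ToTensor(),
    transforms.Lambda(lambda img: img.view(-1)),
])
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
loader = DataLoader(train_set, batch_size=100, shuffle=True)

# Single hidden layer of 30 ReLU units; square loss against one-hot targets.
model = nn.Sequential(nn.Linear(100, 30), nn.ReLU(), nn.Linear(30, 10))

eta0, gamma = 0.01, 1e-4
t = 0
for epoch in range(10):                        # number of epochs is an assumption
    for images, labels in loader:
        t += 1
        eta_t = eta0 * (1.0 + gamma * t) ** (-0.75)   # eta_t = eta0 * (1 + gamma*t)^(-3/4)
        targets = F.one_hot(labels, num_classes=10).float()
        loss = F.mse_loss(model(images), targets)     # square loss
        model.zero_grad()
        loss.backward()
        with torch.no_grad():                  # plain minibatch SGD update (MSGD)
            for p in model.parameters():
                p -= eta_t * p.grad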