On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Authors: Sanjeev Arora, Nadav Cohen, Elad Hazan
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical analysis, as well as experiments, show that here depth acts as a preconditioner which may accelerate convergence. Even on simple convex problems such as linear regression with ℓp loss, p > 2, gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common acceleration schemes. [...] In this section we put these claims to the test, through a series of empirical evaluations based on TensorFlow toolbox (Abadi et al., 2016). For conciseness, many of the details behind our implementation are deferred to Appendix C. [...] Figure 2 shows convergence (training objective per iteration) of gradient descent optimizing depth-2 and depth-3 linear networks, against optimization of a single layer model using the respective preconditioning schemes (Equation 12 with N = 2, 3). (See the first code sketch below the table.) |
| Researcher Affiliation | Collaboration | 1) Department of Computer Science, Princeton University, Princeton, NJ, USA; 2) School of Mathematics, Institute for Advanced Study, Princeton, NJ, USA; 3) Google Brain, USA. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions using the 'TensorFlow toolbox' (a third-party tool) and defers implementation details to an appendix, but there is no explicit statement of, or link to, a release of the authors' own code. |
| Open Datasets | Yes | The dataset chosen was UCI Machine Learning Repository’s Gas Sensor Array Drift at Different Concentrations (Vergara et al., 2012; Rodriguez-Lujan et al., 2014). Specifically, we used the dataset’s Ethanol problem – a scalar regression task with 2565 examples, each comprising 128 features (one of the largest numeric regression tasks in the repository). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test split information. It refers to a training objective and, implicitly, to a test set, but gives no split percentages or sample counts for the partitions, and it does not mention a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments. It only mentions the use of the 'TensorFlow toolbox'. |
| Software Dependencies | No | The paper mentions the 'TensorFlow toolbox' and 'SciPy' but does not provide version numbers for these or any other ancillary software components, which would be required for reproducibility. |
| Experiment Setup | Yes | In all experiments, initial weights were drawn from a zero-mean normal distribution with standard deviation 0.01. Learning rates were found through grid search, with grid {1e-2, 1e-3, 1e-4, 1e-5}. Unless otherwise indicated, weight decay coefficient was set to zero. [...] For the experiments of Figure 4-right, where Adam optimizer was used, we relied on TensorFlow's default settings for learning rate (0.001) and β parameters (β1=0.9, β2=0.999). [...] As for the experiment of Figure 5-right, for the MNIST convolutional network tutorial, we used TensorFlow's default hyperparameter settings, namely: learning rate 0.01 (constant), dropout rate 0.5, RMSProp optimizer with decay 0.9 and momentum 0.9. Initial weights were drawn from truncated normal distribution with standard deviation 0.1. (See the second code sketch below the table.) |
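
The depth-2 vs. single-layer comparison quoted in the Research Type row can be illustrated with a short self-contained sketch. This is not the authors' code (none is released, per the Open Source Code row): the data are synthetic, the dimensions merely mirror the 2565-example, 128-feature Ethanol task, the learning rate and step count are illustrative, and the closed-form preconditioning update of Equation 12 is not reproduced. The sketch only contrasts plain gradient descent on the direct parameterization with gradient descent on an overparameterized depth-2 factorization of the same ℓ4 linear regression objective.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2565, 128                           # sizes mirror the UCI Ethanol task (illustrative)
X = rng.normal(size=(n, d))                # synthetic features (the paper uses real data)
y = X @ (rng.normal(size=d) / np.sqrt(d))  # synthetic scalar targets

def l4_grad(w):
    """Gradient of the l4 loss (1/n) * sum_i (x_i^T w - y_i)^4 with respect to w."""
    r = X @ w - y
    return (4.0 / n) * (X.T @ r**3)

lr, steps = 1e-3, 10000                    # lr taken from the paper's grid; step count arbitrary

# Depth-1 (direct) parameterization: gradient descent on w itself.
w = rng.normal(scale=0.01, size=d)
for _ in range(steps):
    w -= lr * l4_grad(w)

# Depth-2 overparameterization: the end-to-end weight is W1.T @ w2,
# and gradient descent is run on the factors (W1, w2) instead.
W1 = rng.normal(scale=0.01, size=(d, d))
w2 = rng.normal(scale=0.01, size=d)
for _ in range(steps):
    g = l4_grad(W1.T @ w2)                 # gradient w.r.t. the end-to-end weight
    gW1 = np.outer(w2, g)                  # chain rule: dL/dW1 = w2 g^T
    gw2 = W1 @ g                           # chain rule: dL/dw2 = W1 g
    W1 -= lr * gW1
    w2 -= lr * gw2

print("depth-1 final training loss:", np.mean((X @ w - y) ** 4))
print("depth-2 final training loss:", np.mean((X @ (W1.T @ w2) - y) ** 4))
```

The two chain-rule lines are the only place the factorization enters; everything else is identical to the depth-1 run, which is what makes the comparison between the two parameterizations meaningful.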
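
The hyperparameter protocol quoted in the Experiment Setup row (initial weights from a zero-mean normal with standard deviation 0.01, learning rate chosen by grid search over {1e-2, 1e-3, 1e-4, 1e-5}, zero weight decay) can likewise be sketched. The stand-in least-squares model, the step budget, and selection by final training loss are assumptions for illustration, not details taken from the paper; the Adam and RMSProp default settings mentioned in that row are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2565, 128))               # stand-in data, sized like the Ethanol task
y = X @ (rng.normal(size=128) / np.sqrt(128))  # synthetic targets (illustrative)

def train(lr, steps=1000):
    """Plain gradient descent from a N(0, 0.01^2) initialization, no weight decay."""
    w = rng.normal(scale=0.01, size=X.shape[1])
    n = X.shape[0]
    for _ in range(steps):
        r = X @ w - y
        w -= lr * (2.0 / n) * (X.T @ r)        # gradient of the mean squared error
    return float(np.mean((X @ w - y) ** 2))

grid = [1e-2, 1e-3, 1e-4, 1e-5]                # the grid quoted from the paper
losses = {lr: train(lr) for lr in grid}
best_lr = min(losses, key=losses.get)          # assumed criterion: lowest final training loss
print("losses per learning rate:", losses)
print("selected learning rate:", best_lr)
```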