Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Authors: Yuanzhi Li, Yingyu Liang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset. Section 7 (Experiments) aims at verifying some key implications: (1) the activation patterns of the hidden units couple with those at initialization; (2) the distance of the learned solution from the initialization is relatively small compared to the size of initialization; (3) the accumulated updates (i.e., the difference between the learned weight matrix and the initialization) have approximately low rank. These are indeed supported by the results on the synthetic data and on the MNIST data. (A hedged diagnostic sketch for these three implications appears after the table.) |
| Researcher Affiliation | Academia | Yuanzhi Li Computer Science Department Stanford University Stanford, CA 94305 yuanzhil@stanford.edu and Yingyu Liang Department of Computer Sciences University of Wisconsin-Madison Madison, WI 53706 yliang@cs.wisc.edu |
| Pseudocode | No | The paper describes the SGD update as a mathematical formula (1) but does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the methodology described. |
| Open Datasets | Yes | on the benchmark dataset MNIST |
| Dataset Splits | No | The paper mentions '1000 training data points and 1000 test data points are sampled' for synthetic data but does not provide explicit training/validation/test dataset splits with percentages or specific counts for all datasets used, nor does it specify a validation set. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, required to replicate the experiment. |
| Experiment Setup | Yes | On the synthetic data, the SGD is run for T = 400 steps with batch size B = 16 and learning rate η = 10/m. On MNIST, the SGD is run for T = 2 × 10^4 steps with batch size B = 64 and learning rate η = 4 × 10^3/m. The weights are initialized with N(0, 1/m). (A hedged sketch of the synthetic-data setup follows the table.) |
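
The experiment-setup row quotes hyperparameters for a one-hidden-layer ReLU network trained by mini-batch SGD. Below is a minimal NumPy sketch of the synthetic-data configuration under those quoted values (T = 400, B = 16, η = 10/m, weights ~ N(0, 1/m)); the data generator (two separated Gaussian clusters), the input dimension d, the width m, the squared loss, and the fixed random output layer `a` are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the quoted synthetic-data setup: a one-hidden-layer ReLU
# network with m hidden units trained by mini-batch SGD. Cluster generation, the
# dimensions d/m/k, and the loss are assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

d, m, k = 20, 1024, 2            # input dim, hidden width, number of clusters (assumed)
T, B, eta = 400, 16, 10.0 / m    # steps, batch size, learning rate from the quoted setup

# Assumed structured data: two well-separated Gaussian clusters, one per label.
centers = rng.normal(size=(k, d))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
n = 1000
labels = rng.integers(0, k, size=n)
X = centers[labels] + 0.1 * rng.normal(size=(n, d))
Y = np.where(labels == 0, 1.0, -1.0)

# Hidden weights initialized with N(0, 1/m), as quoted; the fixed +/- 1/sqrt(m)
# output layer is an assumption made for the sketch.
W0 = rng.normal(scale=np.sqrt(1.0 / m), size=(m, d))
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
W = W0.copy()

def forward(W, x):
    h = np.maximum(W @ x.T, 0.0)          # ReLU activations, shape (m, batch)
    return a @ h                          # network outputs, shape (batch,)

for t in range(T):
    idx = rng.integers(0, n, size=B)
    xb, yb = X[idx], Y[idx]
    h = W @ xb.T                          # pre-activations, shape (m, B)
    act = (h > 0).astype(float)           # ReLU activation pattern
    err = a @ np.maximum(h, 0.0) - yb     # squared-loss residual (loss choice assumed)
    grad = ((a[:, None] * act) * err[None, :]) @ xb / B
    W -= eta * grad                       # mini-batch SGD step on the hidden layer

print("final train error:", np.mean(np.sign(forward(W, X)) != np.sign(Y)))
```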
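
The three implications quoted in the Research Type row (activation patterns coupled with initialization, small relative distance from initialization, and approximately low-rank accumulated updates) can each be checked with simple diagnostics. The helper below is a hypothetical sketch of such checks; the function name, the 99% spectral-energy threshold used as an effective-rank proxy, and the reuse of `W0`, `W`, and `X` from the sketch above are all assumptions rather than details from the paper.

```python
# Hypothetical diagnostics for the three quoted implications. Assumes W0 (initial
# weights), W (learned weights), and data matrix X from a run such as the sketch above.
import numpy as np

def init_diagnostics(W0, W, X, energy=0.99):
    # (1) fraction of (unit, example) ReLU activation patterns that differ from init
    flip_rate = np.mean((W0 @ X.T > 0) != (W @ X.T > 0))

    # (2) distance of the learned weights from initialization, relative to the init norm
    rel_dist = np.linalg.norm(W - W0) / np.linalg.norm(W0)

    # (3) one possible proxy for the rank of the accumulated update: number of singular
    #     values needed to capture `energy` of the update's total spectral energy
    s = np.linalg.svd(W - W0, compute_uv=False)
    eff_rank = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1

    return flip_rate, rel_dist, eff_rank

# Example usage with the variables from the training sketch:
# flip_rate, rel_dist, eff_rank = init_diagnostics(W0, W, X)
# print(f"flip rate {flip_rate:.3f}, relative distance {rel_dist:.3f}, "
#       f"effective rank {eff_rank} of {min(W.shape)}")
```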