Recovery Guarantees for One-hidden-layer Neural Networks

Authors: Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we use synthetic data to verify our theoretical results. We generate data points {x_i, y_i}, i = 1, 2, ..., n, from Distribution D (defined in Eq. (1)). We set W* = UΣV^T, where U ∈ R^(d×k) and V ∈ R^(k×k) are orthogonal matrices generated from QR decomposition of Gaussian matrices, and Σ is a diagonal matrix whose diagonal elements are 1, 1 + (κ−1)/(k−1), 1 + 2(κ−1)/(k−1), ..., κ. In this experiment, we set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, +1} with equal chance. We use the squared ReLU φ(z) = max{z, 0}^2, which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling. (See the data-generation sketch after this table.)
Researcher Affiliation | Collaboration | 1: The University of Texas at Austin (zhongkai@ices.utexas.edu); 2: The University of Texas at Austin (zhaos@utexas.edu); 3: Microsoft Research, India (prajain@microsoft.com); 4: University of California, Berkeley (bartlett@cs.berkeley.edu); 5: The University of Texas at Austin (inderjit@cs.utexas.edu)
Pseudocode | Yes | Algorithm 1: Initialization via Tensor Method; Algorithm 2: Globally Converging Algorithm
Open Source Code | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' for non-orthogonal tensor methods, but does not provide concrete access to source code for the methodology described in this paper.
Open Datasets | No | We generate data points {x_i, y_i}, i = 1, 2, ..., n, from Distribution D (defined in Eq. (1)).
Dataset Splits | No | The paper describes partitioning the total samples for algorithmic steps (initialization, gradient descent with resampling) but does not provide specific train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits for model evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' but does not list specific ancillary software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4).
Experiment Setup | Yes | We set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, +1} with equal chance. We use the squared ReLU φ(z) = max{z, 0}^2, which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling. We fix d = 10, k = 5, n = 10000 and compare three different initialization approaches... (See the gradient-descent sketch after this table.)
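
Data-generation sketch (referenced in the Research Type row): a minimal, hypothetical reconstruction of the synthetic setup quoted from the paper, with W* = UΣV^T built from QR decompositions of Gaussian matrices, condition number κ, signs v*_i drawn from {−1, +1}, and squared ReLU activation. The function names, the choice x ~ N(0, I_d), and the noiseless labels are assumptions for illustration; they may differ in detail from the paper's Distribution D in Eq. (1), and this is not the authors' code.

```python
import numpy as np

def generate_planted_weights(d=10, k=5, kappa=2.0, rng=None):
    """Build W* = U Sigma V^T with condition number kappa and signs v* in {-1, +1}.

    Hypothetical reconstruction of the quoted setup, not the authors' code.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Orthogonal factors from QR decompositions of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # U: (d, k)
    V, _ = np.linalg.qr(rng.standard_normal((k, k)))   # V: (k, k)
    # Diagonal entries 1, 1 + (kappa-1)/(k-1), 1 + 2(kappa-1)/(k-1), ..., kappa.
    sigma = 1.0 + (kappa - 1.0) * np.arange(k) / (k - 1)
    W_star = U @ np.diag(sigma) @ V.T                  # planted weights, (d, k)
    v_star = rng.choice([-1.0, 1.0], size=k)           # output-layer signs
    return W_star, v_star

def squared_relu(z):
    """Smooth homogeneous activation phi(z) = max(z, 0)^2."""
    return np.maximum(z, 0.0) ** 2

def generate_data(W_star, v_star, n=10000, rng=None):
    """Sample x ~ N(0, I_d) (assumed) and label y = sum_i v*_i phi(w*_i . x)."""
    rng = np.random.default_rng() if rng is None else rng
    d = W_star.shape[0]
    X = rng.standard_normal((n, d))
    Y = squared_relu(X @ W_star) @ v_star
    return X, Y
```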
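
Gradient-descent sketch (referenced in the Experiment Setup row): a hedged sketch of the training stage with the reported stepsize η = 0.02 and no resampling, applied to the empirical squared loss of the one-hidden-layer model y = Σ_i v*_i φ(w_i^T x) with φ(z) = max{z, 0}^2. The loss form, iteration count, and the generic initialization argument W0 are assumptions; in the paper W0 would come from the tensor-method initialization (Algorithm 1) or one of the other compared initializations, none of which is reproduced here.

```python
import numpy as np

def gradient_descent(X, Y, v_star, W0, eta=0.02, iters=1000):
    """Run plain gradient descent on the empirical squared loss
    (1/2n) * sum_j (sum_i v*_i * phi(w_i . x_j) - y_j)^2, keeping v* fixed.

    Assumed sketch of the quoted setting (eta = 0.02, no resampling).
    """
    n = X.shape[0]
    W = W0.copy()
    for _ in range(iters):
        Z = X @ W                        # pre-activations, (n, k)
        A = np.maximum(Z, 0.0)           # max(z, 0), reused for phi and phi'
        residual = (A ** 2) @ v_star - Y             # prediction error, (n,)
        # phi'(z) = 2 * max(z, 0); chain rule through the residual.
        G = X.T @ (residual[:, None] * (2.0 * A * v_star[None, :])) / n
        W -= eta * G
    return W
```

With the data-generation sketch above, a run matching the quoted setting (d = 10, k = 5, n = 10000) would generate W* and v*, draw (X, Y), and call gradient_descent with each compared initialization supplied as W0.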