Recovery Guarantees for One-hidden-layer Neural Networks

Authors: Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we use synthetic data to verify our theoretical results. We generate data points {x_i, y_i}, i = 1, 2, ..., n, from Distribution D (defined in Eq. (1)). We set W* = UΣV^T, where U ∈ R^(d×k) and V ∈ R^(k×k) are orthogonal matrices generated from QR decomposition of Gaussian matrices, and Σ is a diagonal matrix whose diagonal elements are 1, 1 + (κ−1)/(k−1), 1 + 2(κ−1)/(k−1), ..., κ. In this experiment, we set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, +1} with equal chance. We use the squared ReLU φ(z) = max{z, 0}^2, which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling. (See the data-generation sketch after this table.)
Researcher Affiliation | Collaboration | 1: The University of Texas at Austin (zhongkai@ices.utexas.edu); 2: The University of Texas at Austin (zhaos@utexas.edu); 3: Microsoft Research, India (prajain@microsoft.com); 4: University of California, Berkeley (bartlett@cs.berkeley.edu); 5: The University of Texas at Austin (inderjit@cs.utexas.edu)
Pseudocode | Yes | Algorithm 1: Initialization via Tensor Method; Algorithm 2: Globally Converging Algorithm
Open Source Code | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' for non-orthogonal tensor methods, but does not provide concrete access to source code for the methodology described in this paper.
Open Datasets | No | We generate data points {x_i, y_i}, i = 1, 2, ..., n, from Distribution D (defined in Eq. (1)).
Dataset Splits | No | The paper describes partitioning the total samples for algorithmic steps (initialization, gradient descent with resampling) but does not provide specific train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits for model evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' but does not list specific ancillary software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4).
Experiment Setup | Yes | We set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, +1} with equal chance. We use the squared ReLU φ(z) = max{z, 0}^2, which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling. We fix d = 10, k = 5, n = 10000 and compare three different initialization approaches... (See the gradient-descent sketch after this table.)
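
Data-generation sketch (referenced in the Research Type row): a minimal, hypothetical reconstruction of the synthetic setup quoted from the paper, with W* = UΣV^T built from QR decompositions of Gaussian matrices, condition number κ, signs v*_i drawn from {−1, +1}, and squared ReLU activation. The function names, the choice x ~ N(0, I_d), and the noiseless labels are assumptions for illustration; they may differ in detail from the paper's Distribution D in Eq. (1), and this is not the authors' code.

```python
import numpy as np

def generate_planted_weights(d=10, k=5, kappa=2.0, rng=None):
    """Build W* = U Sigma V^T with condition number kappa and signs v* in {-1, +1}.

    Hypothetical reconstruction of the quoted setup, not the authors' code.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Orthogonal factors from QR decompositions of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # U: (d, k)
    V, _ = np.linalg.qr(rng.standard_normal((k, k)))   # V: (k, k)
    # Diagonal entries 1, 1 + (kappa-1)/(k-1), 1 + 2(kappa-1)/(k-1), ..., kappa.
    sigma = 1.0 + (kappa - 1.0) * np.arange(k) / (k - 1)
    W_star = U @ np.diag(sigma) @ V.T                  # planted weights, (d, k)
    v_star = rng.choice([-1.0, 1.0], size=k)           # output-layer signs
    return W_star, v_star

def squared_relu(z):
    """Smooth homogeneous activation phi(z) = max(z, 0)^2."""
    return np.maximum(z, 0.0) ** 2

def generate_data(W_star, v_star, n=10000, rng=None):
    """Sample x ~ N(0, I_d) (assumed) and label y = sum_i v*_i phi(w*_i . x)."""
    rng = np.random.default_rng() if rng is None else rng
    d = W_star.shape[0]
    X = rng.standard_normal((n, d))
    Y = squared_relu(X @ W_star) @ v_star
    return X, Y
```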
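
Gradient-descent sketch (referenced in the Experiment Setup row): a hedged sketch of the training stage with the reported stepsize η = 0.02 and no resampling, applied to the empirical squared loss of the one-hidden-layer model y = Σ_i v*_i φ(w_i^T x) with φ(z) = max{z, 0}^2. The loss form, iteration count, and the generic initialization argument W0 are assumptions; in the paper W0 would come from the tensor-method initialization (Algorithm 1) or one of the other compared initializations, none of which is reproduced here.

```python
import numpy as np

def gradient_descent(X, Y, v_star, W0, eta=0.02, iters=1000):
    """Run plain gradient descent on the empirical squared loss
    (1/2n) * sum_j (sum_i v*_i * phi(w_i . x_j) - y_j)^2, keeping v* fixed.

    Assumed sketch of the quoted setting (eta = 0.02, no resampling).
    """
    n = X.shape[0]
    W = W0.copy()
    for _ in range(iters):
        Z = X @ W                        # pre-activations, (n, k)
        A = np.maximum(Z, 0.0)           # max(z, 0), reused for phi and phi'
        residual = (A ** 2) @ v_star - Y             # prediction error, (n,)
        # phi'(z) = 2 * max(z, 0); chain rule through the residual.
        G = X.T @ (residual[:, None] * (2.0 * A * v_star[None, :])) / n
        W -= eta * G
    return W
```

With the data-generation sketch above, a run matching the quoted setting (d = 10, k = 5, n = 10000) would generate W* and v*, draw (X, Y), and call gradient_descent with each compared initialization supplied as W0.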