Recovery Guarantees for One-hidden-layer Neural Networks
Authors: Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we use synthetic data to verify our theoretical results. We generate data points {x_i, y_i}_{i=1,2,…,n} from Distribution D (defined in Eq. (1)). We set W* = UΣV^⊤, where U ∈ R^{d×k} and V ∈ R^{k×k} are orthogonal matrices generated from the QR decomposition of Gaussian matrices, and Σ is a diagonal matrix whose diagonal elements are 1, 1 + (κ−1)/(k−1), 1 + 2(κ−1)/(k−1), …, κ. In this experiment, we set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, +1} with equal chance. We use squared ReLU φ(z) = max{z, 0}², which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling. |
| Researcher Affiliation | Collaboration | (1) The University of Texas at Austin, zhongkai@ices.utexas.edu; (2) The University of Texas at Austin, zhaos@utexas.edu; (3) Microsoft Research, India, prajain@microsoft.com; (4) University of California, Berkeley, bartlett@cs.berkeley.edu; (5) The University of Texas at Austin, inderjit@cs.utexas.edu |
| Pseudocode | Yes | Algorithm 1 Initialization via Tensor Method; Algorithm 2 Globally Converging Algorithm |
| Open Source Code | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' for the non-orthogonal tensor methods, but it does not provide a link to, or release of, source code for its own method. |
| Open Datasets | No | We generate data points {x_i, y_i}_{i=1,2,…,n} from Distribution D (defined in Eq. (1)). |
| Dataset Splits | No | The paper describes partitioning the total samples for algorithmic steps (initialization, gradient descent with resampling) but does not provide specific train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits for model evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' but does not list software dependencies with version numbers (e.g., Python 3.8, CPLEX 12.4). |
| Experiment Setup | Yes | We set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, +1} with equal chance. We use squared ReLU φ(z) = max{z, 0}², which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling. We fix d = 10, k = 5, n = 10000 and compare three different initialization approaches... Hedged code sketches of this data generation and of a gradient-descent step follow the table. |
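
The data-generation recipe quoted in the "Research Type" and "Experiment Setup" rows is concrete enough to sketch in code. The snippet below is a minimal NumPy reconstruction of that setup, not the authors' released code: the function names, the seeding, and the assumption that x_i is drawn from a standard Gaussian (the paper's Distribution D) are ours.

```python
import numpy as np

def squared_relu(z):
    # Smooth homogeneous activation quoted in the paper: phi(z) = max(z, 0)^2.
    return np.maximum(z, 0.0) ** 2

def generate_data(n=10000, d=10, k=5, kappa=2.0, seed=0):
    """Sample {x_i, y_i} following the paper's synthetic setup (a sketch, not the authors' code).

    W* = U Sigma V^T, with U (d x k) and V (k x k) taken from QR decompositions of
    Gaussian matrices, and Sigma's diagonal interpolating 1, ..., kappa.
    """
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))       # d x k, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((k, k)))       # k x k orthogonal
    sigma = 1.0 + np.arange(k) * (kappa - 1.0) / (k - 1)   # 1, 1+(kappa-1)/(k-1), ..., kappa
    W_star = U @ np.diag(sigma) @ V.T                       # ground-truth weights, d x k
    v_star = rng.choice([-1.0, 1.0], size=k)                # output-layer signs, +-1 with equal chance
    X = rng.standard_normal((n, d))                          # assumed x_i ~ N(0, I_d) per Distribution D
    y = squared_relu(X @ W_star) @ v_star                    # y_i = sum_j v*_j phi(w*_j^T x_i)
    return X, y, W_star, v_star
```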
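
The same rows also quote the gradient-descent stepsize η = 0.02 and the remark that resampling is skipped. Continuing the sketch above, a plain gradient-descent loop on the squared loss, with the output signs v* held fixed, might look like the following. This is an illustrative stand-in for the paper's Algorithm 2 (Globally Converging Algorithm) without resampling, not the authors' implementation; the iteration count is our choice, and the initialization W0 is left to the caller since the paper compares three initialization approaches that this sketch does not reproduce.

```python
import numpy as np

def gradient_descent(X, y, v_star, W0, eta=0.02, iters=500):
    """Plain gradient descent on the squared loss over the hidden-layer weights W,
    keeping the output-layer signs v* fixed and doing no resampling between steps.
    """
    n = X.shape[0]
    W = W0.copy()
    for _ in range(iters):
        Z = X @ W                                    # n x k pre-activations
        pred = (np.maximum(Z, 0.0) ** 2) @ v_star    # squared-ReLU network output
        residual = pred - y                          # per-sample prediction error
        phi_prime = 2.0 * np.maximum(Z, 0.0)         # derivative of phi(z) = max(z, 0)^2
        # Gradient of (1 / (2n)) * sum_i residual_i^2 with respect to W (d x k).
        grad = X.T @ (residual[:, None] * phi_prime * v_star[None, :]) / n
        W -= eta * grad
    return W
```

As a hypothetical usage, one could call `X, y, W_star, v_star = generate_data()` from the sketch above and then `gradient_descent(X, y, v_star, W0, eta=0.02)` with an initialization such as `W0 = W_star + 0.1 * np.random.default_rng(1).standard_normal((10, 5))`; this near-ground-truth choice is purely illustrative and is not one of the initializations compared in the paper.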