Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Recovery Guarantees for One-hidden-layer Neural Networks
Authors: Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we use synthetic data to verify our theoretical results. We generate data points {(x_i, y_i)}_{i=1}^{n} from Distribution D (defined in Eq. (1)). We set W* = UΣV^⊤, where U ∈ R^{d×k} and V ∈ R^{k×k} are orthogonal matrices generated from QR decomposition of Gaussian matrices, and Σ is a diagonal matrix whose diagonal elements are 1, 1 + (κ−1)/(k−1), 1 + 2(κ−1)/(k−1), …, κ. In this experiment, we set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, 1} with equal chance. We use the squared ReLU φ(z) = max{z, 0}², which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling. |
| Researcher Affiliation | Collaboration | 1The University of Texas at Austin, EMAIL 2The University of Texas at Austin, EMAIL 3Microsoft Research, India, EMAIL 4University of California, Berkeley, EMAIL 5The University of Texas at Austin, EMAIL |
| Pseudocode | Yes | Algorithm 1 Initialization via Tensor Method; Algorithm 2 Globally Converging Algorithm |
| Open Source Code | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' for non-orthogonal tensor methods, but does not provide concrete access to the source code for the methodology described in this paper. |
| Open Datasets | No | We generate data points {(x_i, y_i)}_{i=1}^{n} from Distribution D (defined in Eq. (1)). |
| Dataset Splits | No | The paper describes partitioning the total samples for algorithmic steps (initialization, gradient descent with resampling) but does not provide specific train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits for model evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' but does not list specific ancillary software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4). |
| Experiment Setup | Yes | We set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, 1} with equal chance. We use the squared ReLU φ(z) = max{z, 0}², which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling. We fix d = 10, k = 5, n = 10000 and compare three different initialization approaches... |
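The synthetic setup quoted above can be sketched in a few lines of NumPy. This is a hedged reconstruction, not the authors' code: it assumes the inputs x_i are drawn i.i.d. standard Gaussian (the paper's Distribution D) and that labels follow the one-hidden-layer model y = Σ_i v*_i φ(w*_i ⋅ x) with the squared ReLU φ(z) = max{z, 0}². The variable names (`W_star`, `v_star`) are illustrative.

```python
import numpy as np

# Hyperparameters as stated in the Experiment Setup cell.
d, k, n = 10, 5, 10000
kappa = 2.0
rng = np.random.default_rng(0)

# Orthogonal factors from QR decompositions of Gaussian matrices.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # U: d x k
V, _ = np.linalg.qr(rng.standard_normal((k, k)))  # V: k x k

# Singular values evenly spaced from 1 to kappa (condition number kappa = 2).
sigma = 1 + (kappa - 1) * np.arange(k) / (k - 1)
W_star = U @ np.diag(sigma) @ V.T                 # ground-truth weights, d x k

# Output-layer signs picked uniformly from {-1, +1}.
v_star = rng.choice([-1.0, 1.0], size=k)

# Squared ReLU activation, as described in the paper.
phi = lambda z: np.maximum(z, 0.0) ** 2

# Assumption: inputs are standard Gaussian; labels come from the teacher net.
X = rng.standard_normal((n, d))
y = phi(X @ W_star) @ v_star                      # shape (n,)
```

From here, the gradient-descent experiment in the paper would minimize the squared loss over W with stepsize η = 0.02; the singular-value spacing makes W* have condition number exactly κ.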