Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Recovery Guarantees for One-hidden-layer Neural Networks

Authors: Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we use synthetic data to verify our theoretical results. We generate data points {x_i, y_i}, i = 1, 2, …, n, from Distribution D (defined in Eq. (1)). We set W* = UΣV^⊤, where U ∈ R^{d×k} and V ∈ R^{k×k} are orthogonal matrices generated from QR decomposition of Gaussian matrices, and Σ is a diagonal matrix whose diagonal elements are 1, 1 + (κ−1)/(k−1), 1 + 2(κ−1)/(k−1), …, κ. In this experiment, we set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, 1} with equal chance. We use squared ReLU φ(z) = max{z, 0}^2, which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling.
Researcher Affiliation | Collaboration | ¹The University of Texas at Austin; ²The University of Texas at Austin; ³Microsoft Research, India; ⁴University of California, Berkeley; ⁵The University of Texas at Austin
Pseudocode | Yes | Algorithm 1: Initialization via Tensor Method; Algorithm 2: Globally Converging Algorithm
Open Source Code | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' for non-orthogonal tensor methods, but does not provide concrete access to the source code for the methodology described in this paper.
Open Datasets | No | We generate data points {x_i, y_i}, i = 1, 2, …, n, from Distribution D (defined in Eq. (1)).
Dataset Splits | No | The paper describes partitioning the total samples for algorithmic steps (initialization, gradient descent with resampling) but does not provide specific train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits for model evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using 'the code provided by (Kuleshov et al., 2015)' but does not list specific ancillary software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4).
Experiment Setup | Yes | We set κ = 2 and k = 5. We set v*_i to be randomly picked from {−1, 1} with equal chance. We use squared ReLU φ(z) = max{z, 0}^2, which is a smooth homogeneous function. For non-orthogonal tensor methods, we directly use the code provided by (Kuleshov et al., 2015) with the number of random projections fixed as L = 100. We pick the stepsize η = 0.02 for gradient descent. In the experiments, we don't do the resampling since the algorithm still works well without resampling. We fix d = 10, k = 5, n = 10000 and compare three different initialization approaches...
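
The quoted setup specifies the synthetic ground truth almost completely, so it can be sketched in a few lines of NumPy. This is a hedged reconstruction, not the authors' code: standard Gaussian inputs are an assumption standing in for Distribution D, and names such as `W_star`, `v_star`, and `phi` are illustrative.

```python
# Sketch of the synthetic setup quoted above (assumptions noted in comments):
# W* = U Σ V^T via QR of Gaussian matrices, labels y_i = Σ_j v*_j φ(w*_j · x_i)
# with squared ReLU φ(z) = max{z, 0}^2. Gaussian x_i is an assumed stand-in
# for Distribution D from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, k, n, kappa = 10, 5, 10000, 2.0   # values fixed in the quoted experiment

# Orthogonal factors from QR decompositions of Gaussian matrices.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))
V, _ = np.linalg.qr(rng.standard_normal((k, k)))
# Diagonal of Σ: 1, 1 + (κ-1)/(k-1), 1 + 2(κ-1)/(k-1), ..., κ.
sigma = 1.0 + np.arange(k) * (kappa - 1.0) / (k - 1.0)
W_star = U @ np.diag(sigma) @ V.T    # d x k ground-truth weights, cond = κ

# Output-layer signs v*_i picked uniformly from {-1, +1}.
v_star = rng.choice([-1.0, 1.0], size=k)

def phi(z):
    """Squared ReLU: smooth and homogeneous, as in the quoted setup."""
    return np.maximum(z, 0.0) ** 2

X = rng.standard_normal((n, d))      # inputs x_i (assumed Gaussian)
y = phi(X @ W_star) @ v_star         # labels y_i

print(X.shape, y.shape)
```

By construction the singular values of `W_star` are exactly the diagonal of Σ, so its condition number equals κ = 2, matching the quoted choice.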