Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks

Authors: Yuan Cao, Quanquan Gu

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform numerical experiments to back up our theoretical analysis. We test Algorithm 1 together with the initialization method given in Theorem 4.7 for ReLU, sigmoid, and hyperbolic tangent networks, and compare its performance with the Double Convotron algorithm proposed by Du and Goel [10]. Figure 1 gives the experimental results in semi-log plots. (An illustrative run-and-plot sketch appears after this table.)
Researcher Affiliation | Academia | Yuan Cao, Department of Computer Science, University of California, Los Angeles, CA 90095, USA, yuancao@cs.ucla.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, CA 90095, USA, qgu@cs.ucla.edu
Pseudocode | Yes | Algorithm 1: Approximate Gradient Descent for Non-overlapping CNN. (A hedged code sketch of this setting follows the table.)
Open Source Code | No | The paper does not provide concrete access to source code. There are no statements about releasing code or links to a code repository for the methodology described.
Open Datasets | No | The paper describes generating synthetic data: 'x_1, . . . , x_n ∈ R^d are generated independently from the standard Gaussian distribution, and the corresponding output y_1, . . . , y_n ∈ R are generated from the teacher network with true parameters w∗ and v∗'. It does not refer to a publicly available dataset with concrete access information. (A data-generation sketch follows this table.)
Dataset Splits | No | The paper does not provide specific dataset split information for training, validation, and testing. It focuses on parameter recovery from generated data rather than splitting a fixed dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | For all experiments, we set the number of iterations T = 100 and sample size n = 1000. We tune the step size α to maximize performance. Specifically, we set α = 0.04 for ReLU, α = 0.25 for sigmoid, and α = 0.1 for hyperbolic tangent networks. We consider two settings: (i) k = 15, r = 5, ν̃ = 0.08; (ii) k = 30, r = 9, ν̃ = 0.04. (The reported values are collected in a short config sketch below.)
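
Referenced from the Pseudocode row: the following is a minimal NumPy sketch of gradient descent on a non-overlapping one-hidden-layer CNN of the kind the paper studies. It is an illustrative stand-in, not the paper's Algorithm 1; the exact approximate-gradient update and the Theorem 4.7 initialization are defined in the paper, and the function names here (predict, gradient_step) are hypothetical.

    import numpy as np

    def predict(X, w, v, k, r):
        """Forward pass of a non-overlapping CNN: shared filter w over k patches of size r."""
        patches = X.reshape(-1, k, r)           # (n, k, r) non-overlapping patches
        hidden = np.maximum(patches @ w, 0.0)   # ReLU activation, shape (n, k)
        return hidden @ v                       # second-layer combination, shape (n,)

    def gradient_step(X, y, w, v, k, r, alpha):
        """One plain gradient step on the empirical squared loss (illustrative only)."""
        n = X.shape[0]
        patches = X.reshape(n, k, r)
        pre = patches @ w                       # pre-activations, shape (n, k)
        hidden = np.maximum(pre, 0.0)
        resid = hidden @ v - y                  # residuals, shape (n,)
        act_grad = (pre > 0).astype(float)      # ReLU derivative
        grad_w = np.einsum('n,nk,nkr->r', resid, act_grad * v, patches) / n
        grad_v = (resid[:, None] * hidden).mean(axis=0)
        return w - alpha * grad_w, v - alpha * grad_v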
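
The synthetic data generation quoted in the Open Datasets row can be reproduced along the following lines. This is a sketch under stated assumptions: the names w_star and v_star are hypothetical, and the unit-norm filter and noiseless ReLU teacher are assumptions, not details confirmed by the quote.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, r = 1000, 15, 5                  # sample size and setting (i) from the setup row
    d = k * r
    w_star = rng.standard_normal(r)
    w_star /= np.linalg.norm(w_star)       # assumption: unit-norm true filter
    v_star = rng.standard_normal(k)

    X = rng.standard_normal((n, d))        # x_i drawn i.i.d. from N(0, I_d)
    y = np.maximum(X.reshape(n, k, r) @ w_star, 0.0) @ v_star   # noiseless ReLU teacher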
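
The hyperparameters reported in the Experiment Setup row, collected as a plain Python config for reference. Only the values come from the paper; the layout and key names (e.g., "nu" for the noise level ν̃) are hypothetical.

    T, N = 100, 1000                        # iterations and sample size for all experiments
    STEP_SIZE = {"relu": 0.04, "sigmoid": 0.25, "tanh": 0.1}   # tuned step sizes α
    SETTINGS = [
        {"k": 15, "r": 5, "nu": 0.08},      # setting (i)
        {"k": 30, "r": 9, "nu": 0.04},      # setting (ii)
    ]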
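
Finally, the run-and-plot sketch referenced from the Research Type row. It builds on the sketches above and produces a semi-log error curve in the spirit of the paper's Figure 1; the naive random initialization and the error metric are simplifications (the paper initializes via Theorem 4.7), so this is not a reproduction of the reported comparison with Double Convotron.

    import matplotlib.pyplot as plt

    # Run T steps of the illustrative gradient descent on the synthetic data
    # and plot the filter recovery error on a semi-log scale.
    w = rng.standard_normal(r)
    w /= np.linalg.norm(w)                  # naive random init, not Theorem 4.7
    v = rng.standard_normal(k)
    errors = []
    for _ in range(T):
        w, v = gradient_step(X, y, w, v, k, r, alpha=STEP_SIZE["relu"])
        errors.append(np.linalg.norm(w - w_star))
    plt.semilogy(errors, label="illustrative gradient descent")
    plt.xlabel("iteration")
    plt.ylabel("||w - w*||_2")
    plt.legend()
    plt.show()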