On the Complexity of Learning Neural Networks

Authors: Le Song, Santosh Vempala, John Wilmes, Bo Xie

NeurIPS 2017

Reproducibility (Variable / Result / LLM Response):
Research Type: Experimental. Our lower bounds are asymptotic, but we show empirically in Section 4 that they apply even at practical values of n and s. We experimentally observe a threshold for the quantity s√n, above which stochastic gradient descent fails to train the NN to low error (that is, regression error below that of the best constant approximation) regardless of choices of gates, architecture used for learning, learning rate, batch size, etc.
Researcher Affiliation: Academia. Le Song (Georgia Institute of Technology, Atlanta, GA 30332, lsong@cc.gatech.edu); Santosh Vempala (Georgia Institute of Technology, Atlanta, GA 30332, vempala@gatech.edu); John Wilmes (Georgia Institute of Technology, Atlanta, GA 30332, wilmesj@gatech.edu); Bo Xie (Georgia Institute of Technology, Atlanta, GA 30332, bo.xie@gatech.edu).
Pseudocode: No. The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code: No. The paper does not provide any statement or link indicating the release of open-source code for the described methodology.
Open Datasets: No. For a given sharpness parameter s ∈ {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2}, input dimension d ∈ {50, 100, 200} and input distribution, we generate the true function according to Eqn. 1.1. There are a total of 50,000 training data points and 1000 test data points. We then learn the true function with fully-connected neural networks of both ReLU and sigmoid activation functions.
Dataset Splits: No. The paper mentions "50,000 training data points and 1000 test data points" but does not specify a separate validation split or the use of cross-validation for hyperparameter tuning.
Hardware Specification: No. The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies: No. The paper does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup: Yes. The number of hidden layers we used is 1, 2, and 4. The number of hidden units per layer varies from 4n to 8n. The training is carried out using SGD with 0.9 momentum, and we enumerate learning rates from 0.1, 0.01 and 0.001 and batch sizes from 64, 128 and 256. A hedged code sketch of this setup follows the table.
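To make the quoted setup easier to follow, below is a minimal PyTorch-style sketch of the data generation and training grid described in the Open Datasets and Experiment Setup rows. It is a sketch under stated assumptions, not the authors' implementation: the Gaussian input distribution, the toy_target placeholder standing in for Eqn. 1.1, the squared loss, the epoch budget, and the reading of "4n to 8n" with n as the input dimension are assumptions; only the split sizes, depths, activations, optimizer settings, learning rates, and batch sizes come from the quoted text.

# Hedged sketch (not the authors' code) of the quoted experimental setup.
# Assumptions not taken from the excerpt: standard Gaussian inputs, a toy
# target_fn standing in for Eqn. 1.1, squared loss, a fixed epoch budget,
# and interpreting the hidden-layer width "4n to 8n" with n = input dimension.
import itertools

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_dataset(d, target_fn, n_train=50_000, n_test=1_000):
    """Draw labeled inputs and split them 50,000 train / 1,000 test."""
    x = torch.randn(n_train + n_test, d)  # assumed input distribution
    y = target_fn(x)
    return (TensorDataset(x[:n_train], y[:n_train]),
            TensorDataset(x[n_train:], y[n_train:]))


def make_mlp(d, width, depth, activation):
    """Fully connected regressor with `depth` hidden layers of `width` units."""
    layers, in_dim = [], d
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), activation()]
        in_dim = width
    layers.append(nn.Linear(in_dim, 1))
    return nn.Sequential(*layers)


def train(model, loader, lr, epochs=20):
    """Plain SGD with 0.9 momentum, as quoted; the epoch budget is an assumption."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            nn.functional.mse_loss(model(xb), yb).backward()
            opt.step()


def test_mse(model, dataset):
    """Mean squared error on a held-out TensorDataset."""
    x, y = dataset.tensors
    with torch.no_grad():
        return nn.functional.mse_loss(model(x), y).item()


d = 50  # one of the input dimensions reported in the paper
toy_target = lambda x: torch.cos(x.sum(dim=1, keepdim=True))  # placeholder for Eqn. 1.1
train_set, test_set = make_dataset(d, toy_target)

# Failure criterion from the Research Type row: regression error no better than
# the best constant predictor, whose squared error equals the label variance.
baseline = test_set.tensors[1].var(unbiased=False).item()

# Grid from the Experiment Setup row: depths 1/2/4, widths 4n and 8n,
# ReLU or sigmoid gates, the quoted learning rates and batch sizes.
for depth, mult, act, lr, bs in itertools.product(
        [1, 2, 4], [4, 8], [nn.ReLU, nn.Sigmoid], [0.1, 0.01, 0.001], [64, 128, 256]):
    model = make_mlp(d, mult * d, depth, act)
    train(model, DataLoader(train_set, batch_size=bs, shuffle=True), lr)
    print(depth, mult, act.__name__, lr, bs, test_mse(model, test_set), baseline)

The final comparison against the label-variance baseline mirrors the failure criterion quoted in the Research Type row (error no better than the best constant approximation); the exact hard target family and evaluation protocol are defined in the paper itself.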