On the Complexity of Learning Neural Networks
Authors: Le Song, Santosh Vempala, John Wilmes, Bo Xie
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our lower bounds are asymptotic, but we show empirically in Section 4 that they apply even at practical values of n and s. We experimentally observe a threshold for the quantity s√n, above which stochastic gradient descent fails to train the NN to low error (that is, regression error below that of the best constant approximation), regardless of the choice of gates, architecture used for learning, learning rate, batch size, etc. (See the constant-baseline sketch below the table.) |
| Researcher Affiliation | Academia | Le Song, Georgia Institute of Technology, Atlanta, GA 30332, lsong@cc.gatech.edu; Santosh Vempala, Georgia Institute of Technology, Atlanta, GA 30332, vempala@gatech.edu; John Wilmes, Georgia Institute of Technology, Atlanta, GA 30332, wilmesj@gatech.edu; Bo Xie, Georgia Institute of Technology, Atlanta, GA 30332, bo.xie@gatech.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | No | For a given sharpness parameter s ∈ {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2}, input dimension d ∈ {50, 100, 200} and input distribution, we generate the true function according to Eqn. 1.1. There are a total of 50,000 training data points and 1,000 test data points. We then learn the true function with fully-connected neural networks of both ReLU and sigmoid activation functions. (See the data-generation sketch below the table.) |
| Dataset Splits | No | The paper mentions "50,000 training data points and 1000 test data points" but does not specify a separate validation split or the use of cross-validation for hyperparameter tuning. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The number of hidden layers used is 1, 2, or 4. The number of hidden units per layer varies from 4n to 8n. Training is carried out using SGD with 0.9 momentum, and we enumerate learning rates from {0.1, 0.01, 0.001} and batch sizes from {64, 128, 256}. (See the training sketch below the table.) |
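
The "Open Datasets" row only quotes the data-generation recipe, so the sketch below fills in the pipeline around it. The grids for s and d and the 50,000/1,000 split come from the quote; the input distribution, the number of hidden units m, and the target function itself are assumptions, since Eqn. 1.1 is not reproduced in this report. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_inputs(num, d):
    # ASSUMPTION: inputs drawn uniformly from the unit sphere in R^d; the quote
    # above mentions an input distribution but does not restate it.
    x = rng.standard_normal((num, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def make_target(d, s, m=64):
    # PLACEHOLDER for Eqn. 1.1, which is not reproduced in this report: a
    # one-hidden-layer network with m random unit-norm directions whose units
    # become sharper as the sharpness parameter s grows. Substitute the paper's
    # actual construction before drawing any conclusions from this sketch.
    w = rng.standard_normal((m, d))
    w /= np.linalg.norm(w, axis=1, keepdims=True)
    return lambda x: np.tanh(s * np.sqrt(d) * (x @ w.T)).mean(axis=1)

# Grids and sample sizes quoted in the table above.
SHARPNESS = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2]
DIMENSIONS = [50, 100, 200]
N_TRAIN, N_TEST = 50_000, 1_000

# One (d, s) cell of the sweep as an example; the paper iterates over the full grid.
d, s = 100, 0.5
target = make_target(d, s)
x_train, x_test = sample_inputs(N_TRAIN, d), sample_inputs(N_TEST, d)
y_train, y_test = target(x_train), target(x_test)
```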
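
For the "Experiment Setup" row, a minimal PyTorch-style training sketch of the hyperparameter grid follows. The framework choice is an assumption (the paper names no software), as are the epoch count and the specific widths 4d and 8d picked from the stated 4n-8n range; the depths, the 0.9 momentum, the learning rates, and the batch sizes are the quoted values.

```python
import itertools
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def make_mlp(d, depth, width, act):
    # Fully-connected network: `depth` hidden layers of `width` units, scalar output.
    layers, in_dim = [], d
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), act()]
        in_dim = width
    layers.append(nn.Linear(in_dim, 1))
    return nn.Sequential(*layers)

def train(x_tr, y_tr, depth, width, lr, batch_size, act=nn.ReLU, epochs=20):
    # Epoch count is an assumption; it is not stated in the quotes above.
    model = make_mlp(x_tr.shape[1], depth, width, act)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)  # 0.9 momentum per the quote
    loss_fn = nn.MSELoss()
    loader = DataLoader(TensorDataset(x_tr, y_tr), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb).squeeze(-1), yb).backward()
            opt.step()
    return model

# Grid quoted in the table; widths 4*d and 8*d bracket the stated 4n-8n range.
d = 100
grid = itertools.product([1, 2, 4],            # hidden layers
                         [4 * d, 8 * d],       # hidden units per layer
                         [0.1, 0.01, 0.001],   # learning rates
                         [64, 128, 256],       # batch sizes
                         [nn.ReLU, nn.Sigmoid])

# Dummy data stands in for the synthetic datasets from the previous sketch.
x_tr, y_tr = torch.randn(1024, d), torch.randn(1024)
depth, width, lr, bs, act = next(iter(grid))   # run a single configuration as a demo
model = train(x_tr, y_tr, depth, width, lr, bs, act)
```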
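
For the failure criterion in the "Research Type" row, "error below that of the best constant approximation" has a simple concrete form under squared loss: the optimal constant predictor is the mean of the targets, so the baseline error equals their variance. A small sketch of that check (the function names and the margin are illustrative, not from the paper):

```python
import numpy as np

def constant_baseline_mse(y):
    # Under squared loss the best constant predictor is the mean of y,
    # so its mean squared error is exactly the variance of y.
    return np.mean((y - y.mean()) ** 2)

def beats_constant_baseline(y_test, predictions, margin=0.95):
    # Illustrative success criterion (the 0.95 margin is an assumption): a run
    # counts as learning only if its test MSE falls clearly below the baseline.
    return np.mean((y_test - predictions) ** 2) < margin * constant_baseline_mse(y_test)

y = np.random.default_rng(0).normal(size=1000)
print(constant_baseline_mse(y))                    # ~1.0 for standard-normal targets
print(beats_constant_baseline(y, np.zeros(1000)))  # False: a constant guess is not learning
```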