Towards Understanding the Spectral Bias of Deep Learning
Authors: Yuan Cao, Zhiying Fang, Yue Wu, Ding-Xuan Zhou, Quanquan Gu
IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide numerical experiments to demonstrate the correctness of our theory. Our experimental results also show that our theory can tolerate certain model misspecification in terms of the input data distribution. We also conduct experiments to corroborate the theory we establish. In this section we present experimental results to verify our theory. |
| Researcher Affiliation | Academia | 1 Department of Computer Science, University of California, Los Angeles; 2 School of Data Science and Department of Mathematics, City University of Hong Kong |
| Pseudocode | Yes | Algorithm 1 GD for DNNs starting at Gaussian initialization |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper mentions generating synthetic data based on spherical harmonics and non-uniform distributions, but does not provide concrete access (e.g., a link or formal citation to a public repository) for these datasets. |
| Dataset Splits | No | The paper mentions a 'training sample size is 1000' but does not specify explicit train/validation/test splits, percentages, or sample counts for dataset partitioning. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions 'vanilla gradient descent' but does not list specific software dependencies with version numbers (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | Across all tasks, we train a two-layer neural network with 4096 hidden neurons and initialize it exactly as defined in the problem setup. The optimization method is vanilla gradient descent, and the training sample size is 1000. Algorithm 1 with η = Õ(m⁻¹θ²), θ = Õ(ϵ) satisfies … (A minimal code sketch of this setup follows the table.) |
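
As a rough illustration of the quoted setup, below is a minimal sketch of training a two-layer ReLU network with 4096 hidden neurons from Gaussian initialization using vanilla (full-batch) gradient descent on 1000 samples. This is not the authors' code (none is released); the input dimension, weight scalings, square loss, fixed learning rate, and placeholder data generator are illustrative assumptions.

```python
# Hypothetical sketch of the described setup: two-layer ReLU network, m = 4096
# hidden neurons, Gaussian initialization, vanilla gradient descent, n = 1000.
import torch

m, d, n = 4096, 10, 1000           # hidden width, input dimension (assumed), sample size
torch.manual_seed(0)

# Gaussian initialization ("GD for DNNs starting at Gaussian initialization");
# the variance scaling here is an assumption, not taken from the paper.
W = (torch.randn(m, d) / d ** 0.5).requires_grad_(True)   # first-layer weights
v = (torch.randn(m) / m ** 0.5).requires_grad_(True)      # second-layer weights

# Placeholder synthetic data on the unit sphere with random +/-1 labels.
# The paper's experiments use spherical-harmonic targets, which are not reproduced here.
X = torch.randn(n, d)
X = X / X.norm(dim=1, keepdim=True)
y = torch.sign(torch.randn(n))

eta = 0.1                          # stand-in for the theoretical rate η = Õ(m⁻¹θ²)
for step in range(200):
    out = torch.relu(X @ W.t()) @ v        # two-layer ReLU network output
    loss = ((out - y) ** 2).mean()         # square loss (assumed)
    loss.backward()
    with torch.no_grad():                  # vanilla full-batch gradient descent update
        W -= eta * W.grad
        v -= eta * v.grad
        W.grad.zero_()
        v.grad.zero_()
    if step % 50 == 0:
        print(f"step {step:4d}  train loss {loss.item():.4f}")
```

Full-batch updates with a fixed step size mirror the "vanilla gradient descent" quoted in the Experiment Setup row; the paper itself specifies the learning rate only up to the order η = Õ(m⁻¹θ²).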