Addressing Spectral Bias of Deep Neural Networks by Multi-Grade Deep Learning
Authors: Ronglong Fang, Yuesheng Xu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply MGDL to synthetic data, manifold data, colored images, and the MNIST dataset, all characterized by the presence of high-frequency features. Our study reveals that MGDL excels at representing functions containing high-frequency information. Specifically, the neural networks learned in each grade adeptly capture some low-frequency information, allowing their compositions with the SNNs learned in previous grades to effectively represent the high-frequency features. Our experimental results underscore the efficacy of MGDL in addressing the spectral bias inherent in DNNs. (A grade-wise training sketch follows the table.) |
| Researcher Affiliation | Academia | Ronglong Fang, Yuesheng Xu; Department of Mathematics and Statistics, Old Dominion University; {rfang002, y1xu}@odu.edu |
| Pseudocode | No | The paper describes the model using mathematical equations and prose, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available on GitHub: Addressing Spectral Bias via MGDL. |
| Open Datasets | Yes | We test the models with the cat image from website Cat, and the sea and building images from the Div2K dataset [1]. ... We choose the handwriting digits from MNIST dataset [21], composed of 60,000 training samples and 10,000 testing samples of the digits 0 through 9. |
| Dataset Splits | Yes | For all four settings, the training data consist of pairs {x_ℓ, λ(x_ℓ)}, ℓ = 1, …, N, with N = 6000, where the x_ℓ's are equally spaced between 0 and 1. The validation and testing data consist of pairs {x_ℓ, λ(x_ℓ)}, ℓ = 1, …, N, with N = 2000, where the x_ℓ's are generated from a random uniform distribution on [0, 1], with the random seed set to 0 and 1, respectively. (A data-generation sketch follows the table.) |
| Hardware Specification | Yes | The experiments conducted in Sections 3.1, 3.2, and 3.4 were performed on an x86_64 server equipped with an Intel(R) Xeon(R) Gold 6148 CPU @ 2.4GHz (40 slots) or an Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz (32 slots). In contrast, the experiments described in Section 3.3 were performed on an x86_64 server equipped with an AMD 7543 @ 2.8GHz (64 slots) with AVX512 and 2 x Nvidia Ampere A100 GPUs. |
| Software Dependencies | No | We choose ReLU as the activation function as in [30] for all four experiments. ... The optimization problems for both SGDL and MGDL across the four experiments are solved by the Adam method [20] with Xavier initialization [13]. (No specific version numbers for software libraries or environments are provided.) |
| Experiment Setup | Yes | The loss functions defined in (2) for SGDL and (7) for MGDL are used to compute the training and validation loss when D is chosen to be the training and validation data, respectively. ... The optimization problems for both SGDL and MGDL across the four experiments are solved by the Adam method [20] with Xavier initialization [13]. ... The learning rate t_k for the k-th epoch decays exponentially ... with K being the total number of training epochs, and t_max and t_min denoting the predefined maximum and minimum learning rates, respectively. ... The batch size is chosen from 256, 512, or the full gradient (denoted by "Full") for each epoch. The total epoch number K is set to 30,000. (A learning-rate-schedule sketch follows the table.) |
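
The mechanism summarized in the Research Type row is grade-wise training: each grade fits a shallow neural network (SNN) to the residual left by earlier grades, and its frozen hidden layers serve as the feature map for the next grade. The code below is a minimal, hypothetical PyTorch sketch of that scheme; `ShallowNet`, `train_grade`, and `mgdl_fit` are illustrative names and the residual-accumulation details are assumptions, not the authors' released code (see their GitHub repository for the actual implementation).

```python
# Hypothetical sketch of grade-wise training in MGDL (illustrative names,
# not the authors' code). Each grade fits a shallow network to the residual
# left by previous grades; its hidden layers are then frozen and reused as
# the feature map for the next grade.
import torch
import torch.nn as nn

class ShallowNet(nn.Module):
    """One-hidden-layer ReLU network: one 'grade' in MGDL."""
    def __init__(self, in_dim, width, out_dim):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(in_dim, width), nn.ReLU())
        self.head = nn.Linear(width, out_dim)

    def forward(self, z):
        return self.head(self.hidden(z))

def train_grade(net, features, residual, epochs=1000, lr=1e-3):
    """Fit one grade's SNN to the current residual (Adam, MSE loss)."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(features), residual)
        loss.backward()
        opt.step()
    return net

def mgdl_fit(x, y, widths, epochs=1000):
    grades, features, prediction = [], x, torch.zeros_like(y)
    for width in widths:
        residual = y - prediction                    # what earlier grades missed
        net = ShallowNet(features.shape[1], width, y.shape[1])
        train_grade(net, features, residual, epochs)
        with torch.no_grad():
            prediction = prediction + net(features)  # accumulate grade outputs
            features = net.hidden(features).detach() # frozen features feed next grade
        grades.append(net)
    return grades, prediction
```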
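
The Dataset Splits row describes the 1D data generation precisely enough to sketch it. Below is a minimal NumPy version under those assumptions; the target λ(x) is a placeholder (the paper uses a different function per setting), and the use of `numpy.random.default_rng` for applying seeds 0 and 1 is an assumption.

```python
# Minimal sketch of the quoted 1D data splits (NumPy).
import numpy as np

def lam(x):
    # Placeholder target; the paper's lambda(x) varies across the four settings.
    return np.sin(2 * np.pi * x)

# Training inputs: 6000 points equally spaced on [0, 1].
x_train = np.linspace(0.0, 1.0, 6000)

# Validation / testing inputs: 2000 uniform samples each, seeds 0 and 1.
x_val = np.random.default_rng(seed=0).uniform(0.0, 1.0, 2000)
x_test = np.random.default_rng(seed=1).uniform(0.0, 1.0, 2000)

train = (x_train, lam(x_train))
val = (x_val, lam(x_val))
test = (x_test, lam(x_test))
```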
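
The Experiment Setup row states that the learning rate t_k decays exponentially from t_max to t_min over K epochs, with Adam and Xavier initialization. One common way to realize such a schedule is the geometric interpolation t_k = t_max (t_min / t_max)^(k/K); the exact formula, the default values of t_max and t_min, and the toy model below are assumptions, not taken from the paper.

```python
# Sketch of an exponentially decaying learning-rate schedule with Adam and
# Xavier initialization (PyTorch). Formula and defaults are assumptions.
import torch
import torch.nn as nn

def lr_at_epoch(k, K, t_max=1e-3, t_min=1e-5):
    """Exponential decay from t_max (epoch 0) to t_min (epoch K)."""
    return t_max * (t_min / t_max) ** (k / K)

K = 30000  # total number of training epochs, as stated in the paper
model = nn.Sequential(nn.Linear(1, 256), nn.ReLU(), nn.Linear(256, 1))
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.xavier_uniform_(layer.weight)  # Xavier initialization [13]
        nn.init.zeros_(layer.bias)

opt = torch.optim.Adam(model.parameters(), lr=lr_at_epoch(0, K))
# LambdaLR rescales the base rate t_max by (t_min / t_max) ** (k / K) each epoch.
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lr_lambda=lambda k: lr_at_epoch(k, K) / lr_at_epoch(0, K)
)
```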