Deep Frequency Principle Towards Understanding Why Deeper Learning Is Faster
Authors: Zhiqin John Xu, Hanxu Zhou | Pages 10541-10550
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we utilize Fourier analysis to empirically provide a promising mechanism for understanding why feedforward deeper learning is faster. To this end, we separate a deep neural network, trained by normal stochastic gradient descent, into two parts during analysis, i.e., a pre-condition component and a learning component, in which the output of the pre-condition component is the input of the learning component. We use a filtering method to characterize the frequency distribution of a high-dimensional function (a hedged sketch of such a filter is given after this table). Based on experiments with deep networks and a real dataset, we propose a deep frequency principle: the effective target function for a deeper hidden layer biases towards lower frequency during training. |
| Researcher Affiliation | Academia | Zhiqin John Xu and Hanxu Zhou School of Mathematical Sciences, MOE-LSC and Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, P.R. China xuzhiqin@sjtu.edu.cn, 1210123652@sjtu.edu.cn |
| Pseudocode | No | The paper defines the neural network recursively with equations but does not include any labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All codes are written by Python and Tensorflow, and run on Linux system with Nvidia GTX 2080Ti or Tesla V100 cards. Codes can be found at github.com. |
| Open Datasets | Yes | For the experiments of the variants of Resnet18 on CIFAR10, the network structures are shown in Fig. 4. [...] We use 40000 samples of CIFAR10 as the training set and 10000 examples as the validation set. [...] For the experiment of fully-connected network on MNIST, [...] We use 30000 samples of the MNIST as training set. |
| Dataset Splits | Yes | We use 40000 samples of CIFAR10 as the training set and 10000 examples as the validation set. |
| Hardware Specification | Yes | All codes are written by Python and Tensorflow, and run on Linux system with Nvidia GTX 2080Ti or Tesla V100 cards. |
| Software Dependencies | No | The paper mentions "Python and Tensorflow" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For the experiments on the variants of ResNet18 on CIFAR10, the network structures are shown in Fig. 4. The output layer is equipped with softmax and the network is trained by the Adam optimizer with cross-entropy loss and batch size 256. The learning rate is changed as the training proceeds: 10^-3 for epochs 1-40, 10^-4 for epochs 41-60, and 10^-5 for epochs 61-80. [...] For the experiment of a fully-connected network on MNIST, we choose the tanh activation function and size 784-500-500-500-500-500-10. The output layer of the network is not equipped with any activation function. The network is trained by the Adam optimizer with mean squared loss, batch size 256, and learning rate 10^-5. The training is stopped when the loss is smaller than 10^-2. (Hedged training sketches for both setups are given after this table.) |
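
The "filtering method" mentioned in the Research Type row characterizes how much of a sampled high-dimensional function is concentrated at low frequencies. The snippet below is a minimal NumPy sketch of one common way to realize such a low-pass filter directly on sampled data, by averaging the sampled values with a normalized Gaussian kernel; the names `low_frequency_part`, `low_frequency_ratio`, and the bandwidth parameter `delta` are illustrative choices and not taken from the paper.

```python
import numpy as np

def low_frequency_part(x, y, delta):
    """Estimate the low-frequency component of a function sampled at points
    x (shape N x d) with values y (shape N x k), using a Gaussian low-pass
    filter evaluated on the samples themselves. Illustrative sketch only."""
    # pairwise squared distances between sample points
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel acts as a low-pass filter; a larger delta keeps only lower frequencies
    weights = np.exp(-sq_dist / (2.0 * delta ** 2))
    weights /= weights.sum(axis=1, keepdims=True)  # normalize per sample
    return weights @ y  # filtered (low-frequency) values at the sample points

def low_frequency_ratio(x, y, delta):
    """Fraction of the function's squared magnitude captured by the low-frequency part."""
    y_low = low_frequency_part(x, y, delta)
    return np.sum(y_low ** 2) / np.sum(y ** 2)
```

Sweeping `delta` and tracking `low_frequency_ratio` gives a coarse frequency profile of the sampled function, which is the kind of quantity needed to compare effective target functions across hidden layers.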
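
The CIFAR10 rows above (40000 training / 10000 validation samples, Adam with cross-entropy loss, batch size 256, and a piecewise learning rate schedule) could be reproduced along the lines of the TensorFlow sketch below. Splitting Keras' 50000 training images into the first 40000 for training and the last 10000 for validation is an assumption, and `build_resnet18_variant()` is a hypothetical stand-in for the architectures defined in the paper's Fig. 4.

```python
import tensorflow as tf

def build_resnet18_variant():
    """Hypothetical placeholder for one of the paper's Fig. 4 ResNet18 variants.
    A tiny conv net with a softmax output is used here only so the sketch runs."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

# Data: 40000 training and 10000 validation samples (this particular split is assumed).
(x, y), _ = tf.keras.datasets.cifar10.load_data()
x = x.astype("float32") / 255.0
y = tf.keras.utils.to_categorical(y, 10)
x_train, y_train = x[:40000], y[:40000]
x_val, y_val = x[40000:], y[40000:]

def piecewise_lr(epoch):
    """Learning rate schedule: 1e-3 for epochs 1-40, 1e-4 for 41-60, 1e-5 for 61-80."""
    if epoch < 40:
        return 1e-3
    if epoch < 60:
        return 1e-4
    return 1e-5

model = build_resnet18_variant()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=256, epochs=80,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(piecewise_lr)])
```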
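
For the MNIST fully-connected experiment (784-500-500-500-500-500-10, tanh hidden layers, linear output, Adam with MSE loss, batch size 256, learning rate 10^-5, stopped once the loss falls below 10^-2), a hedged Keras sketch follows. The one-hot targets, the `StopAtLoss` callback, and the cap of 10000 epochs are illustrative additions rather than details from the paper.

```python
import tensorflow as tf

# Hedged reconstruction of the described MNIST setup.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_train, y_train = x_train[:30000], y_train[:30000]   # 30000 training samples
y_train = tf.keras.utils.to_categorical(y_train, 10)  # one-hot targets assumed for MSE

model = tf.keras.Sequential(
    [tf.keras.Input(shape=(784,))]
    + [tf.keras.layers.Dense(500, activation="tanh") for _ in range(5)]
    + [tf.keras.layers.Dense(10)]                      # no activation on the output layer
)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), loss="mse")

class StopAtLoss(tf.keras.callbacks.Callback):
    """Stop training once the training loss falls below a threshold."""
    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold
    def on_epoch_end(self, epoch, logs=None):
        if logs is not None and logs.get("loss", float("inf")) < self.threshold:
            self.model.stop_training = True

model.fit(x_train, y_train, batch_size=256, epochs=10000,
          callbacks=[StopAtLoss(1e-2)], verbose=0)
```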