Deep Frequency Principle Towards Understanding Why Deeper Learning Is Faster
Authors: Zhiqin John Xu, Hanxu Zhou | Pages 10541-10550
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we utilize Fourier analysis to empirically provide a promising mechanism for understanding why feedforward deeper learning is faster. To this end, we separate a deep neural network, trained by normal stochastic gradient descent, into two parts during analysis, i.e., a pre-condition component and a learning component, in which the output of the pre-condition component is the input of the learning component. We use a filtering method to characterize the frequency distribution of a high-dimensional function (a hedged sketch of such a filter is given after this table). Based on experiments with deep networks and a real dataset, we propose a deep frequency principle: the effective target function for a deeper hidden layer biases towards lower frequency during training. |
| Researcher Affiliation | Academia | Zhiqin John Xu and Hanxu Zhou School of Mathematical Sciences, MOE-LSC and Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, P.R. China xuzhiqin@sjtu.edu.cn, 1210123652@sjtu.edu.cn |
| Pseudocode | No | The paper defines the neural network recursively with equations but does not include any labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All codes are written by Python and Tensorflow, and run on Linux system with Nvidia GTX 2080Ti or Tesla V100 cards. Codes can be found at github.com. |
| Open Datasets | Yes | For the experiments of the variants of Resnet18 on CIFAR10, the network structures are shown in Fig. 4. [...] We use 40000 samples of CIFAR10 as the training set and 10000 examples as the validation set. [...] For the experiment of fully-connected network on MNIST, [...] We use 30000 samples of the MNIST as training set. |
| Dataset Splits | Yes | We use 40000 samples of CIFAR10 as the training set and 10000 examples as the validation set. |
| Hardware Specification | Yes | All codes are written by Python and Tensorflow, and run on Linux system with Nvidia GTX 2080Ti or Tesla V100 cards. |
| Software Dependencies | No | The paper mentions "Python and Tensorflow" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For the experiments on the variants of ResNet18 on CIFAR10, the network structures are shown in Fig. 4. The output layer is equipped with softmax and the network is trained by the Adam optimizer with cross-entropy loss and batch size 256. The learning rate is changed as the training proceeds: 10^-3 for epochs 1-40, 10^-4 for epochs 41-60, and 10^-5 for epochs 61-80. [...] For the experiment of a fully-connected network on MNIST, we choose the tanh activation function and size 784-500-500-500-500-500-10. The output layer of the network is not equipped with any activation function. The network is trained by the Adam optimizer with mean squared loss, batch size 256, and learning rate 10^-5. The training is stopped when the loss is smaller than 10^-2. (Hedged training sketches for both setups are given after this table.) |
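
The "filtering method" mentioned in the Research Type row characterizes how much of a sampled high-dimensional function is concentrated at low frequencies. The snippet below is a minimal NumPy sketch of one common way to realize such a low-pass filter directly on sampled data, by averaging the sampled values with a normalized Gaussian kernel; the names `low_frequency_part`, `low_frequency_ratio`, and the bandwidth parameter `delta` are illustrative choices and not taken from the paper.

```python
import numpy as np

def low_frequency_part(x, y, delta):
    """Estimate the low-frequency component of a function sampled at points
    x (shape N x d) with values y (shape N x k), using a Gaussian low-pass
    filter evaluated on the samples themselves. Illustrative sketch only."""
    # pairwise squared distances between sample points
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel acts as a low-pass filter; a larger delta keeps only lower frequencies
    weights = np.exp(-sq_dist / (2.0 * delta ** 2))
    weights /= weights.sum(axis=1, keepdims=True)  # normalize per sample
    return weights @ y  # filtered (low-frequency) values at the sample points

def low_frequency_ratio(x, y, delta):
    """Fraction of the function's squared magnitude captured by the low-frequency part."""
    y_low = low_frequency_part(x, y, delta)
    return np.sum(y_low ** 2) / np.sum(y ** 2)
```

Sweeping `delta` and tracking `low_frequency_ratio` gives a coarse frequency profile of the sampled function, which is the kind of quantity needed to compare effective target functions across hidden layers.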
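
The CIFAR10 rows above (40000 training / 10000 validation samples, Adam with cross-entropy loss, batch size 256, and a piecewise learning rate schedule) could be reproduced along the lines of the TensorFlow sketch below. Splitting Keras' 50000 training images into the first 40000 for training and the last 10000 for validation is an assumption, and `build_resnet18_variant()` is a hypothetical stand-in for the architectures defined in the paper's Fig. 4.

```python
import tensorflow as tf

def build_resnet18_variant():
    """Hypothetical placeholder for one of the paper's Fig. 4 ResNet18 variants.
    A tiny conv net with a softmax output is used here only so the sketch runs."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

# Data: 40000 training and 10000 validation samples (this particular split is assumed).
(x, y), _ = tf.keras.datasets.cifar10.load_data()
x = x.astype("float32") / 255.0
y = tf.keras.utils.to_categorical(y, 10)
x_train, y_train = x[:40000], y[:40000]
x_val, y_val = x[40000:], y[40000:]

def piecewise_lr(epoch):
    """Learning rate schedule: 1e-3 for epochs 1-40, 1e-4 for 41-60, 1e-5 for 61-80."""
    if epoch < 40:
        return 1e-3
    if epoch < 60:
        return 1e-4
    return 1e-5

model = build_resnet18_variant()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=256, epochs=80,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(piecewise_lr)])
```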
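
For the MNIST fully-connected experiment (784-500-500-500-500-500-10, tanh hidden layers, linear output, Adam with MSE loss, batch size 256, learning rate 10^-5, stopped once the loss falls below 10^-2), a hedged Keras sketch follows. The one-hot targets, the `StopAtLoss` callback, and the cap of 10000 epochs are illustrative additions rather than details from the paper.

```python
import tensorflow as tf

# Hedged reconstruction of the described MNIST setup.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_train, y_train = x_train[:30000], y_train[:30000]   # 30000 training samples
y_train = tf.keras.utils.to_categorical(y_train, 10)  # one-hot targets assumed for MSE

model = tf.keras.Sequential(
    [tf.keras.Input(shape=(784,))]
    + [tf.keras.layers.Dense(500, activation="tanh") for _ in range(5)]
    + [tf.keras.layers.Dense(10)]                      # no activation on the output layer
)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), loss="mse")

class StopAtLoss(tf.keras.callbacks.Callback):
    """Stop training once the training loss falls below a threshold."""
    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold
    def on_epoch_end(self, epoch, logs=None):
        if logs is not None and logs.get("loss", float("inf")) < self.threshold:
            self.model.stop_training = True

model.fit(x_train, y_train, batch_size=256, epochs=10000,
          callbacks=[StopAtLoss(1e-2)], verbose=0)
```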