Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks

Authors: Zihao Wang, Lei Wu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental validation. In this experiment, both short-range and long-range sparse target functions are considered. We set the input dimension to d = 4096, the sample size to n = 400, and the noise level σ to zero. For the CNN architecture, the filter size is s = 4, resulting in a depth of L = log_4(d) = 6; the number of channels is set to C = 4 across all layers. The Adam optimizer is employed to train our models, and importantly, no regularization is applied. For comparison, we also examine two-layer fully-connected networks (FCNs) with width 10, as well as ordinary least squares (OLS) regression. The results are shown in Figure 2. One can see clearly that even without any explicit sparsity regularization, the CNN can still learn sparse interactions efficiently in both short-range and long-range scenarios. In contrast, the FCN and OLS overfit the data and fail to generalize to the test data.
Researcher Affiliation | Academia | Zihao Wang, Peking University, zihaowang@stu.pku.edu.cn; Lei Wu, Peking University, leiwu@math.pku.edu.cn
Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper uses synthetic data generated from specified distributions and functions (e.g., P = N(0, I_{4d})) rather than a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or a detailed splitting methodology) for training, validation, or test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify version numbers for it or for any other software dependencies.
Experiment Setup | Yes | For the CNN architecture, the filter size is s = 4, resulting in a depth of L = log_4(d) = 6; the number of channels is set to C = 4 across all layers. The Adam optimizer is employed to train our models, and importantly, no regularization is applied. Training is stopped when the training loss drops below 10^-5. (A hedged reproduction sketch of this setup follows the table.)
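
To make the quoted setup concrete, below is a minimal PyTorch sketch of how such an experiment could be reproduced. Only d, n, σ = 0, s, C, L, the use of Adam without regularization, and the 10^-5 stopping rule come from the paper's stated setup; the non-overlapping stride-s convolutions, the ReLU nonlinearity, the linear readout, the Gaussian input distribution, the default Adam learning rate, and the example target function `target_fn` are assumptions made for illustration, not the authors' code.

```python
# Minimal sketch of the described setup (not the authors' implementation).
# Assumptions: non-overlapping stride-s convolutions with a linear readout,
# ReLU activations, Gaussian inputs, and a hypothetical sparse target function.
import torch
import torch.nn as nn

d, n, s, C = 4096, 400, 4, 4        # input dimension, sample size, filter size, channels
L = 6                                # depth: log_4(4096) = 6

# Synthetic data: Gaussian inputs, noiseless labels (sigma = 0).
# `target_fn` is a hypothetical placeholder for a short-range sparse interaction;
# a long-range variant would couple widely separated coordinates instead.
def target_fn(x):
    return x[:, 0] * x[:, 1]

X = torch.randn(n, 1, d)                         # shape (n, channels=1, length=d)
y = target_fn(X.squeeze(1)).unsqueeze(1)         # shape (n, 1)

# Deep CNN: L non-overlapping conv layers (kernel size = stride = s), then a linear readout.
layers, in_ch = [], 1
for _ in range(L):
    layers += [nn.Conv1d(in_ch, C, kernel_size=s, stride=s), nn.ReLU()]
    in_ch = C
layers += [nn.Flatten(), nn.Linear(C, 1)]        # spatial length is 1 after L = log_s(d) layers
cnn = nn.Sequential(*layers)

opt = torch.optim.Adam(cnn.parameters())         # no weight decay / regularization
loss_fn = nn.MSELoss()
for step in range(100_000):
    opt.zero_grad()
    loss = loss_fn(cnn(X), y)
    loss.backward()
    opt.step()
    if loss.item() < 1e-5:                       # stopping rule quoted from the setup
        break
```

The non-overlapping convolutions (kernel size equal to stride) are what make the stated depth L = log_s(d) = 6 reduce the input of length 4096 to a single position; an architecture with overlapping filters would require a different depth calculation.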