Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks

Authors: Zihao Wang, Lei Wu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental validation. In this experiment, both short-range and long-range sparse target functions are considered. We set the input dimension to d = 4096, the sample size to n = 400, and the noise level σ to zero. For the CNN architecture, the filter size is s = 4, resulting in a depth of L = log_4(d) = 6; the number of channels is set to C = 4 across all layers. The Adam optimizer is employed to train our models, and importantly, no regularization is applied. For comparison, we also examine two-layer fully-connected networks (FCNs) with width 10, as well as ordinary least squares (OLS) regression. The results are shown in Figure 2. One can see clearly that even without any explicit sparsity regularization, the CNN can still learn sparse interactions efficiently in both short-range and long-range scenarios. In contrast, the FCN and OLS overfit the data and fail to generalize to the test data.
Researcher Affiliation | Academia | Zihao Wang, Peking University, zihaowang@stu.pku.edu.cn; Lei Wu, Peking University, leiwu@math.pku.edu.cn
Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper uses synthetic data generated from specified distributions and functions (e.g., P = N(0, I_{4d})) rather than a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or a detailed splitting methodology) for training, validation, or test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify version numbers for it or for any other software dependencies.
Experiment Setup | Yes | For the CNN architecture, the filter size is s = 4, resulting in a depth of L = log_4(d) = 6; the number of channels is set to C = 4 across all layers. The Adam optimizer is employed to train our models, and importantly, no regularization is applied. Training is stopped when the training loss drops below 10^-5. (A hedged reproduction sketch of this setup follows the table.)
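
To make the quoted setup concrete, below is a minimal PyTorch sketch of how such an experiment could be reproduced. Only d, n, σ = 0, s, C, L, the use of Adam without regularization, and the 10^-5 stopping rule come from the paper's stated setup; the non-overlapping stride-s convolutions, the ReLU nonlinearity, the linear readout, the Gaussian input distribution, the default Adam learning rate, and the example target function `target_fn` are assumptions made for illustration, not the authors' code.

```python
# Minimal sketch of the described setup (not the authors' implementation).
# Assumptions: non-overlapping stride-s convolutions with a linear readout,
# ReLU activations, Gaussian inputs, and a hypothetical sparse target function.
import torch
import torch.nn as nn

d, n, s, C = 4096, 400, 4, 4        # input dimension, sample size, filter size, channels
L = 6                                # depth: log_4(4096) = 6

# Synthetic data: Gaussian inputs, noiseless labels (sigma = 0).
# `target_fn` is a hypothetical placeholder for a short-range sparse interaction;
# a long-range variant would couple widely separated coordinates instead.
def target_fn(x):
    return x[:, 0] * x[:, 1]

X = torch.randn(n, 1, d)                         # shape (n, channels=1, length=d)
y = target_fn(X.squeeze(1)).unsqueeze(1)         # shape (n, 1)

# Deep CNN: L non-overlapping conv layers (kernel size = stride = s), then a linear readout.
layers, in_ch = [], 1
for _ in range(L):
    layers += [nn.Conv1d(in_ch, C, kernel_size=s, stride=s), nn.ReLU()]
    in_ch = C
layers += [nn.Flatten(), nn.Linear(C, 1)]        # spatial length is 1 after L = log_s(d) layers
cnn = nn.Sequential(*layers)

opt = torch.optim.Adam(cnn.parameters())         # no weight decay / regularization
loss_fn = nn.MSELoss()
for step in range(100_000):
    opt.zero_grad()
    loss = loss_fn(cnn(X), y)
    loss.backward()
    opt.step()
    if loss.item() < 1e-5:                       # stopping rule quoted from the setup
        break
```

The non-overlapping convolutions (kernel size equal to stride) are what make the stated depth L = log_s(d) = 6 reduce the input of length 4096 to a single position; an architecture with overlapping filters would require a different depth calculation.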