Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

Authors: Zhiyuan Li, Yi Zhang, Sanjeev Arora

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1: Comparison of generalization performance of convolutional versus fully-connected models trained by SGD. The grey dotted lines indicate separation, and we can see convolutional networks consistently outperform fully-connected networks. Here the input data are 3×32×32 RGB images and the binary label indicates for each image whether the first channel has larger ℓ2 norm than the second one. The input images are drawn from entry-wise independent Gaussian (left) and CIFAR-10 (right).
Researcher Affiliation | Academia | Zhiyuan Li, Yi Zhang (Princeton University, {zhiyuanli,y.zhang}@cs.princeton.edu); Sanjeev Arora (Princeton University & IAS, arora@cs.princeton.edu)
Pseudocode | Yes | Algorithm 1: Iterative algorithm A; Algorithm 2: Gradient Descent for FC-NN (fully-connected networks)
Open Source Code | No | The paper contains no explicit statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | Figure 1: ...The input images are drawn from entry-wise independent Gaussian (left) and CIFAR-10 (right).
Dataset Splits | No | The paper mentions 'training data' and 'generalization performance', implying training and test sets, but does not explicitly describe a validation split or a specific data partitioning for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions SGD and batch-normalization (Ioffe & Szegedy, 2015) but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | Figure 1: ...The 3-layer convolutional networks consist of two 3×3 convolutions with 10 hidden channels, and a 3×3 convolution with a single output channel followed by global average pooling. The 3-layer fully-connected networks consist of two fully-connected layers with 10000 hidden channels and another fully-connected layer with a single output. The 2-layer versions have one less intermediate layer and have only 3072 hidden channels for each layer. bn stands for batch-normalization (Ioffe & Szegedy, 2015).
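To make the Figure 1 setup concrete, the following is a minimal NumPy sketch (not the authors' code) of the synthetic Gaussian task and the weight-parameter gap between the 3-layer convolutional and fully-connected models quoted above; the batch size is arbitrary and biases/batch-norm parameters are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task from Figure 1: 3x32x32 entry-wise independent Gaussian
# images; the binary label says whether the first channel has a larger
# l2 norm than the second. (Batch size 8 is an arbitrary choice.)
x = rng.standard_normal((8, 3, 32, 32))
y = (np.linalg.norm(x[:, 0].reshape(8, -1), axis=1) >
     np.linalg.norm(x[:, 1].reshape(8, -1), axis=1)).astype(int)

# Rough weight counts for the two 3-layer models in the Experiment Setup
# row (biases and batch-norm ignored).
conv_params = 3 * 3 * (3 * 10 + 10 * 10 + 10 * 1)     # three 3x3 convs
fc_params = 3072 * 10000 + 10000 * 10000 + 10000 * 1  # three FC layers

print(conv_params)  # 1260
print(fc_params)    # 130730000
```

The roughly 100,000× gap in parameter count is what makes the sample-efficiency comparison in the paper's title non-trivial.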