Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?
Authors: Zhiyuan Li, Yi Zhang, Sanjeev Arora
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: Comparison of generalization performance of convolutional versus fully-connected models trained by SGD. The grey dotted lines indicate separation, and we can see convolutional networks consistently outperform fully-connected networks. Here the input data are 3 × 32 × 32 RGB images and the binary label indicates for each image whether the first channel has larger ℓ2 norm than the second one. The input images are drawn from entry-wise independent Gaussian (left) and CIFAR-10 (right). |
| Researcher Affiliation | Academia | Zhiyuan Li, Yi Zhang, Princeton University, {zhiyuanli, y.zhang}@cs.princeton.edu; Sanjeev Arora, Princeton University & IAS, arora@cs.princeton.edu |
| Pseudocode | Yes | Algorithm 1 Iterative algorithm A; Algorithm 2 Gradient Descent for FC-NN (FC networks) |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Figure 1: ...The input images are drawn from entry-wise independent Gaussian (left) and CIFAR-10 (right). |
| Dataset Splits | No | The paper mentions 'training data' and 'generalization performance' implying training and test sets, but does not explicitly describe a validation split or specific data partitioning for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'SGD' and 'batch-normalization Ioffe & Szegedy (2015)' but does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Figure 1: ...The 3-layer convolutional networks consist of two 3x3 convolutions with 10 hidden channels, and a 3x3 convolution with a single output channel followed by global average pooling. The 3-layer fully-connected networks consist of two fully-connected layers with 10000 hidden channels and another fully-connected layer with a single output. The 2-layer versions have one less intermediate layer and have only 3072 hidden channels for each layer. bn stands for batch-normalization Ioffe & Szegedy (2015). |
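The synthetic task quoted from Figure 1 is concrete enough to sketch in code. The following is a minimal illustration, not the authors' implementation: it draws entry-wise independent Gaussian 3 × 32 × 32 inputs and labels each one by whether the first channel has larger ℓ2 norm than the second. The function name and seeding are our own choices.

```python
import numpy as np

def make_channel_norm_task(n, seed=None):
    """Sketch of the Figure 1 synthetic task: Gaussian 3x32x32 inputs,
    binary label = 1 iff channel 0 has larger l2 norm than channel 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 3, 32, 32))
    # Per-example l2 norm of each of the first two channels.
    y = (np.linalg.norm(x[:, 0], axis=(1, 2)) >
         np.linalg.norm(x[:, 1], axis=(1, 2))).astype(int)
    return x, y

x, y = make_channel_norm_task(1000, seed=0)
# By symmetry between the two channels, labels come out roughly balanced.
```

Because the two channels are identically distributed, the label is 1 with probability 1/2, which is what makes the conv-vs-FC sample-efficiency comparison in Figure 1 a clean binary classification benchmark.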