The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods

Authors: Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train shallow classifiers, i.e. a linear classifier and a 1-hidden-layer CNN (1-layer), on top of our representation Φ on two major image classification datasets, CIFAR-10 and ImageNet, which consist respectively of 50k small and 1.2M large color images divided respectively into 10 and 1k classes. For training, we systematically used mini-batch SGD with momentum of 0.9, no weight decay and the cross-entropy loss.
Researcher Affiliation | Academia | Louis Thiry, Département d'Informatique de l'ENS, ENS, CNRS, PSL University, Paris, France, louis.thiry@ens.fr; Michael Arbel, Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom, michael.n.arbel@gmail.com; Eugene Belilovsky, Concordia University and Mila, Montreal, Canada, eugene.belilovsky@concordia.ca; Edouard Oyallon, CNRS, LIP6, Sorbonne University, Paris, France, edouard.oyallon@lip6.fr
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found.
Open Source Code | Yes | Our code as well as commands to reproduce our results are available here: https://github.com/louity/patches.
Open Datasets | Yes | We train shallow classifiers, i.e. a linear classifier and a 1-hidden-layer CNN (1-layer), on top of our representation Φ on two major image classification datasets, CIFAR-10 and ImageNet, which consist respectively of 50k small and 1.2M large color images divided respectively into 10 and 1k classes.
Dataset Splits | No | While the paper mentions training and testing on CIFAR-10 and ImageNet, it does not explicitly provide specific training/validation/test splits (e.g., percentages or exact counts for each split) or refer to a standard split that includes a validation set. It only states the total dataset sizes.
Hardware Specification | No | The paper mentions general hardware support, such as a "GPU donation from NVIDIA" and "HPC resources of IDRIS", but does not specify exact GPU models (e.g., NVIDIA A100, Tesla V100) or detailed CPU/cluster specifications used for running the experiments.
Software Dependencies | No | The paper describes the methods and techniques used (e.g., mini-batch SGD, cross-entropy loss, batch normalization) but does not provide specific software package names with version numbers (e.g., "PyTorch 1.9", "Python 3.8").
Experiment Setup | Yes | For training, we systematically used mini-batch SGD with momentum of 0.9, no weight decay and the cross-entropy loss. The classifier is trained for 175 epochs with a learning rate decay of 0.1 at epochs 100 and 150. The initial learning rate is 0.003 for |D| = 2k and 0.001 for larger |D|. For the linear classification experiments, we used an average pooling of size k1 = 5 and stride s1 = 3, k2 = 1 and c2 = 128 for the first convolutional operator, and k3 = 6 for the second one. We set the patch size to P = 6 and the whitening regularization to λ = 10^-2. The parameters of the linear convolutional classifier are chosen to be: k1 = 10, s1 = 6, k2 = 1, c2 = 256, k3 = 7.
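The optimization details quoted in the Experiment Setup row map directly onto a standard training loop. Below is a minimal PyTorch-style sketch of that recipe, assuming the patch representation Φ has already been computed and cached as feature vectors; the feature dimension (2048), the number of samples, the batch size, and names such as `classifier` and `representation_loader` are illustrative placeholders, not taken from the paper or its repository.

```python
# Sketch of the reported training recipe: mini-batch SGD with momentum 0.9,
# no weight decay, cross-entropy loss, 175 epochs, learning-rate decay of 0.1
# at epochs 100 and 150, initial learning rate 0.003 (the |D| = 2k setting).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the precomputed patch representation Phi(x):
# the 2048-dim features, 512 samples, and 10 classes are assumptions for the sketch.
features = torch.randn(512, 2048)
labels = torch.randint(0, 10, (512,))
representation_loader = DataLoader(TensorDataset(features, labels),
                                   batch_size=128, shuffle=True)

classifier = nn.Linear(2048, 10)  # stand-in for the linear classifier trained on Phi
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.003,
                            momentum=0.9, weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 150], gamma=0.1)

for epoch in range(175):
    for x, y in representation_loader:
        optimizer.zero_grad()
        loss = criterion(classifier(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The patch-extraction and whitening pipeline itself (patch size P = 6, whitening regularization λ = 10^-2) and the convolutional classifier variants are not reproduced here; see the linked repository for the authors' implementation.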