The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods
Authors: Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train shallow classifiers, i.e. a linear classifier and a 1-hidden-layer CNN (1-layer), on top of our representation Φ on two major image classification datasets, CIFAR-10 and ImageNet, which consist respectively of 50k small and 1.2M large color images divided respectively into 10 and 1k classes. For training, we systematically used mini-batch SGD with momentum of 0.9, no weight decay, and the cross-entropy loss. (A hedged optimizer sketch based on these settings appears below the table.) |
| Researcher Affiliation | Academia | Louis Thiry, Département d'Informatique de l'ENS, ENS, CNRS, PSL University, Paris, France, louis.thiry@ens.fr; Michael Arbel, Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom, michael.n.arbel@gmail.com; Eugene Belilovsky, Concordia University and Mila, Montreal, Canada, eugene.belilovsky@concordia.ca; Edouard Oyallon, CNRS, LIP6, Sorbonne University, Paris, France, edouard.oyallon@lip6.fr |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found. |
| Open Source Code | Yes | Our code as well as commands to reproduce our results are available here: https://github.com/louity/patches. |
| Open Datasets | Yes | We train shallow classifiers, i.e. a linear classifier and a 1-hidden-layer CNN (1-layer), on top of our representation Φ on two major image classification datasets, CIFAR-10 and ImageNet, which consist respectively of 50k small and 1.2M large color images divided respectively into 10 and 1k classes. |
| Dataset Splits | No | While the paper mentions training and testing on CIFAR-10 and ImageNet, it does not explicitly provide specific training/validation/test splits (e.g., percentages or exact counts for each split) or refer to a standard split that includes a validation set. It only states the total dataset sizes. |
| Hardware Specification | No | The paper mentions general hardware support like "GPU donation from NVIDIA" and "HPC resources of IDRIS" but does not specify exact GPU models (e.g., NVIDIA A100, Tesla V100) or detailed CPU/cluster specifications used for running experiments. |
| Software Dependencies | No | The paper describes the methods and techniques used (e.g., mini-batch SGD, cross-entropy loss, batch-normalization), but does not provide specific software package names with version numbers (e.g., "PyTorch 1.9", "Python 3.8"). |
| Experiment Setup | Yes | For training, we systematically used mini-batch SGD with momentum of 0.9, no weight decay, and the cross-entropy loss. The classifier is trained for 175 epochs with a learning rate decay of 0.1 at epochs 100 and 150. The initial learning rate is 0.003 for |D| = 2k and 0.001 for larger |D|. For the linear classification experiments, we used an average pooling of size k1 = 5 and stride s1 = 3, k2 = 1 and c2 = 128 for the first convolutional operator, and k3 = 6 for the second one. We set the patch size to P = 6 and the whitening regularization to λ = 10^-2. The parameters of the linear convolutional classifier are chosen to be: k1 = 10, s1 = 6, k2 = 1, c2 = 256, k3 = 7. (A hedged configuration sketch collecting these hyperparameters appears below the table.) |
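
The training recipe quoted in the Research Type and Experiment Setup rows maps onto standard PyTorch primitives. The sketch below is an illustrative reconstruction, not the authors' code from https://github.com/louity/patches; the `classifier` module, its feature dimension, and the commented-out `train_loader` are placeholder assumptions.

```python
# Hedged sketch of the reported optimization settings: mini-batch SGD,
# momentum 0.9, no weight decay, cross-entropy loss, 175 epochs with a
# learning-rate decay of 0.1 at epochs 100 and 150.
import torch
import torch.nn as nn

# Placeholder linear classifier on top of the patch representation Φ;
# the input dimension 2048 is assumed, not taken from the paper.
classifier = nn.Linear(2048, 10)
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(
    classifier.parameters(),
    lr=0.003,        # reported initial LR for |D| = 2k (0.001 for larger |D|)
    momentum=0.9,
    weight_decay=0.0,
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1
)

for epoch in range(175):
    # A `train_loader` yielding (features, labels) mini-batches would go here.
    # for features, labels in train_loader:
    #     optimizer.zero_grad()
    #     loss = criterion(classifier(features), labels)
    #     loss.backward()
    #     optimizer.step()
    scheduler.step()
```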
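Similarly, the hyperparameters quoted in the Experiment Setup row can be collected into a plain configuration dictionary for reference. The key names below are illustrative and do not correspond to identifiers in the authors' repository.

```python
# Hedged summary of the reported hyperparameters for the two classifiers.
linear_classifier_config = {
    "patch_size": 6,        # P = 6
    "whitening_reg": 1e-2,  # λ = 10^-2
    "avg_pool_size": 5,     # k1 = 5
    "avg_pool_stride": 3,   # s1 = 3
    "conv1_kernel": 1,      # k2 = 1
    "conv1_channels": 128,  # c2 = 128
    "conv2_kernel": 6,      # k3 = 6
}

conv_classifier_config = {  # second reported parameter set
    "avg_pool_size": 10,    # k1 = 10
    "avg_pool_stride": 6,   # s1 = 6
    "conv1_kernel": 1,      # k2 = 1
    "conv1_channels": 256,  # c2 = 256
    "conv2_kernel": 7,      # k3 = 7
}
```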