Efficient Learning of CNNs using Patch Based Features

Authors: Alon Brutzkus, Amir Globerson, Eran Malach, Alon Regev Netser, Shai Shalev-Shwartz

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify that the distributional assumptions hold on real-world data by experimenting on the CIFAR-10 dataset, and find that the analyzed algorithm outperforms a vanilla one-hidden-layer CNN. Finally, we demonstrate that by running the algorithm in a layer-by-layer fashion we can build a deep model which gives further improvements, hinting that this method provides insights about the behavior of deep CNNs.
Researcher Affiliation | Academia | 1 Blavatnik School of Computer Science, Tel Aviv University, Israel; 2 School of Computer Science, The Hebrew University of Jerusalem, Israel.
Pseudocode | Yes | Algorithm 1 Clustering. Input: a set of patches Pu, N > 0. Pick an arbitrary z ∈ Pu and set D = {z}. for i = 2, ..., N do: find v ∈ Pu which maximizes dist(v, D) (where dist(v, D) := min{dist(v, u) | u ∈ D}) and set D ← D ∪ {v}. return D. (A minimal Python sketch of this selection procedure is given after the table.)
Open Source Code | Yes | Our code is available here: github.com/AlonNT/patch-based-learning.
Open Datasets | Yes | We verify that the distributional assumptions hold on real-world data by experimenting on the CIFAR-10 dataset.
Dataset Splits | No | The paper uses the CIFAR-10 dataset, which has standard training and test splits, and it mentions a 'training dataset Su' and a 'labeled training set S'. However, it does not explicitly describe a separate validation split, such as percentages or sample counts for validation data.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions training with SGD and common deep learning practices like batch normalization, implying the use of a deep learning framework, but it does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We obtain the patches dictionary D = {v1, ..., vN} by sampling M patches uniformly at random from the training data, performing whitening on the patches followed by k-means clustering to get N centroids (see Figure 4). Unless noted otherwise, we use N = 1024. ... We choose k = 0.25·N as the number of neighbors defining the embedding... We train for 200 epochs using SGD with momentum 0.9 and batch size 64. We set 0.003 as the initial learning rate, and decay it by a factor of 0.1 at epochs 100 and 150. We use standard data augmentations during training (random horizontal flip and random crop), but no augmentations are used when sampling the patches for the dictionary. (A sketch wiring these hyperparameters together is given after the table.)
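
The Algorithm 1 row above describes a greedy farthest-point selection over patches. Below is a minimal Python sketch of that procedure, assuming patches are flattened NumPy vectors and that dist is Euclidean distance; both are assumptions of this illustration and are not specified in the excerpt.

import numpy as np

def farthest_point_selection(patches: np.ndarray, N: int) -> np.ndarray:
    """Greedily pick N patches, each maximizing its distance to the set chosen so far.

    patches: array of shape (num_patches, patch_dim), assumed flattened.
    Returns an array of shape (N, patch_dim) containing the selected set D.
    """
    # Pick an arbitrary z in Pu and initialize D = {z}.
    selected = [patches[0]]
    # dist(v, D) for every candidate v; initially the distance to z.
    dist_to_D = np.linalg.norm(patches - patches[0], axis=1)
    for _ in range(2, N + 1):
        # Find v in Pu maximizing dist(v, D) = min over u in D of dist(v, u).
        idx = int(np.argmax(dist_to_D))
        selected.append(patches[idx])
        # Adding the new element to D can only shrink dist(v, D).
        dist_to_D = np.minimum(dist_to_D, np.linalg.norm(patches - patches[idx], axis=1))
    return np.stack(selected)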
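
The Experiment Setup row quotes concrete optimization hyperparameters: 200 epochs, SGD with momentum 0.9, batch size 64, initial learning rate 0.003 decayed by a factor of 0.1 at epochs 100 and 150, and random horizontal flip plus random crop. The PyTorch-style sketch below wires those numbers together; the choice of framework, the crop padding, and the placeholder model are assumptions of this illustration, since the excerpt does not specify them.

import torch
from torch import nn, optim
from torchvision import datasets, transforms

# Standard CIFAR-10 training augmentations mentioned in the setup.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # padding value is an assumption
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

# Placeholder model; the paper's patch-based network is not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
# Decay the learning rate by a factor of 0.1 at epochs 100 and 150.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()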