Efficient Learning of CNNs using Patch Based Features
Authors: Alon Brutzkus, Amir Globerson, Eran Malach, Alon Regev Netser, Shai Shalev-Shwartz
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify that the distributional assumptions hold on real-world data by experimenting on the CIFAR-10 dataset, and find that the analyzed algorithm outperforms a vanilla one-hidden-layer CNN. Finally, we demonstrate that by running the algorithm in a layer-by-layer fashion we can build a deep model which gives further improvements, hinting that this method provides insights about the behavior of deep CNNs. |
| Researcher Affiliation | Academia | 1Blavatnik School of Computer Science, Tel Aviv University, Israel 2School of Computer Science, The Hebrew University of Jerusalem, Israel. |
| Pseudocode | Yes | Algorithm 1 Clustering. Input: a set of patches Pu and N > 0. Pick an arbitrary z ∈ Pu and set D = {z}. For i = 2, ..., N: find v ∈ Pu that maximizes dist(v, D), where dist(v, D) := min{dist(v, u) \| u ∈ D}, and set D ← D ∪ {v}. Return D. (A hedged code sketch of this procedure appears after the table.) |
| Open Source Code | Yes | Our code is available here: github.com/AlonNT/patch-based-learning. |
| Open Datasets | Yes | We verify that the distributional assumptions hold on real-world data by experimenting on the CIFAR-10 dataset |
| Dataset Splits | No | The paper uses the CIFAR-10 dataset which has standard training and testing splits, and it mentions 'training dataset Su' and 'labeled training set S'. However, it does not explicitly state details about a separate validation split, such as percentages or sample counts for validation data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions training with SGD and common deep learning practices like batch-normalization, implying the use of deep learning frameworks, but it does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We obtain the patches dictionary D = {v1, ..., vN} by sampling M patches uniformly at random from the training data, performing whitening on the patches followed by k-means clustering to get N centroids (see Figure 4). Unless noted otherwise, we use N = 1024. ... We choose k = 0.25·N as the number of neighbors defining the embedding... We train for 200 epochs using SGD with momentum 0.9 and batch size 64. We set 0.003 as the initial learning rate, and decay it by a factor of 0.1 at epochs 100 and 150. We use standard data augmentations during training (random horizontal flip and random crop), but no augmentations are used when sampling the patches for the dictionary. (A hedged sketch of the dictionary construction and training schedule appears after the table.) |
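
Below is a minimal NumPy sketch of the greedy farthest-point selection quoted as Algorithm 1 (Clustering). It assumes Euclidean distance and starts from the first patch; the quoted pseudocode leaves both the metric and the arbitrary initial choice unspecified, so these are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def greedy_farthest_point(patches, n_centroids):
    """Greedy farthest-point selection (Algorithm 1, Clustering).

    patches: array of shape (num_patches, patch_dim).
    Returns n_centroids patches, each chosen to maximize its distance
    to the set of patches already selected.
    """
    # Pick an arbitrary starting patch z and set D = {z}.
    chosen = [patches[0]]
    # dist(v, D) for every patch v: distance to its nearest chosen centroid.
    min_dist = np.linalg.norm(patches - chosen[0], axis=1)
    for _ in range(1, n_centroids):
        # Find v that maximizes dist(v, D) = min{dist(v, u) | u in D}.
        idx = int(np.argmax(min_dist))
        chosen.append(patches[idx])
        # Adding v to D can only shrink dist(., D); update the per-patch minima.
        min_dist = np.minimum(min_dist, np.linalg.norm(patches - patches[idx], axis=1))
    return np.stack(chosen)
```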
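
The experiment-setup quote also lends itself to a short sketch. The following is one plausible reading of the dictionary construction (random patch sampling, whitening, k-means with N = 1024 centroids) and of the reported SGD schedule; the patch size, the number of sampled patches M, the whitening variant, and the `model` placeholder are assumptions not pinned down by the quoted text.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def build_patch_dictionary(images, patch_size=6, n_patches=100_000, n_centroids=1024, seed=0):
    """Sample patches uniformly at random, whiten them, and run k-means.

    images: array of shape (num_images, height, width, channels).
    Returns the N = n_centroids cluster centers used as the patch dictionary.
    """
    rng = np.random.default_rng(seed)
    n, h, w, _ = images.shape
    # Sample patch locations uniformly at random (no augmentation at this stage).
    img_idx = rng.integers(0, n, n_patches)
    ys = rng.integers(0, h - patch_size + 1, n_patches)
    xs = rng.integers(0, w - patch_size + 1, n_patches)
    patches = np.stack([
        images[i, y:y + patch_size, x:x + patch_size].reshape(-1)
        for i, y, x in zip(img_idx, ys, xs)
    ])
    # Whitening; PCA whitening is one common choice (assumption, not stated in the quote).
    patches = PCA(whiten=True, random_state=seed).fit_transform(patches)
    # k-means clustering to obtain the N dictionary centroids.
    return KMeans(n_clusters=n_centroids, random_state=seed, n_init=10).fit(patches).cluster_centers_

def make_optimizer(model):
    """SGD schedule as quoted: momentum 0.9, initial LR 0.003, decayed by 0.1
    at epochs 100 and 150, trained for 200 epochs with batch size 64."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)
    return optimizer, scheduler
```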