Efficient Learning of CNNs using Patch Based Features

Authors: Alon Brutzkus, Amir Globerson, Eran Malach, Alon Regev Netser, Shai Shalev-Shwartz

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify that the distributional assumptions hold on real-world data by experimenting on the CIFAR-10 dataset, and find that the analyzed algorithm outperforms a vanilla one-hidden-layer CNN. Finally, we demonstrate that by running the algorithm in a layer-by-layer fashion we can build a deep model which gives further improvements, hinting that this method provides insights about the behavior of deep CNNs.
Researcher Affiliation | Academia | 1 Blavatnik School of Computer Science, Tel Aviv University, Israel; 2 School of Computer Science, The Hebrew University of Jerusalem, Israel.
Pseudocode | Yes | Algorithm 1 Clustering. Input: a set of patches Pu, N > 0. Pick an arbitrary z ∈ Pu and set D = {z}. for i = 2, ..., N do: find v ∈ Pu which maximizes dist(v, D) (where dist(v, D) := min{dist(v, u) | u ∈ D}) and set D ← D ∪ {v}. return D. (A minimal Python sketch of this selection procedure is given after the table.)
Open Source Code | Yes | Our code is available here: github.com/AlonNT/patch-based-learning.
Open Datasets | Yes | We verify that the distributional assumptions hold on real-world data by experimenting on the CIFAR-10 dataset.
Dataset Splits | No | The paper uses the CIFAR-10 dataset, which has standard training and test splits, and it mentions a 'training dataset Su' and a 'labeled training set S'. However, it does not explicitly describe a separate validation split, such as percentages or sample counts for validation data.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions training with SGD and common deep learning practices like batch normalization, implying the use of a deep learning framework, but it does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We obtain the patches dictionary D = {v1, ..., vN} by sampling M patches uniformly at random from the training data, performing whitening on the patches followed by k-means clustering to get N centroids (see Figure 4). Unless noted otherwise, we use N = 1024. ... We choose k = 0.25·N as the number of neighbors defining the embedding... We train for 200 epochs using SGD with momentum 0.9 and batch size 64. We set 0.003 as the initial learning rate, and decay it by a factor of 0.1 at epochs 100 and 150. We use standard data augmentations during training (random horizontal flip and random crop), but no augmentations are used when sampling the patches for the dictionary. (A sketch wiring these hyperparameters together is given after the table.)
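
The Algorithm 1 row above describes a greedy farthest-point selection over patches. Below is a minimal Python sketch of that procedure, assuming patches are flattened NumPy vectors and that dist is Euclidean distance; both are assumptions of this illustration and are not specified in the excerpt.

import numpy as np

def farthest_point_selection(patches: np.ndarray, N: int) -> np.ndarray:
    """Greedily pick N patches, each maximizing its distance to the set chosen so far.

    patches: array of shape (num_patches, patch_dim), assumed flattened.
    Returns an array of shape (N, patch_dim) containing the selected set D.
    """
    # Pick an arbitrary z in Pu and initialize D = {z}.
    selected = [patches[0]]
    # dist(v, D) for every candidate v; initially the distance to z.
    dist_to_D = np.linalg.norm(patches - patches[0], axis=1)
    for _ in range(2, N + 1):
        # Find v in Pu maximizing dist(v, D) = min over u in D of dist(v, u).
        idx = int(np.argmax(dist_to_D))
        selected.append(patches[idx])
        # Adding the new element to D can only shrink dist(v, D).
        dist_to_D = np.minimum(dist_to_D, np.linalg.norm(patches - patches[idx], axis=1))
    return np.stack(selected)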
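
The Experiment Setup row quotes concrete optimization hyperparameters: 200 epochs, SGD with momentum 0.9, batch size 64, initial learning rate 0.003 decayed by a factor of 0.1 at epochs 100 and 150, and random horizontal flip plus random crop. The PyTorch-style sketch below wires those numbers together; the choice of framework, the crop padding, and the placeholder model are assumptions of this illustration, since the excerpt does not specify them.

import torch
from torch import nn, optim
from torchvision import datasets, transforms

# Standard CIFAR-10 training augmentations mentioned in the setup.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # padding value is an assumption
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

# Placeholder model; the paper's patch-based network is not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
# Decay the learning rate by a factor of 0.1 at epochs 100 and 150.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()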