Intraclass clustering: an implicit learning ability that regularizes DNNs
Authors: Simon Carbonnelle, Christophe De Vleeschouwer
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we hypothesize that deep neural networks are regularized through their ability to extract meaningful clusters among the samples of a class. This constitutes an implicit form of regularization, as no explicit training mechanisms or supervision target such behaviour. To support our hypothesis, we design four different measures of intraclass clustering, based on the neuron- and layer-level representations of the training data. We then show that these measures constitute accurate predictors of generalization performance across variations of a large set of hyperparameters (learning rate, batch size, optimizer, weight decay, dropout rate, data augmentation, network depth and width). (An illustrative sketch of such a clustering measure follows this table.) |
| Researcher Affiliation | Academia | Simon Carbonnelle, Christophe De Vleeschouwer; FNRS research fellows; ICTEAM, Université catholique de Louvain; Louvain-La-Neuve, Belgium; simon.carbonnelle@gmail.com, christophe.devleeschouwer@uclouvain.be |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | The datasets are CIFAR10, CIFAR100 and the coarse version of CIFAR100 with 20 superclasses (Krizhevsky & Hinton, 2009). |
| Dataset Splits | No | The paper mentions training and testing on CIFAR10/CIFAR100, which have standard splits, but it does not explicitly state the train/validation/test dataset splits (percentages, counts, or explicit reference to predefined splits) used in its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or cloud computing specifications) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In order to build a set of models with a wide range of generalization performances, we vary hyperparameters that are known to be critical. Since varying multiple hyperparameters improves the identification of causal relationships, we vary 8 different hyperparameters: learning rate, batch size, optimizer (SGD or Adam (Kingma & Ba, 2015)), weight decay, dropout rate (Srivastava et al., 2014), data augmentation, network depth and width. The resulting hyperparameter values are as follows: 1. (Learning rate, Weight decay): {(0.01, 0.), (0.32, 0.), (0.1, 0.), (0.1, 4×10⁻⁵)} 2. Batch size: {100, 300} 3. Optimizer: {SGD, Adam} 4. (Dropout rate, Data augm.): {(0., true), (0., false), (0.2, false), (0.4, false)} 5. (Width factor, Depth factor): {(×1., ×1.), (×1.5, ×1.), (×1., ×1.5)} We train models for 250 epochs, and reduce the learning rate by a factor 0.2 at epochs 150, 230, 240. (See the configuration sketch after this table.) |
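
The paper defines its four intraclass clustering measures precisely; the snippet below is only a minimal sketch of the underlying idea, assuming one wants to quantify whether a single neuron's activations within one class separate into meaningful subgroups (e.g. CIFAR-100 fine labels within a superclass). The function name, the variance-ratio score, and the random example data are illustrative assumptions, not the paper's actual measures.

```python
# Illustrative sketch of a neuron-level intraclass clustering score.
# NOTE: this is NOT the paper's exact measure; it only conveys the idea of
# checking whether one neuron's activations, restricted to a single class,
# cluster according to (hypothetical) subclass labels.
import numpy as np

def neuron_intraclass_clustering(activations, subclass_labels):
    """activations: (n,) activations of one neuron over samples of ONE class.
    subclass_labels: (n,) subclass ids within that class.
    Returns a variance-ratio score in [0, 1]; higher means the activations
    cluster more tightly around per-subclass means."""
    overall_var = activations.var() + 1e-12
    within = 0.0
    for s in np.unique(subclass_labels):
        grp = activations[subclass_labels == s]
        within += grp.size * grp.var()
    within /= activations.size
    return 1.0 - within / overall_var

# Purely illustrative usage with synthetic data:
rng = np.random.default_rng(0)
acts = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
subs = np.array([0] * 100 + [1] * 100)
print(neuron_intraclass_clustering(acts, subs))  # well above 0: clustered
```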
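
For reference, the hyperparameter values and learning-rate schedule quoted in the "Experiment Setup" row can be laid out as a small configuration grid. The tuple groupings, milestone epochs, and the 0.2 decay factor come directly from that row; the helper `lr_at_epoch` and the configuration dictionary keys are illustrative assumptions, not the authors' code.

```python
# Hyperparameter grid as listed in the Experiment Setup row above.
from itertools import product

lr_wd       = [(0.01, 0.0), (0.32, 0.0), (0.1, 0.0), (0.1, 4e-5)]  # (learning rate, weight decay)
batch_sizes = [100, 300]
optimizers  = ["SGD", "Adam"]
drop_augm   = [(0.0, True), (0.0, False), (0.2, False), (0.4, False)]  # (dropout rate, data augmentation)
width_depth = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5)]  # (width factor, depth factor)

EPOCHS = 250
LR_MILESTONES = [150, 230, 240]  # learning rate multiplied by 0.2 at each milestone

def lr_at_epoch(base_lr, epoch):
    """Piecewise-constant schedule: multiply by 0.2 at every milestone already passed."""
    return base_lr * 0.2 ** sum(epoch >= m for m in LR_MILESTONES)

configs = [
    dict(lr=lr, weight_decay=wd, batch_size=bs, optimizer=opt,
         dropout=do, data_augmentation=da, width=w, depth=d)
    for (lr, wd), bs, opt, (do, da), (w, d)
    in product(lr_wd, batch_sizes, optimizers, drop_augm, width_depth)
]
print(len(configs))  # 4 * 2 * 2 * 4 * 3 = 192 settings as listed here
```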