Intraclass clustering: an implicit learning ability that regularizes DNNs
Authors: Simon Carbonnelle, Christophe De Vleeschouwer
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we hypothesize that deep neural networks are regularized through their ability to extract meaningful clusters among the samples of a class. This constitutes an implicit form of regularization, as no explicit training mechanisms or supervision target such behaviour. To support our hypothesis, we design four different measures of intraclass clustering, based on the neuron- and layer-level representations of the training data. We then show that these measures constitute accurate predictors of generalization performance across variations of a large set of hyperparameters (learning rate, batch size, optimizer, weight decay, dropout rate, data augmentation, network depth and width). (An illustrative sketch of such a clustering measure follows this table.) |
| Researcher Affiliation | Academia | Simon Carbonnelle, Christophe De Vleeschouwer; FNRS research fellows; ICTEAM, Université catholique de Louvain; Louvain-La-Neuve, Belgium; simon.carbonnelle@gmail.com, christophe.devleeschouwer@uclouvain.be |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | The datasets are CIFAR10, CIFAR100 and the coarse version of CIFAR100 with 20 superclasses (Krizhevsky & Hinton, 2009). |
| Dataset Splits | No | The paper mentions training and testing on CIFAR10/CIFAR100, which have standard splits, but it does not explicitly state the train/validation/test dataset splits (percentages, counts, or explicit reference to predefined splits) used in its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or cloud computing specifications) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In order to build a set of models with a wide range of generalization performances, we vary hyperparameters that are known to be critical. Since varying multiple hyperparameters improves the identification of causal relationships, we vary 8 different hyperparameters: learning rate, batch size, optimizer (SGD or Adam (Kingma & Ba, 2015)), weight decay, dropout rate (Srivastava et al., 2014), data augmentation, network depth and width. The resulting hyperparameter values are as follows: 1. (Learning rate, Weight decay): {(0.01, 0.), (0.32, 0.), (0.1, 0.), (0.1, 4×10⁻⁵)} 2. Batch size: {100, 300} 3. Optimizer: {SGD, Adam} 4. (Dropout rate, Data augm.): {(0., true), (0., false), (0.2, false), (0.4, false)} 5. (Width factor, Depth factor): {(×1., ×1.), (×1.5, ×1.), (×1., ×1.5)} We train models for 250 epochs, and reduce the learning rate by a factor 0.2 at epochs 150, 230, 240. (See the configuration sketch after this table.) |
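
The paper defines its four intraclass clustering measures precisely; the snippet below is only a minimal sketch of the underlying idea, assuming one wants to quantify whether a single neuron's activations within one class separate into meaningful subgroups (e.g. CIFAR-100 fine labels within a superclass). The function name, the variance-ratio score, and the random example data are illustrative assumptions, not the paper's actual measures.

```python
# Illustrative sketch of a neuron-level intraclass clustering score.
# NOTE: this is NOT the paper's exact measure; it only conveys the idea of
# checking whether one neuron's activations, restricted to a single class,
# cluster according to (hypothetical) subclass labels.
import numpy as np

def neuron_intraclass_clustering(activations, subclass_labels):
    """activations: (n,) activations of one neuron over samples of ONE class.
    subclass_labels: (n,) subclass ids within that class.
    Returns a variance-ratio score in [0, 1]; higher means the activations
    cluster more tightly around per-subclass means."""
    overall_var = activations.var() + 1e-12
    within = 0.0
    for s in np.unique(subclass_labels):
        grp = activations[subclass_labels == s]
        within += grp.size * grp.var()
    within /= activations.size
    return 1.0 - within / overall_var

# Purely illustrative usage with synthetic data:
rng = np.random.default_rng(0)
acts = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
subs = np.array([0] * 100 + [1] * 100)
print(neuron_intraclass_clustering(acts, subs))  # well above 0: clustered
```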
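
For reference, the hyperparameter values and learning-rate schedule quoted in the "Experiment Setup" row can be laid out as a small configuration grid. The tuple groupings, milestone epochs, and the 0.2 decay factor come directly from that row; the helper `lr_at_epoch` and the configuration dictionary keys are illustrative assumptions, not the authors' code.

```python
# Hyperparameter grid as listed in the Experiment Setup row above.
from itertools import product

lr_wd       = [(0.01, 0.0), (0.32, 0.0), (0.1, 0.0), (0.1, 4e-5)]  # (learning rate, weight decay)
batch_sizes = [100, 300]
optimizers  = ["SGD", "Adam"]
drop_augm   = [(0.0, True), (0.0, False), (0.2, False), (0.4, False)]  # (dropout rate, data augmentation)
width_depth = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5)]  # (width factor, depth factor)

EPOCHS = 250
LR_MILESTONES = [150, 230, 240]  # learning rate multiplied by 0.2 at each milestone

def lr_at_epoch(base_lr, epoch):
    """Piecewise-constant schedule: multiply by 0.2 at every milestone already passed."""
    return base_lr * 0.2 ** sum(epoch >= m for m in LR_MILESTONES)

configs = [
    dict(lr=lr, weight_decay=wd, batch_size=bs, optimizer=opt,
         dropout=do, data_augmentation=da, width=w, depth=d)
    for (lr, wd), bs, opt, (do, da), (w, d)
    in product(lr_wd, batch_sizes, optimizers, drop_augm, width_depth)
]
print(len(configs))  # 4 * 2 * 2 * 4 * 3 = 192 settings as listed here
```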