Breaking Inter-Layer Co-Adaptation by Classifier Anonymization

Authors: Ikuro Sato, Kohta Ishikawa, Guoqing Liu, Masayuki Tanaka

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Real-data experiments under more general conditions provide supportive evidence. We use the CIFAR-10 dataset, a 10-class image classification dataset having 5 × 10^4 training samples, and the CIFAR-100 dataset, a 100-class image classification dataset having the same number of samples (Krizhevsky & Hinton, 2009).
Researcher Affiliation | Collaboration | Ikuro Sato¹, Kohta Ishikawa¹, Guoqing Liu¹, Masayuki Tanaka²; ¹Denso IT Laboratory, Inc., Japan; ²National Institute of Advanced Industrial Science and Technology, Japan.
Pseudocode | Yes | Algorithm 1: Approximate minimization in Eq. (2)
Open Source Code | No | No statement regarding the release of source code or a link to a code repository was found.
Open Datasets | Yes | We use the CIFAR-10 dataset, a 10-class image classification dataset having 5 × 10^4 training samples, and the CIFAR-100 dataset, a 100-class image classification dataset having the same number of samples (Krizhevsky & Hinton, 2009). (A loading sketch follows the table.)
Dataset Splits | No | In each training, we tested a couple of different initial learning rates and chose the best-performing one in the validation.
Hardware Specification | No | No specific hardware details (such as GPU or CPU models, memory, or cluster specifications) used for running experiments were mentioned in the paper.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | Training details. SGD with momentum is used in each baseline experiment. In each FOCA experiment, the feature-extractor part uses SGD with momentum, and the classifier part uses gradient descent with momentum. In each training, we tested a couple of different initial learning rates and chose the best-performing one in the validation. A manual learning rate scheduling is adopted; the learning rate is dropped by a fixed factor 1-3 times. The weak classifiers are randomly initialized each time by a zero-mean Gaussian distribution with standard deviation 0.1 for both CIFAR-10 and -100. Cross-entropy loss with softmax normalization and ReLU activation (Nair & Hinton, 2010) are used in every case. No data augmentation is adopted. The batch size b used in the weak-classifier training is 100 for the CIFAR-10 and 1000 for the CIFAR-100 experiments. The number of updates to generate θ is 32 for the CIFAR-10 and 64 for the CIFAR-100 experiments. Max-norm regularization (Srivastava et al., 2014) is used for the FOCA training, to stabilize the training. We found that the FOCA training can be made even more stable when updating the feature-extractor parameters u times for given weak-classifier parameters. We used this trick with u = 8 in the CIFAR-100 experiments. (A training-loop sketch based on these details follows the table.)
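
The CIFAR-10 and CIFAR-100 datasets cited in the Open Datasets row are publicly available. A minimal loading sketch, assuming PyTorch/torchvision (the paper does not name any framework, so this choice is purely illustrative):

```python
# Hypothetical loading sketch; torchvision is an assumption, not stated in the paper.
import torchvision
import torchvision.transforms as transforms

to_tensor = transforms.ToTensor()  # the paper reports no data augmentation

cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=to_tensor)
cifar100_train = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=to_tensor)

# Both datasets have 5 × 10^4 training samples, matching the figure quoted above.
print(len(cifar10_train), len(cifar100_train))
```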
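
The Experiment Setup row pins down the FOCA training procedure only up to framework choices. The sketch below, assuming PyTorch, shows one way the described loop could look: a weak classifier is re-drawn from a zero-mean Gaussian (std 0.1), approximately fitted on a mini-batch of size b with a fixed number of momentum updates, and the feature extractor is then updated u times against that frozen classifier. Names such as `make_weak_classifier`, `foca_step`, `feat_opt`, and the default `feat_dim` are illustrative assumptions, not from the paper; max-norm regularization and learning-rate scheduling are omitted.

```python
# Illustrative FOCA-style training step; PyTorch and all names here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_weak_classifier(feat_dim, num_classes):
    # Weak classifiers are re-initialized each time from a zero-mean Gaussian, std 0.1.
    clf = nn.Linear(feat_dim, num_classes)
    nn.init.normal_(clf.weight, mean=0.0, std=0.1)
    nn.init.normal_(clf.bias, mean=0.0, std=0.1)
    return clf


def foca_step(feature_extractor, feat_opt, loader_iter, feat_dim=64, num_classes=10,
              theta_updates=32, u=8, clf_lr=0.1, clf_momentum=0.9, device="cpu"):
    """One outer FOCA step: generate a fresh weak classifier, then update the
    feature extractor against it. `loader_iter` is assumed to yield mini-batches
    of size b (100 for CIFAR-10, 1000 for CIFAR-100) indefinitely."""
    # 1) Draw one mini-batch and approximately minimize the weak-classifier loss on it
    #    (cf. Algorithm 1, "Approximate minimization in Eq. (2)"); the feature
    #    extractor is frozen during this phase, so its features are computed once.
    x_b, y_b = next(loader_iter)
    x_b, y_b = x_b.to(device), y_b.to(device)
    with torch.no_grad():
        z_b = feature_extractor(x_b)
    clf = make_weak_classifier(feat_dim, num_classes).to(device)
    clf_opt = torch.optim.SGD(clf.parameters(), lr=clf_lr, momentum=clf_momentum)
    for _ in range(theta_updates):  # 32 for CIFAR-10, 64 for CIFAR-100
        loss = F.cross_entropy(clf(z_b), y_b)  # softmax cross-entropy
        clf_opt.zero_grad()
        loss.backward()
        clf_opt.step()

    # 2) Update the feature extractor u times (u = 8 in the CIFAR-100 experiments)
    #    against the now-frozen weak classifier, via the SGD-with-momentum
    #    optimizer `feat_opt`. Drawing fresh mini-batches here is an assumption.
    for p in clf.parameters():
        p.requires_grad_(False)
    for _ in range(u):
        x, y = next(loader_iter)
        x, y = x.to(device), y.to(device)
        loss = F.cross_entropy(clf(feature_extractor(x)), y)
        feat_opt.zero_grad()
        loss.backward()
        feat_opt.step()
    # Max-norm regularization (Srivastava et al., 2014), used in the paper to
    # stabilize FOCA training, is omitted here for brevity.
```

In this reading, the repeated re-drawing of the weak classifier is what "anonymizes" it: the feature extractor never adapts to any single fixed classifier, which is the co-adaptation-breaking idea named in the title.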