Breaking Inter-Layer Co-Adaptation by Classifier Anonymization
Authors: Ikuro Sato, Kohta Ishikawa, Guoqing Liu, Masayuki Tanaka
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Real-data experiments under more general conditions provide supporting evidence. We use the CIFAR-10 dataset, a 10-class image classification dataset having 5 × 10^4 training samples, and the CIFAR-100 dataset, a 100-class image classification dataset having the same number of samples (Krizhevsky & Hinton, 2009). |
| Researcher Affiliation | Collaboration | Ikuro Sato 1 Kohta Ishikawa 1 Guoqing Liu 1 Masayuki Tanaka 2 1Denso IT Laboratory, Inc., Japan 2National Institute of Advanced Industrial Science and Technology, Japan. |
| Pseudocode | Yes | Algorithm 1 Approximate minimization in Eq. (2) |
| Open Source Code | No | No statement regarding the release of source code or a link to a code repository was found. |
| Open Datasets | Yes | We use the CIFAR-10 dataset, a 10-class image classification dataset having 5 × 10^4 training samples, and the CIFAR-100 dataset, a 100-class image classification dataset having the same number of samples (Krizhevsky & Hinton, 2009). |
| Dataset Splits | No | In each training, we tested a couple of different initial learning rates and chose the best-performing one in the validation. |
| Hardware Specification | No | No specific hardware details (such as GPU or CPU models, memory, or cluster specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | Training details. SGD with momentum is used in each baseline experiment. In each FOCA experiment, the feature-extractor part uses SGD with momentum, and the classifier part uses gradient descent with momentum. In each training, we tested a couple of different initial learning rates and chose the best-performing one in the validation. A manual learning rate scheduling is adopted; the learning rate is dropped by a fixed factor 1-3 times. The weak classifiers are randomly initialized each time by a zero-mean Gaussian distribution with standard deviation 0.1 for both CIFAR-10 and -100. Cross entropy loss with softmax normalization and ReLU activation (Nair & Hinton, 2010) are used in every case. No data augmentation is adopted. The batch size b used in the weak-classifier training is 100 for the CIFAR-10 and 1000 for the CIFAR-100 experiments. The number of updates to generate θ is 32 for the CIFAR-10 and 64 for the CIFAR-100 experiments. Max-norm regularization (Srivastava et al., 2014) is used for the FOCA training, to stabilize the training. We found that the FOCA training can be made even more stable when updating the feature-extractor parameters u times for given weak-classifier parameters. We used this trick with u = 8 in the CIFAR-100 experiments. (A training-loop sketch based on these details follows the table.) |
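
The experiment-setup cell above is detailed enough to reconstruct one outer iteration of FOCA training. The sketch below is a hedged PyTorch reading of those details, not the authors' implementation: the tiny `SmallConvNet`, the learning rates, and the data iterator are assumptions, while the batch size b, the N(0, 0.1²) weak-classifier initialization, the number of classifier updates, and the u feature-extractor updates per classifier come from the quoted text. Max-norm regularization and learning-rate scheduling are omitted.

```python
# Hedged sketch of one FOCA outer step, reconstructed from the training
# details quoted above. Placeholder names (SmallConvNet, foca_outer_step)
# and all learning rates are assumptions, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10   # CIFAR-10 (100 for CIFAR-100)
B_WEAK = 100       # batch size b for weak-classifier training (1000 for CIFAR-100)
T_WEAK = 32        # updates used to generate each weak classifier (64 for CIFAR-100)
U_FEAT = 8         # feature-extractor updates per weak classifier (u = 8 on CIFAR-100)


class SmallConvNet(nn.Module):
    """Placeholder feature extractor; the paper's architectures differ."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> 64-dim features
        )

    def forward(self, x):
        return self.body(x)


def sample_weak_classifier(feature_dim):
    """Fresh linear classifier with weights drawn from N(0, 0.1^2), per the quoted setup."""
    clf = nn.Linear(feature_dim, NUM_CLASSES)
    nn.init.normal_(clf.weight, mean=0.0, std=0.1)
    nn.init.zeros_(clf.bias)
    return clf


def foca_outer_step(loader_iter, feature_extractor, opt_feat):
    # 1) Draw one mini-batch of size b and fit a disposable weak classifier to
    #    the current, frozen features with gradient-descent-with-momentum steps.
    x, y = next(loader_iter)                      # batch of size B_WEAK (assumed iterator)
    with torch.no_grad():
        z = feature_extractor(x)                  # features are fixed while fitting the classifier
    clf = sample_weak_classifier(z.shape[1])
    opt_clf = torch.optim.SGD(clf.parameters(), lr=0.1, momentum=0.9)
    for _ in range(T_WEAK):
        loss = F.cross_entropy(clf(z), y)
        opt_clf.zero_grad(); loss.backward(); opt_clf.step()

    # 2) Update the feature extractor u times against this now-frozen weak classifier.
    for p in clf.parameters():
        p.requires_grad_(False)
    for _ in range(U_FEAT):
        x, y = next(loader_iter)
        loss = F.cross_entropy(clf(feature_extractor(x)), y)
        opt_feat.zero_grad(); loss.backward(); opt_feat.step()
```

Each outer step discards its weak classifier and draws a fresh one next time; only the feature extractor and its optimizer state persist. Under this reading, that per-step resampling is the "classifier anonymization" that is meant to stop the features from co-adapting with any single classifier.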