PoF: Post-Training of Feature Extractor for Improving Generalization

Authors: Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted various image classification experiments on CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011), and Fashion-MNIST (Xiao et al., 2017).
Researcher Affiliation | Collaboration | ¹School of Computing, Tokyo Institute of Technology, Japan; ²Denso IT Laboratory, Inc., Japan.
Pseudocode | Yes | Algorithm 1: Post-training of Feature Extractor (PoF). (A hedged re-implementation sketch appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We conducted various image classification experiments on CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011), and Fashion-MNIST (Xiao et al., 2017).
Dataset Splits | Yes | We used the standard training/validation/testing split for all datasets, but the 530K extra images were used in addition to the standard training data of SVHN. (See the loading sketch after the table.)
Hardware Specification | Yes | The computing environment used in all experiments was 4 compute nodes, each equipped with 4 NVIDIA A100 GPUs, i.e., 16 GPUs in total were used in parallel.
Software Dependencies | No | The paper mentions using Nesterov Accelerated Gradient as an optimizer, but it does not specify software or library versions (e.g., 'PyTorch 1.9', 'TensorFlow 2.x') that are necessary for reproducibility.
Experiment Setup | Yes | The network was trained for 250 epochs with a batch size of 256. The learning rate was initialized to 0.1 (0.01 for SVHN) and was multiplied by a factor of 0.2 at the 60th, 120th, 160th, and 200th epochs. We used Nesterov Accelerated Gradient with a momentum rate of 0.9 and a weight decay rate of 5e-4. With SAM, ρ, the range of the perturbation, was set to 0.05 (0.01 for SVHN). Weights in the feature extractors used He initialization, and those in the classifiers were initialized with a normal distribution N(0, 0.1²). The network was trained with SAM (ρ = 0.05) for the first 200 epochs. Then, the feature extractor was post-trained with PoF for an additional 50 epochs with a batch size of 256 and a learning rate of 3e-5, using Nesterov Accelerated Gradient with the same parameters as in the SGD phase. The batch size for generating weak classifiers was 32. The expansion factor γ in Eq. (5) was randomly sampled at each iteration from a predefined range, i.e., γ ∈ [0, 2] in all experiments. All results used basic data augmentations (horizontal flip, padding by four pixels, and random crop), and cutout with 16×16 pixels was additionally used for the CIFAR-{10, 100} results. (A sketch of this schedule appears after the table.)
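The Pseudocode row refers to the paper's Algorithm 1 (PoF). The exact procedure is not reproduced here; the following is a minimal PyTorch-style sketch of one plausible PoF iteration, assuming the weak classifier is formed by perturbing the trained head with a single gradient step computed on a small batch (size 32), scaled by the expansion factor γ of Eq. (5), and that the feature extractor is then updated under that perturbed head. The perturbation sign, the function names, and the choice of framework are all assumptions.

```python
# Hedged sketch of one PoF-style post-training iteration (not the
# authors' code). Assumed reading of Algorithm 1: a "weak classifier"
# is created by perturbing the trained head along a gradient direction
# computed on a small batch, scaled by a random gamma in [0, 2]; the
# feature extractor is then updated against that perturbed head.
import copy
import random
import torch
import torch.nn.functional as F

def pof_step(feature_extractor, head, opt_fe, small_batch, main_batch):
    gamma = random.uniform(0.0, 2.0)       # expansion factor, Eq. (5)

    # --- build a weak classifier from a small batch (size 32) ---
    x_s, y_s = small_batch
    with torch.no_grad():
        z_s = feature_extractor(x_s)       # features held fixed here
    loss_s = F.cross_entropy(head(z_s), y_s)
    grads = torch.autograd.grad(loss_s, list(head.parameters()))

    weak_head = copy.deepcopy(head)
    with torch.no_grad():
        for p, g in zip(weak_head.parameters(), grads):
            p.add_(gamma * g)              # scaled perturbation (sign assumed)

    # --- update the feature extractor under the perturbed head ---
    x, y = main_batch
    loss = F.cross_entropy(weak_head(feature_extractor(x)), y)
    opt_fe.zero_grad()
    loss.backward()
    opt_fe.step()                          # lr = 3e-5, Nesterov momentum
    return loss.item()
```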
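For the Dataset Splits row: all four datasets ship with standard splits in torchvision, and SVHN's roughly 530K "extra" images are exposed as a separate split. The paper does not name its data pipeline, so the use of torchvision and the data path below are assumptions.

```python
# Illustrative only: the paper does not state how data were loaded.
from torchvision import datasets

root = "./data"  # hypothetical location

cifar10  = datasets.CIFAR10(root, train=True, download=True)
cifar100 = datasets.CIFAR100(root, train=True, download=True)
fmnist   = datasets.FashionMNIST(root, train=True, download=True)

svhn_train = datasets.SVHN(root, split="train", download=True)
svhn_extra = datasets.SVHN(root, split="extra", download=True)  # the ~530K extra images
```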
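The Experiment Setup row fixes the optimizer and schedule precisely. Below is a hedged PyTorch rendering; the framework is an assumption, the model is a placeholder, and the SAM perturbation step (ρ = 0.05) and data augmentations are omitted for brevity.

```python
# Minimal sketch of the reported optimization schedule (framework assumed).
import torch
import torch.nn as nn

# Placeholder model; the paper trains CNN feature extractors with linear heads.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

def init_weights(m):
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight)             # He init (feature extractor)
    elif isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=0.1)  # classifier: N(0, 0.1^2)

model.apply(init_weights)

# Nesterov Accelerated Gradient with the reported hyperparameters.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,               # 0.01 for SVHN
    momentum=0.9,
    nesterov=True,
    weight_decay=5e-4,
)

# Learning rate multiplied by 0.2 at epochs 60, 120, 160, and 200.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160, 200], gamma=0.2
)

for epoch in range(250):
    # ... one epoch of (SAM) training with batch size 256 goes here ...
    scheduler.step()
```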