Self-boosting for Feature Distillation

Authors: Yulong Pei, Yanyun Qu, Junping Zhang

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on multiple benchmarks and networks show that our method is significantly superior to existing methods."
Researcher Affiliation | Academia | "(1) Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Fujian, China; (2) Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China"
Pseudocode | No | The paper describes its methods using text and mathematical formulations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | "We used a reference implementation: https://github.com/HobbitLong/RepDistiller.git" (This states they used a reference implementation, not that their code is open-source.)
Open Datasets | Yes | "CIFAR-100 [Krizhevsky et al., 2009] is a commonly used small dataset for classification, which contains 60,000 RGB color images within 100 classes (50,000 training images and 10,000 test images) with a resolution of 32×32. CUB200 [Wah et al., 2011] is a dataset for fine-grained recognition, which consists of 11,788 images of different birds. ImageNet [Russakovsky et al., 2015] is a large-scale classification benchmark which has around 1.2 million images in 1,000 classes."
Dataset Splits | No | "CIFAR-100 ... contains 60,000 RGB color images within 100 classes (50,000 training images and 10,000 test images) ..." (This text only specifies training and test images, not a separate validation split.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details) used for running its experiments.
Software Dependencies | No | The paper mentions using a "stochastic gradient descent (SGD) optimizer" but does not specify software packages or their version numbers.
Experiment Setup | Yes | "For all networks, we use stochastic gradient descent (SGD) optimizer with momentum 0.9 and weight decay 5×10⁻⁴. On CIFAR-100, models are trained for 240 epochs with an initial learning rate of 0.05 and divided by 10 at epoch 150, 180 and 210, and standard data augmentation schemes (padding 4 pixels, random cropping, random horizontal flipping) are carried out. On ImageNet and CUB-200, the number of total epochs is 100 and 120 respectively, the learning rate is dropped by 0.1 per 30 epochs, and we perform random cropping and horizontal flipping as data augmentation."
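The experiment-setup row quotes a standard CIFAR-100 training recipe (SGD with momentum 0.9 and weight decay 5×10⁻⁴, 240 epochs, learning rate 0.05 divided by 10 at epochs 150, 180, and 210, with padding-4 random cropping and horizontal flipping). Below is a minimal PyTorch/torchvision sketch of that configuration only; the student network, batch size, and normalization statistics are assumptions not given in the excerpt, and the paper's self-boosting feature-distillation loss is not reproduced here.

```python
# Minimal sketch of the quoted CIFAR-100 training setup (PyTorch / torchvision).
# Assumptions not stated in the excerpt: the student network (resnet18 placeholder),
# batch size 64, and the CIFAR-100 normalization statistics. Only the optimizer,
# learning-rate schedule, and data augmentation from the table are mirrored;
# the paper's distillation objective would replace the plain cross-entropy loss.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),   # pad 4 pixels, random crop back to 32x32
    T.RandomHorizontalFlip(),      # random horizontal flip
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409),   # commonly used CIFAR-100 stats (assumption)
                (0.2673, 0.2564, 0.2762)),
])

train_set = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=4)

model = torchvision.models.resnet18(num_classes=100)  # placeholder student network
model.train()

# SGD with momentum 0.9 and weight decay 5e-4, as quoted from the paper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)
# Initial LR 0.05, divided by 10 at epochs 150, 180, and 210; 240 epochs in total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 180, 210], gamma=0.1)

criterion = nn.CrossEntropyLoss()  # stand-in for the paper's distillation loss

for epoch in range(240):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The ImageNet and CUB-200 schedules quoted above differ only in epoch count (100 and 120) and in dropping the learning rate by 0.1 every 30 epochs, which would correspond to a StepLR-style schedule under the same assumptions.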