Self-boosting for Feature Distillation

Authors: Yulong Pei, Yanyun Qu, Junping Zhang

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on multiple benchmarks and networks show that our method is significantly superior to existing methods."
Researcher Affiliation | Academia | "(1) Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Fujian, China; (2) Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China"
Pseudocode | No | The paper describes its methods using text and mathematical formulations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | "We used a reference implementation: https://github.com/HobbitLong/RepDistiller.git" (This states they used a reference implementation, not that their code is open-source.)
Open Datasets | Yes | "CIFAR-100 [Krizhevsky et al., 2009] is a commonly used small dataset for classification, which contains 60,000 RGB color images within 100 classes (50,000 training images and 10,000 test images) with a resolution of 32×32. CUB200 [Wah et al., 2011] is a dataset for fine-grained recognition, which consists of 11,788 images of different birds. ImageNet [Russakovsky et al., 2015] is a large-scale classification benchmark which has around 1.2 million images in 1,000 classes."
Dataset Splits | No | "CIFAR-100 ... contains 60,000 RGB color images within 100 classes (50,000 training images and 10,000 test images) ..." (This text only specifies training and test images, not a separate validation split.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details) used for running its experiments.
Software Dependencies | No | The paper mentions using a "stochastic gradient descent (SGD) optimizer" but does not specify software packages or their version numbers.
Experiment Setup | Yes | "For all networks, we use stochastic gradient descent (SGD) optimizer with momentum 0.9 and weight decay 5×10⁻⁴. On CIFAR-100, models are trained for 240 epochs with an initial learning rate of 0.05 and divided by 10 at epoch 150, 180 and 210, and standard data augmentation schemes (padding 4 pixels, random cropping, random horizontal flipping) are carried out. On ImageNet and CUB-200, the number of total epochs is 100 and 120 respectively, the learning rate is dropped by 0.1 per 30 epochs, and we perform random cropping and horizontal flipping as data augmentation."
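The experiment-setup row quotes a standard CIFAR-100 training recipe (SGD with momentum 0.9 and weight decay 5×10⁻⁴, 240 epochs, learning rate 0.05 divided by 10 at epochs 150, 180, and 210, with padding-4 random cropping and horizontal flipping). Below is a minimal PyTorch/torchvision sketch of that configuration only; the student network, batch size, and normalization statistics are assumptions not given in the excerpt, and the paper's self-boosting feature-distillation loss is not reproduced here.

```python
# Minimal sketch of the quoted CIFAR-100 training setup (PyTorch / torchvision).
# Assumptions not stated in the excerpt: the student network (resnet18 placeholder),
# batch size 64, and the CIFAR-100 normalization statistics. Only the optimizer,
# learning-rate schedule, and data augmentation from the table are mirrored;
# the paper's distillation objective would replace the plain cross-entropy loss.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),   # pad 4 pixels, random crop back to 32x32
    T.RandomHorizontalFlip(),      # random horizontal flip
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409),   # commonly used CIFAR-100 stats (assumption)
                (0.2673, 0.2564, 0.2762)),
])

train_set = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=4)

model = torchvision.models.resnet18(num_classes=100)  # placeholder student network
model.train()

# SGD with momentum 0.9 and weight decay 5e-4, as quoted from the paper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)
# Initial LR 0.05, divided by 10 at epochs 150, 180, and 210; 240 epochs in total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 180, 210], gamma=0.1)

criterion = nn.CrossEntropyLoss()  # stand-in for the paper's distillation loss

for epoch in range(240):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The ImageNet and CUB-200 schedules quoted above differ only in epoch count (100 and 120) and in dropping the learning rate by 0.1 every 30 epochs, which would correspond to a StepLR-style schedule under the same assumptions.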