Function-Consistent Feature Distillation

Authors: Dongyang Liu, Meina Kan, Shiguang Shan, Xilin Chen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on image classification and object detection demonstrate the superiority of FCFD to existing methods.
Researcher Affiliation | Academia | (1) Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS; (2) University of Chinese Academy of Sciences; (3) Peng Cheng Laboratory
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Our codes are available at https://github.com/LiuDongyang6/FCFD.
Open Datasets | Yes | CIFAR100 (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), MS-COCO (Lin et al., 2014)
Dataset Splits | Yes | The CIFAR100 (Krizhevsky et al., 2009) dataset consists of 60K images from 100 categories with size 32×32. In the standard protocol, 50k images are used for training and 10k for testing. The ImageNet (Deng et al., 2009) dataset consists of 1.28 million training images and 50k validation images from 1000 categories.
Hardware Specification | Yes | All experiments are conducted on one Tesla-V100 GPU; the ImageNet runs use batch size 256 and an initial learning rate of 0.1.
Software Dependencies | No | Automatic Mixed Precision (AMP) provided by PyTorch is used for acceleration. Our implementation is based on Detectron2 (Wu et al., 2019). No specific version numbers for PyTorch, Detectron2, or MMDetection are provided. (An illustrative AMP sketch follows the table.)
Experiment Setup | Yes | For CIFAR100... we train all the models for 240 epochs, and the learning rate is decayed by a factor of 10 at epochs 150, 180, and 210. The initial learning rate is 0.01 for MobileNetV2, ShuffleNet, and ShuffleNetV2 students, and 0.05 for the other student models. The batch size is 64, and the SGD optimizer is used with a weight decay of 0.0005 and a momentum of 0.9. (A configuration sketch follows the table.)
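
To make the reported CIFAR-100 recipe concrete, here is a minimal PyTorch sketch of the training configuration. The ResNet-18 student is a hypothetical stand-in and the FCFD distillation losses are omitted (they live in the linked repository); only the optimizer, schedule, and batch size follow the paper's description.

```python
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as T

# Hypothetical student model for illustration; the actual FCFD student/teacher
# pairs and distillation objectives come from https://github.com/LiuDongyang6/FCFD.
model = torchvision.models.resnet18(num_classes=100)

# Standard CIFAR-100 protocol cited by the paper: 50k train / 10k test images.
transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=4)

# SGD with momentum 0.9 and weight decay 0.0005, as reported; the initial LR
# is 0.05 (0.01 for MobileNetV2/ShuffleNet/ShuffleNetV2 students).
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)

# Decay the LR by a factor of 10 at epochs 150, 180, and 210 of a 240-epoch run.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 180, 210], gamma=0.1)

for epoch in range(240):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), labels)  # distillation terms omitted
        loss.backward()
        optimizer.step()
    scheduler.step()
```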
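
The paper also states that PyTorch's Automatic Mixed Precision is used for acceleration. Below is a minimal sketch of how the loop above could be wrapped with the standard torch.cuda.amp API; this is an illustration under that assumption, not code from the FCFD repository.

```python
import torch
import torch.nn.functional as F

model = model.cuda()  # model, train_loader, optimizer, scheduler from the sketch above
scaler = torch.cuda.amp.GradScaler()

for epoch in range(240):
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():  # run the forward pass in mixed precision
            loss = F.cross_entropy(model(images), labels)
        scaler.scale(loss).backward()    # scale the loss to avoid fp16 gradient underflow
        scaler.step(optimizer)           # unscales gradients, then takes the optimizer step
        scaler.update()                  # adjust the scale factor for the next iteration
    scheduler.step()
```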