Improved Feature Distillation via Projector Ensemble
Authors: Yudong Chen, Sen Wang, Jiajun Liu, Xuwei Xu, Frank de Hoog, Zi Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on different datasets with a series of teacher-student pairs illustrate the effectiveness of the proposed method. |
| Researcher Affiliation | Academia | ¹The University of Queensland, ²CSIRO Data61; {yudong.chen,sen.wang,xuwei.xu}@uq.edu.au, {jiajun.liu,frank.dehoog}@csiro.au, huang@itee.uq.edu.au |
| Pseudocode | Yes | Algorithm 1 Improved Feature Distillation via Projector Ensemble. (A hedged code sketch of this procedure follows the table.) |
| Open Source Code | Yes | Code is available at https://github.com/chenyd7/PEFD. |
| Open Datasets | Yes | Datasets. Two benchmark datasets are used for evaluation in our experiments. ImageNet [25] contains approximately 1.28 million training images and 50,000 validation images from 1,000 classes. ... The CIFAR-100 [18] dataset includes 50,000 training images and 10,000 testing images from 100 classes. |
| Dataset Splits | Yes | ImageNet [25] contains approximately 1.28 million training images and 50,000 validation images from 1,000 classes. The validation images are used for testing. The CIFAR-100 [18] dataset includes 50,000 training images and 10,000 testing images from 100 classes. |
| Hardware Specification | Yes | All the experiments are performed on an NVIDIA V100 GPU. |
| Software Dependencies | No | The paper implies PyTorch through the linked GitHub repository but does not state version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Following the settings of previous methods, the batch size, epochs, learning rate decay rate and weight decay rate are 256/64, 100/240, 0.1/0.1, and 0.0001/0.0005, respectively, on ImageNet/CIFAR-100. The initial learning rate is 0.1 on ImageNet, and 0.01 for MobileNetV2, 0.05 for the other students on CIFAR-100. Besides, the learning rate drops at every 30 epochs on ImageNet and drops at 150, 180, 210 epochs on CIFAR-100. The optimizer is Stochastic Gradient Descent (SGD) with momentum 0.9. (A runnable sketch of this schedule follows the table.) |
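As a companion to the Pseudocode row, here is a minimal PyTorch sketch of the projector-ensemble idea behind Algorithm 1: several projectors map the student feature into the teacher's feature space, their outputs are averaged, and a feature-matching loss is computed against the teacher feature. The two-layer MLP projector design, the choice of three projectors, and the cosine-style loss are illustrative assumptions, not the paper's confirmed hyperparameters; the authors' reference implementation is at https://github.com/chenyd7/PEFD.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectorEnsemble(nn.Module):
    """Ensemble of projectors mapping student features (s_dim) into the
    teacher's feature space (t_dim); the forward pass averages the
    individual projections. Assumed architecture, for illustration only."""

    def __init__(self, s_dim: int, t_dim: int, num_projectors: int = 3):
        super().__init__()
        self.projectors = nn.ModuleList([
            nn.Sequential(
                nn.Linear(s_dim, t_dim),
                nn.BatchNorm1d(t_dim),  # needs batch size > 1 in training mode
                nn.ReLU(inplace=True),
                nn.Linear(t_dim, t_dim),
            )
            for _ in range(num_projectors)
        ])

    def forward(self, f_s: torch.Tensor) -> torch.Tensor:
        # Average the individual projector outputs into one ensembled projection.
        return torch.stack([p(f_s) for p in self.projectors]).mean(dim=0)


def feature_distillation_loss(f_s_proj: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """Cosine-style feature-matching loss between the ensembled student
    projection and the (detached) teacher feature."""
    f_s_proj = F.normalize(f_s_proj, dim=1)
    f_t = F.normalize(f_t.detach(), dim=1)
    return (1.0 - (f_s_proj * f_t).sum(dim=1)).mean()


# Toy usage: 64 student features of width 512 matched to teacher width 2048.
ensemble = ProjectorEnsemble(s_dim=512, t_dim=2048)
loss = feature_distillation_loss(ensemble(torch.randn(64, 512)), torch.randn(64, 2048))
```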
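The quoted experiment setup translates directly into a standard SGD schedule. Below is a minimal sketch of the CIFAR-100 configuration (the ImageNet variant is noted in comments); the stand-in `student` module is hypothetical, and only the optimizer and scheduler values come from the quoted text.

```python
import torch

# Hypothetical stand-in for the student network being distilled.
student = torch.nn.Linear(512, 100)

# CIFAR-100 values quoted above: batch size 64, 240 epochs, weight decay
# 5e-4, momentum 0.9, initial lr 0.05 (0.01 for MobileNetV2 students),
# with the lr multiplied by 0.1 at epochs 150, 180, and 210.
optimizer = torch.optim.SGD(student.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 180, 210], gamma=0.1)

for epoch in range(240):
    # ... one training epoch over the CIFAR-100 training split ...
    scheduler.step()

# The ImageNet variant per the same quote: batch size 256, 100 epochs,
# lr 0.1, weight decay 1e-4, and a decay of 0.1 every 30 epochs, i.e.
# torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1).
```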