Knowledge Distillation by On-the-Fly Native Ensemble
Authors: Xu Lan, Xiatian Zhu, Shaogang Gong
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations show that ONE improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets: CIFAR10, CIFAR100, SVHN, and ImageNet, whilst having computational efficiency advantages. |
| Researcher Affiliation | Collaboration | Xu Lan (Queen Mary University of London), Xiatian Zhu (Vision Semantics Limited), and Shaogang Gong (Queen Mary University of London) |
| Pseudocode | Yes | Algorithm 1: Knowledge Distillation by On-the-Fly Native Ensemble (a hedged sketch of this training step follows the table) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | CIFAR10 [25]: A natural images dataset... CIFAR100 [25]: A similar dataset as CIFAR10... SVHN: The Street View House Numbers (SVHN) dataset... ImageNet: The 1,000-class dataset from ILSVRC 2012 [28]... |
| Dataset Splits | Yes | CIFAR10 [25]: A natural images dataset that contains 50,000/10,000 training/test samples... CIFAR100 [25]: A similar dataset as CIFAR10 that also contains 50,000/10,000 training/test images... SVHN: The Street View House Numbers (SVHN) dataset consists of 73,257/26,032 standard training/test images and an extra set of 531,131 training images. ImageNet: The 1,000-class dataset from ILSVRC 2012 [28] provides 1.2 million images for training and 50,000 for validation. (A loader sketch for these splits appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper states, 'We implemented all networks and model training procedures in Pytorch,' but does not provide a specific version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We used SGD with Nesterov momentum and set the momentum to 0.9. We deployed a standard learning rate schedule that drops from 0.1 to 0.01 at 50% of training and to 0.001 at 75%. For the training budget, we set 300/40/90 epochs for CIFAR/SVHN/ImageNet, respectively. We adopted a 3-branch ONE (m=2) design unless stated otherwise... Following [10], we set T = 3 in all the experiments. (A minimal optimiser/schedule sketch appears after the table.) |
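
The Pseudocode row refers to Algorithm 1, the ONE training step: per-branch classification plus distillation from a gated, on-the-fly ensemble teacher. The following is a minimal PyTorch sketch of that objective under our own assumptions: the names `one_loss`, `branch_logits`, and `gate_logits` are illustrative, and detaching the teacher inside the KL term is an implementation choice the paper does not spell out.

```python
import torch
import torch.nn.functional as F

def one_loss(branch_logits, gate_logits, targets, T=3.0):
    """Sketch of an ONE-style objective: cross-entropy for every branch and
    for the gated ensemble (teacher), plus temperature-scaled KL terms that
    distil the teacher back into each branch."""
    # Gated ensemble teacher: gate-weighted sum of the branch logits.
    gate = F.softmax(gate_logits, dim=1)             # (B, num_branches)
    stacked = torch.stack(branch_logits, dim=1)      # (B, num_branches, C)
    teacher_logits = (gate.unsqueeze(2) * stacked).sum(dim=1)

    # Hard-label cross-entropy for the teacher and every branch.
    loss = F.cross_entropy(teacher_logits, targets)
    for logits in branch_logits:
        loss = loss + F.cross_entropy(logits, targets)

    # Distillation: soften teacher and branch predictions with temperature T
    # (T = 3 per the Experiment Setup row) and penalise their KL divergence.
    teacher_soft = F.softmax(teacher_logits.detach() / T, dim=1)  # detach: our assumption
    for logits in branch_logits:
        log_student = F.log_softmax(logits / T, dim=1)
        loss = loss + (T * T) * F.kl_div(log_student, teacher_soft,
                                         reduction="batchmean")
    return loss
```

In this reading, the ensemble teacher is supervised by the hard labels while each branch is pulled toward the teacher's softened predictions, which is the knowledge-transfer mechanism the paper describes.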
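For the Open Datasets and Dataset Splits rows, the quoted splits correspond to the standard torchvision splits; the snippet below is only a convenience sketch, with the `data` root directory and the bare `ToTensor` transform as placeholder assumptions (ImageNet/ILSVRC-2012 requires a manual download and is omitted).

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# CIFAR10 / CIFAR100: 50,000 training and 10,000 test images each.
cifar10_train  = datasets.CIFAR10("data", train=True,  download=True, transform=to_tensor)
cifar10_test   = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100("data", train=True,  download=True, transform=to_tensor)
cifar100_test  = datasets.CIFAR100("data", train=False, download=True, transform=to_tensor)

# SVHN: 73,257 training / 26,032 test images, plus 531,131 extra training images.
svhn_train = datasets.SVHN("data", split="train", download=True, transform=to_tensor)
svhn_test  = datasets.SVHN("data", split="test",  download=True, transform=to_tensor)
svhn_extra = datasets.SVHN("data", split="extra", download=True, transform=to_tensor)
```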
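The Experiment Setup row gives the optimiser and learning-rate schedule; below is a minimal sketch of that recipe for the 300-epoch CIFAR budget, assuming PyTorch. The stand-in `net`, the empty epoch body, and the exact milestone rounding are our assumptions, not details from the paper.

```python
import torch

net = torch.nn.Linear(3 * 32 * 32, 10)   # stand-in for the multi-branch ONE network
epochs = 300                              # 300/40/90 for CIFAR/SVHN/ImageNet

# SGD with Nesterov momentum 0.9, starting at lr = 0.1.
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9, nesterov=True)

# Drop the learning rate to 0.01 at 50% of training and to 0.001 at 75%.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[epochs // 2, int(epochs * 0.75)], gamma=0.1)

for epoch in range(epochs):
    # ... one training epoch of the 3-branch ONE model (m = 2) would go here ...
    scheduler.step()
```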