Peer Collaborative Learning for Online Knowledge Distillation
Authors: Guile Wu, Shaogang Gong (pp. 10302-10310)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks and outperforms the state-of-the-art methods. |
| Researcher Affiliation | Academia | Guile Wu, Shaogang Gong Queen Mary University of London guile.wu@qmul.ac.uk, s.gong@qmul.ac.uk |
| Pseudocode | Yes | Algorithm 1 Peer Collaborative Learning for Online KD. Input: Training data {(x_i, y_i)}, i = 1..n. Output: A trained target model {θ_l^t, θ_{h,1}^t}, and a trained ensemble model {θ_l^t, θ_{h,j}^t}, j = 1..m. (A hedged architectural sketch based on this row appears after the table.) |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of the source code for the proposed method. |
| Open Datasets | Yes | Datasets. We used three image classification benchmarks for evaluation: (1) CIFAR-10 (Krizhevsky and Hinton 2009) contains 60000 images in 10 classes, with 5000 training images and 1000 test images per class. (2) CIFAR-100 (Krizhevsky and Hinton 2009) consists of 60000 images in 100 classes, with 500 training images and 100 test images per class. (3) ImageNet ILSVRC 2012 (Russakovsky et al. 2015) contains 1.2 million training images and 50000 validation images in 1000 classes. |
| Dataset Splits | Yes | CIFAR-10 contains 60000 images in 10 classes, with 5000 training images and 1000 test images per class. (2) CIFAR-100 consists of 60000 images in 100 classes, with 500 training images and 100 test images per class. (3) ImageNet ILSVRC 2012 contains 1.2 million training images and 50000 validation images in 1000 classes. |
| Hardware Specification | Yes | Our models were implemented with Python 3.6 and PyTorch 0.4, and trained on a Tesla V100 GPU (32GB). |
| Software Dependencies | Yes | Our models were implemented with Python 3.6 and PyTorch 0.4 |
| Experiment Setup | Yes | We set m=3 peers in the multi-branch architecture. We used SGD as the optimiser with Nesterov momentum 0.9 and weight decay 5e-4. We trained the network for Epoch_max=300 epochs on CIFAR-10/100 and 90 epochs on ImageNet. We set the initial learning rate to 0.1, which decayed to {0.01, 0.001} at {150, 225} epochs on CIFAR-10/100 and at {30, 60} epochs on ImageNet. We set the batch size to 128, the temperature T=3, α=80 for ramp-up weighting, β=0.999 to learn temporal mean models, λ=1.0 for CIFAR-10/100 and λ=0.1 for ImageNet. (A hedged training-configuration sketch based on this row appears after the table.) |
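
The Pseudocode row above describes a multi-branch architecture in which shared low-level layers θ_l feed m peer-specific high-level branches θ_{h,j}; the target model keeps one branch while the ensemble model keeps all m. The sketch below is a minimal PyTorch illustration of that structure, not the authors' implementation: the backbone stand-in, module names, and the averaged-logit ensemble are assumptions.

```python
import torch
import torch.nn as nn

class PeerBranchNet(nn.Module):
    """Illustrative multi-branch network: shared low-level layers (theta_l)
    feed m peer-specific high-level branches (theta_{h,j}). The ensemble
    output here is a simple average of the peer logits; the paper's exact
    ensembling may differ."""
    def __init__(self, num_classes=100, num_peers=3, feat_dim=64):
        super().__init__()
        # Shared low-level layers (a stand-in for the early backbone stages).
        self.shared = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feat_dim),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # m peer branches (stand-ins for the later, branch-specific stages).
        self.peers = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_peers)]
        )

    def forward(self, x):
        h = self.shared(x)
        peer_logits = [branch(h) for branch in self.peers]
        ensemble_logits = torch.stack(peer_logits).mean(dim=0)
        return peer_logits, ensemble_logits
```

In this reading, the trained target model corresponds to the shared layers plus a single branch (θ_l, θ_{h,1}), and the trained ensemble model keeps all branches.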
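The Experiment Setup row maps onto a standard training configuration. What follows is a hedged sketch, assuming SGD with Nesterov momentum, a milestone learning-rate schedule, a temperature-scaled KL distillation term, a sigmoid-style ramp-up, and an exponential-moving-average temporal mean model; the paper's exact loss composition and ramp-up function are not reproduced here, and `PeerBranchNet` is the illustrative module from the previous sketch.

```python
import copy
import math
import torch
import torch.nn.functional as F

model = PeerBranchNet(num_classes=100, num_peers=3)

# SGD with Nesterov momentum 0.9, weight decay 5e-4, initial LR 0.1,
# decayed by 10x at epochs 150 and 225 (the CIFAR-10/100 settings above).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[150, 225], gamma=0.1)

# Temporal mean model: an exponential moving average with beta = 0.999,
# updated after every optimiser step.
ema_model = copy.deepcopy(model)
for p in ema_model.parameters():
    p.requires_grad_(False)

def update_ema(model, ema_model, beta=0.999):
    for p, ema_p in zip(model.parameters(), ema_model.parameters()):
        ema_p.mul_(beta).add_(p.detach(), alpha=1.0 - beta)

def distill_loss(student_logits, teacher_logits, T=3.0):
    # Temperature-scaled KL divergence, scaled by T^2 as is conventional.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)

def rampup_weight(epoch, alpha=80.0, rampup_epochs=80):
    # Sigmoid-shaped ramp-up towards alpha (a common choice); the paper's
    # exact ramp-up schedule may differ.
    t = min(epoch, rampup_epochs) / rampup_epochs
    return alpha * math.exp(-5.0 * (1.0 - t) ** 2)
```

With a batch size of 128, each training step would compute cross-entropy on every peer plus ramp-up-weighted distillation terms, step the optimiser, and then call `update_ema`; `scheduler.step()` runs once per epoch.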