Peer Collaborative Learning for Online Knowledge Distillation
Authors: Guile Wu, Shaogang Gong (pp. 10302-10310)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks and outperforms the state-of-the-art methods. |
| Researcher Affiliation | Academia | Guile Wu, Shaogang Gong Queen Mary University of London guile.wu@qmul.ac.uk, s.gong@qmul.ac.uk |
| Pseudocode | Yes | Algorithm 1 Peer Collaborative Learning for Online KD. Input: Training data {(x_i, y_i)}, i = 1..n. Output: A trained target model {θ_l^t, θ_{h,1}^t}, and a trained ensemble model {θ_l^t, θ_{h,j}^t}, j = 1..m. (A hedged architectural sketch based on this row appears after the table.) |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of the source code for the proposed method. |
| Open Datasets | Yes | Datasets. We used three image classification benchmarks for evaluation: (1) CIFAR-10 (Krizhevsky and Hinton 2009) contains 60000 images in 10 classes, with 5000 training images and 1000 test images per class. (2) CIFAR-100 (Krizhevsky and Hinton 2009) consists of 60000 images in 100 classes, with 500 training images and 100 test images per class. (3) ImageNet ILSVRC 2012 (Russakovsky et al. 2015) contains 1.2 million training images and 50000 validation images in 1000 classes. |
| Dataset Splits | Yes | CIFAR-10 contains 60000 images in 10 classes, with 5000 training images and 1000 test images per class. (2) CIFAR-100 consists of 60000 images in 100 classes, with 500 training images and 100 test images per class. (3) ImageNet ILSVRC 2012 contains 1.2 million training images and 50000 validation images in 1000 classes. |
| Hardware Specification | Yes | Our models were implemented with Python 3.6 and PyTorch 0.4, and trained on a Tesla V100 GPU (32GB). |
| Software Dependencies | Yes | Our models were implemented with Python 3.6 and PyTorch 0.4 |
| Experiment Setup | Yes | We set m=3 peers in the multi-branch architecture. We used SGD as the optimiser with Nesterov momentum 0.9 and weight decay 5e-4. We trained the network for Epoch_max=300 epochs on CIFAR-10/100 and 90 epochs on ImageNet. We set the initial learning rate to 0.1, which decayed to {0.01, 0.001} at {150, 225} epochs on CIFAR-10/100 and at {30, 60} epochs on ImageNet. We set the batch size to 128, the temperature T=3, α=80 for ramp-up weighting, β=0.999 to learn temporal mean models, λ=1.0 for CIFAR-10/100 and λ=0.1 for ImageNet. (A hedged training-configuration sketch based on this row appears after the table.) |
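
The Pseudocode row above describes a multi-branch architecture in which shared low-level layers θ_l feed m peer-specific high-level branches θ_{h,j}; the target model keeps one branch while the ensemble model keeps all m. The sketch below is a minimal PyTorch illustration of that structure, not the authors' implementation: the backbone stand-in, module names, and the averaged-logit ensemble are assumptions.

```python
import torch
import torch.nn as nn

class PeerBranchNet(nn.Module):
    """Illustrative multi-branch network: shared low-level layers (theta_l)
    feed m peer-specific high-level branches (theta_{h,j}). The ensemble
    output here is a simple average of the peer logits; the paper's exact
    ensembling may differ."""
    def __init__(self, num_classes=100, num_peers=3, feat_dim=64):
        super().__init__()
        # Shared low-level layers (a stand-in for the early backbone stages).
        self.shared = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feat_dim),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # m peer branches (stand-ins for the later, branch-specific stages).
        self.peers = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_peers)]
        )

    def forward(self, x):
        h = self.shared(x)
        peer_logits = [branch(h) for branch in self.peers]
        ensemble_logits = torch.stack(peer_logits).mean(dim=0)
        return peer_logits, ensemble_logits
```

In this reading, the trained target model corresponds to the shared layers plus a single branch (θ_l, θ_{h,1}), and the trained ensemble model keeps all branches.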
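The Experiment Setup row maps onto a standard training configuration. What follows is a hedged sketch, assuming SGD with Nesterov momentum, a milestone learning-rate schedule, a temperature-scaled KL distillation term, a sigmoid-style ramp-up, and an exponential-moving-average temporal mean model; the paper's exact loss composition and ramp-up function are not reproduced here, and `PeerBranchNet` is the illustrative module from the previous sketch.

```python
import copy
import math
import torch
import torch.nn.functional as F

model = PeerBranchNet(num_classes=100, num_peers=3)

# SGD with Nesterov momentum 0.9, weight decay 5e-4, initial LR 0.1,
# decayed by 10x at epochs 150 and 225 (the CIFAR-10/100 settings above).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[150, 225], gamma=0.1)

# Temporal mean model: an exponential moving average with beta = 0.999,
# updated after every optimiser step.
ema_model = copy.deepcopy(model)
for p in ema_model.parameters():
    p.requires_grad_(False)

def update_ema(model, ema_model, beta=0.999):
    for p, ema_p in zip(model.parameters(), ema_model.parameters()):
        ema_p.mul_(beta).add_(p.detach(), alpha=1.0 - beta)

def distill_loss(student_logits, teacher_logits, T=3.0):
    # Temperature-scaled KL divergence, scaled by T^2 as is conventional.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)

def rampup_weight(epoch, alpha=80.0, rampup_epochs=80):
    # Sigmoid-shaped ramp-up towards alpha (a common choice); the paper's
    # exact ramp-up schedule may differ.
    t = min(epoch, rampup_epochs) / rampup_epochs
    return alpha * math.exp(-5.0 * (1.0 - t) ** 2)
```

With a batch size of 128, each training step would compute cross-entropy on every peer plus ramp-up-weighted distillation terms, step the optimiser, and then call `update_ema`; `scheduler.step()` runs once per epoch.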