Improved Knowledge Distillation via Teacher Assistant
Authors: Seyed Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, Hassan Ghasemzadeh
AAAI 2020, pp. 5191-5198
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical analysis and extensive experiments on CIFAR-10/100 and ImageNet datasets and on CNN and ResNet architectures substantiate the effectiveness of our proposed approach. We describe in this section the settings of our experiments. Datasets. We perform a set of experiments on two standard datasets CIFAR-10 and CIFAR-100 and one experiment on the large-scale ImageNet dataset. |
| Researcher Affiliation | Collaboration | Seyed Iman Mirzadeh et al. Affiliations: 1 Washington State University, WA, USA; 2 DeepMind, CA, USA; 3 D.E. Shaw, NY, USA. Emails: 1 {seyediman.mirzadeh, hassan.ghasemzadeh}@wsu.edu; 2 {farajtabar, anglili, nirlevine}@google.com; 3 akihiro.matsukawa@gmail.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Codes and Appendix are available at the following address: https://github.com/imirzadeh/Teacher-Assistant-Knowledge-Distillation |
| Open Datasets | Yes | Datasets. We perform a set of experiments on two standard datasets CIFAR-10 and CIFAR-100 and one experiment on the large-scale ImageNet dataset. The datasets consist of 32×32 RGB images. The task for all of them is to classify images into image categories. CIFAR-10, CIFAR-100 and ImageNet contain 10, 100, and 1000 classes, respectively. |
| Dataset Splits | No | The paper mentions using a hyperparameter optimization toolkit but does not explicitly provide details about the validation dataset splits (percentages, counts, or methodology). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch' and the 'Microsoft-Research 2018' hyperparameter optimization toolkit but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For optimization, we used stochastic gradient descent with Nesterov momentum of 0.9 and learning rate of 0.1 for 150 epochs. For experiments on plain CNN networks, we used the same learning rate, while for ResNet training we decrease the learning rate to 0.01 on epoch 80 and 0.001 on epoch 120. We also used weight decay with the value of 0.0001 for training ResNets. A minimal sketch of this optimizer configuration follows the table. |
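
The quoted experiment setup maps onto a standard PyTorch optimizer and learning-rate schedule. The snippet below is a minimal sketch of that configuration only, assuming a generic model, data loader, and cross-entropy loss as placeholders; it is not the authors' released code (their repository linked above is the authoritative implementation).

```python
# Sketch of the reported ResNet training settings: SGD with Nesterov momentum
# 0.9, initial learning rate 0.1 for 150 epochs, decayed to 0.01 at epoch 80
# and 0.001 at epoch 120, with weight decay 1e-4. Plain-CNN runs reportedly
# keep the learning rate fixed at 0.1 (i.e., no scheduler).
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

def make_resnet_optimizer(model: nn.Module):
    optimizer = SGD(
        model.parameters(),
        lr=0.1,            # initial learning rate
        momentum=0.9,      # Nesterov momentum of 0.9
        nesterov=True,
        weight_decay=1e-4, # weight decay used for ResNet training
    )
    # Drop the learning rate by 10x at epochs 80 and 120 (0.1 -> 0.01 -> 0.001).
    scheduler = MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)
    return optimizer, scheduler

# Usage (assumes `model`, `train_loader`, and `criterion` are defined elsewhere):
# optimizer, scheduler = make_resnet_optimizer(model)
# for epoch in range(150):
#     for inputs, targets in train_loader:
#         optimizer.zero_grad()
#         loss = criterion(model(inputs), targets)
#         loss.backward()
#         optimizer.step()
#     scheduler.step()
```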