Isotonic Data Augmentation for Knowledge Distillation

Authors: Wanyun Cui, Sen Yan

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have verified on various datasets and data augmentation techniques that our proposed IDA algorithms effectively increase the accuracy of knowledge distillation by eliminating rank violations. We show the classification accuracies of the standard knowledge distillation and our proposed isotonic data augmentation in Table 1.
Researcher Affiliation | Academia | Shanghai University of Finance and Economics; cui.wanyun@sufe.edu.cn
Pseudocode | Yes | Algorithm 1: Adapted IRT.
Open Source Code | No | The paper does not provide a direct link or an explicit statement about the availability of its source code.
Open Datasets | Yes | Datasets. We use CIFAR-100 [Krizhevsky et al., 2009], which contains 50k training images with 500 images per class and 10k test images. We also use ImageNet, which contains 1.2 million images from 1K classes for training and 50K images for validation.
Dataset Splits | Yes | We use CIFAR-100 [Krizhevsky et al., 2009], which contains 50k training images with 500 images per class and 10k test images. We also use ImageNet, which contains 1.2 million images from 1K classes for training and 50K images for validation.
Hardware Specification | Yes | Models for ImageNet were trained on 4 Nvidia Tesla V100 GPUs. Models for CIFAR-100 were trained on a single Nvidia Tesla V100 GPU.
Software Dependencies | No | The paper mentions using SGD as the optimizer and refers to various models (ResNet, GoogLeNet, BERT, DistilBERT) and data augmentation techniques (Mixup, CutMix), but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x, CUDA x.x).
Experiment Setup | Yes | By default, we set β = 3 and σ = 2, which are derived from a grid search over {0.5, 1, 2, 3, 4, 5}. We set τ = 4.5 and α = 0.95 following common practice. For ImageNet, we train the student model for 100 epochs. We use SGD as the optimizer with an initial learning rate of 0.1. We decay the learning rate by 0.1 at epochs 30, 60, and 90.
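
The Experiment Setup row lists training hyperparameters but, as noted above, no source code is released. Below is a minimal PyTorch sketch of a standard soft-target knowledge-distillation training loop wired with the reported values (τ = 4.5, α = 0.95, SGD with an initial learning rate of 0.1 decayed by 0.1 at epochs 30, 60, and 90, for 100 epochs on ImageNet). It is an illustrative assumption, not the paper's isotonic data augmentation (IDA) algorithm, which additionally adjusts the teacher's soft targets to remove rank violations; momentum, weight decay, and the data pipeline are also unstated assumptions.

import torch
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

TAU, ALPHA = 4.5, 0.95   # temperature and distillation weight reported in the paper
EPOCHS = 100             # ImageNet training length reported in the paper

def kd_loss(student_logits, teacher_logits, labels, tau=TAU, alpha=ALPHA):
    # Soft-target term: KL divergence between temperature-scaled distributions,
    # scaled by tau^2 as in standard knowledge distillation.
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau * tau)
    # Hard-label cross-entropy term.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def train_student(student, teacher, loader, device="cuda"):
    # SGD with the reported initial learning rate; momentum and weight decay
    # are not stated in the paper and are assumptions here.
    optimizer = SGD(student.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
    # Step decay by 0.1 at epochs 30, 60, 90 as reported.
    scheduler = MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)
    student.to(device).train()
    teacher.to(device).eval()
    for epoch in range(EPOCHS):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                teacher_logits = teacher(images)
            loss = kd_loss(student(images), teacher_logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()

The τ²-scaled KL term follows the usual Hinton-style formulation so that gradient magnitudes remain comparable across temperatures; the paper's IDA step would sit between the teacher forward pass and the loss computation.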