Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Isotonic Data Augmentation for Knowledge Distillation

Authors: Wanyun Cui, Sen Yan

IJCAI 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have verified on variant datasets and data augmentation techniques that our proposed IDA algorithm effectively increases the accuracy of knowledge distillation by eliminating the rank violations. We show the classification accuracies of the standard knowledge distillation and our proposed isotonic data augmentation in Table 1.
Researcher Affiliation | Academia | Shanghai University of Finance and Economics
Pseudocode | Yes | Algorithm 1 Adapted IRT.
Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of its source code.
Open Datasets | Yes | Datasets. We use CIFAR-100 [Krizhevsky et al., 2009], which contains 50k training images with 500 images per class and 10k test images. We also use ImageNet, which contains 1.2 million images from 1K classes for training and 50K for validation.
Dataset Splits | Yes | We use CIFAR-100 [Krizhevsky et al., 2009], which contains 50k training images with 500 images per class and 10k test images. We also use ImageNet, which contains 1.2 million images from 1K classes for training and 50K for validation.
Hardware Specification | Yes | Models for ImageNet were trained on 4 Nvidia Tesla V100 GPUs. Models for CIFAR-100 were trained on a single Nvidia Tesla V100 GPU.
Software Dependencies | No | The paper mentions using SGD as the optimizer and refers to various models (ResNet, GoogLeNet, BERT, DistilBERT) and data augmentation techniques (Mixup, CutMix), but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x, CUDA x.x).
Experiment Setup | Yes | By default, we set β = 3, σ = 2, which are derived from grid search in {0.5, 1, 2, 3, 4, 5}. We set τ = 4.5, α = 0.95 from common practice. For ImageNet, we train the student model for 100 epochs. We use SGD as the optimizer with an initial learning rate of 0.1. We decay the learning rate by 0.1 at epochs 30, 60, 90.
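The quoted ImageNet setup (100 epochs, SGD with an initial learning rate of 0.1, decayed by 0.1 at epochs 30, 60, 90) describes a standard step learning-rate schedule. A minimal sketch of that schedule, assuming a simple multiplicative decay at each milestone (the function name and defaults are illustrative, not from the paper):

```python
def lr_at_epoch(epoch, base_lr=0.1, decay=0.1, milestones=(30, 60, 90)):
    """Step schedule: multiply the learning rate by `decay` at each passed milestone."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= decay
    return lr

# Epochs 0-29 train at 0.1, 30-59 at 0.01, 60-89 at 0.001, 90-99 at 0.0001.
schedule = [lr_at_epoch(e) for e in range(100)]
```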
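The table notes that IDA works by "eliminating the rank violations" using an adapted isotonic regression ("Algorithm 1 Adapted IRT"). As a rough illustration of the underlying operation only — not the paper's adapted algorithm, whose rank constraints over augmented soft labels differ — here is a minimal pool-adjacent-violators (PAVA) sketch for classic L2 isotonic regression, which projects a sequence onto the nearest non-decreasing one:

```python
def pava(y):
    """Pool Adjacent Violators: L2 projection of y onto non-decreasing sequences."""
    stack = []  # blocks of (mean, weight); block means stay non-decreasing
    for v in y:
        mean, w = float(v), 1.0
        # merge with previous blocks while they violate the ordering
        while stack and stack[-1][0] > mean:
            pm, pw = stack.pop()
            mean = (pm * pw + mean * w) / (pw + w)
            w += pw
        stack.append((mean, w))
    out = []
    for mean, w in stack:
        out.extend([mean] * int(w))
    return out

pava([1, 3, 2, 4])  # the rank violation 3 > 2 is pooled to 2.5, 2.5
```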