Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Isotonic Data Augmentation for Knowledge Distillation
Authors: Wanyun Cui, Sen Yan
IJCAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have verified on variant datasets and data augmentation techniques that our proposed IDA algorithm effectively increases the accuracy of knowledge distillation by eliminating the rank violations. We show the classification accuracies of the standard knowledge distillation and our proposed isotonic data augmentation in Table 1. |
| Researcher Affiliation | Academia | Shanghai University of Finance and Economics EMAIL |
| Pseudocode | Yes | Algorithm 1 Adapted IRT. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of its source code. |
| Open Datasets | Yes | Datasets. We use CIFAR-100 [Krizhevsky et al., 2009], which contains 50k training images with 500 images per class and 10k test images. We also use ImageNet, which contains 1.2 million images from 1K classes for training and 50K for validation |
| Dataset Splits | Yes | We use CIFAR-100 [Krizhevsky et al., 2009], which contains 50k training images with 500 images per class and 10k test images. We also use ImageNet, which contains 1.2 million images from 1K classes for training and 50K for validation |
| Hardware Specification | Yes | Models for ImageNet were trained on 4 Nvidia Tesla V100 GPUs. Models for CIFAR-100 were trained on a single Nvidia Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions using SGD as the optimizer and refers to various models (ResNet, GoogLeNet, BERT, DistilBERT) and data augmentation techniques (Mixup, CutMix), but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x, CUDA x.x). |
| Experiment Setup | Yes | By default, we set β = 3, σ = 2, which are derived from grid search in {0.5, 1, 2, 3, 4, 5}. We set τ = 4.5, α = 0.95 from common practice. For ImageNet, we train the student model for 100 epochs. We use SGD as the optimizer with an initial learning rate of 0.1. We decay the learning rate by 0.1 at epochs 30, 60, and 90. |
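The ImageNet training schedule quoted in the Experiment Setup row (SGD, initial learning rate 0.1, decayed by a factor of 0.1 at epochs 30, 60, and 90 over 100 epochs) amounts to a plain step-decay rule. A minimal sketch, with an illustrative helper name not taken from the paper:

```python
def step_decay_lr(epoch, base_lr=0.1, gamma=0.1, milestones=(30, 60, 90)):
    """Step-decay schedule: multiply the base learning rate by `gamma`
    once for each milestone epoch that has been reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

# Over the 100-epoch ImageNet run described in the paper:
# epochs 0-29 -> 0.1, 30-59 -> 0.01, 60-89 -> 0.001, 90-99 -> 0.0001
```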
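The pseudocode row notes "Algorithm 1 Adapted IRT", i.e. an adapted isotonic-regression routine used to eliminate rank violations in augmented teacher soft labels. The paper's adapted variant is not reproduced here, but the classic pool-adjacent-violators algorithm it builds on can be sketched as follows (an illustration of standard isotonic regression, not the paper's Algorithm 1):

```python
def pava(y):
    """Pool-adjacent-violators: L2-project the sequence y onto the
    nearest non-decreasing sequence. Classic isotonic regression;
    the paper's Adapted IRT modifies this idea to repair rank
    violations in augmented soft labels."""
    # Each block stores [sum, count]; merge adjacent blocks whose
    # means violate the non-decreasing order.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    # Expand each block back to its member positions at the block mean.
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

# Example: the out-of-order pair (3, 2) is pooled to its mean 2.5.
# pava([1, 3, 2, 4]) -> [1.0, 2.5, 2.5, 4.0]
```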