What Makes a "Good" Data Augmentation in Knowledge Distillation - A Statistical Perspective
Authors: Huan Wang, Suhas Lohit, Michael N. Jones, Yun Fu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical studies support our claims and demonstrate how we can harvest considerable performance gains simply by using a better DA scheme in knowledge distillation. Presenting such a theoretically sound metric and empirically validating its effectiveness is the goal of this paper. |
| Researcher Affiliation | Collaboration | 1) Northeastern University, Boston, MA; 2) MERL, Cambridge, MA. This paper originates from Huan's summer internship work at MERL. |
| Pseudocode | No | The paper describes its methods and proposed schemes in prose (e.g., "Concretely, given a batch of data, we first apply CutMix...", "The idea is partly inspired by active learning..."), but does not include any formally structured pseudocode or algorithm blocks. (A hypothetical CutMix sketch is provided after this table.) |
| Open Source Code | Yes | Project: http://huanwang.tech/Good-DA-in-KD. We include the code link. |
| Open Datasets | Yes | We evaluate our method primarily on the CIFAR100 [21] and Tiny ImageNet* datasets. CIFAR100 has 100 object classes (32×32 RGB images). Each class has 500 images for training and 100 images for testing. Tiny ImageNet is a small version of ImageNet [10] with 200 classes (64×64 RGB images). Each class has 500 images for training, 50 for validation and 50 for testing. *https://tiny-imagenet.herokuapp.com/ |
| Dataset Splits | Yes | CIFAR100 has 100 object classes (32×32 RGB images). Each class has 500 images for training and 100 images for testing. Tiny ImageNet is a small version of ImageNet [10] with 200 classes (64×64 RGB images). Each class has 500 images for training, 50 for validation and 50 for testing. (A minimal split check is sketched after this table.) |
| Hardware Specification | No | The paper states: "We use PyTorch [31] to conduct all our experiments." However, it does not specify any details about the hardware (e.g., GPU model, CPU type) used for these experiments. |
| Software Dependencies | No | The paper mentions: "We use PyTorch [31] to conduct all our experiments." but does not specify any version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | The temperature τ of knowledge distillation is set to 4 following CRD [43]. Loss weight α = 0.9 (Eq. (1)). For CIFAR100 and Tiny ImageNet, the training batch size is 64; the original number of total training epochs is 240, with the learning rate (LR) decayed at epochs 150, 180, and 210 by a multiplier of 0.1. The initial LR is 0.05. For prolonged training, we train for 480 epochs instead of 960 to save time. (A loss and schedule sketch using these hyperparameters follows the table.) |
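
Since the paper describes its DA scheme only in prose (e.g., applying CutMix to a batch before distillation) and provides no pseudocode, the following is a minimal, hypothetical PyTorch sketch of that CutMix step. It is not the authors' implementation; the Beta(1, 1) sampling and box placement follow the original CutMix recipe rather than anything quoted above.

```python
# Hypothetical sketch (not the authors' code): CutMix applied to one training
# batch. Beta(1, 1) sampling and box placement follow the original CutMix
# recipe; the paper only describes this step in prose.
import numpy as np
import torch

def cutmix_batch(images, labels, beta=1.0):
    """Mix each image with a random partner; return both label sets and the ratio."""
    lam = float(np.random.beta(beta, beta))          # area kept from the original image
    perm = torch.randperm(images.size(0))            # partner index for every sample
    _, _, h, w = images.shape

    # Rectangular patch whose area fraction is roughly (1 - lam).
    cut_h, cut_w = int(h * np.sqrt(1.0 - lam)), int(w * np.sqrt(1.0 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = images.clone()
    mixed[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)      # adjust for boundary clipping
    return mixed, labels, labels[perm], lam
```

In a KD setting, both teacher and student would be fed the mixed batch, with the hard-label loss weighted between the two label sets by `lam` and `1 - lam`, as in standard CutMix training.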
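
As a quick check of the CIFAR-100 split sizes quoted above (500 training and 100 test images per class), here is an illustrative torchvision sketch; it is not taken from the paper's code.

```python
# Illustrative check of the CIFAR-100 split quoted above; torchvision's standard
# train/test split matches the 500/100 images-per-class counts.
from torchvision import datasets, transforms

tf = transforms.ToTensor()
train_set = datasets.CIFAR100(root="./data", train=True, download=True, transform=tf)
test_set = datasets.CIFAR100(root="./data", train=False, download=True, transform=tf)
assert len(train_set) == 100 * 500   # 50,000 training images
assert len(test_set) == 100 * 100    # 10,000 test images
```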
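
The hyperparameters in the "Experiment Setup" row (τ = 4, α = 0.9, initial LR 0.05, decay by 0.1 at epochs 150/180/210) map onto a standard Hinton-style KD objective with a MultiStepLR schedule. The sketch below is an assumption-laden illustration, not the authors' code: the optimizer choice (SGD), momentum 0.9, weight decay 5e-4, and the exact weighting convention of the paper's Eq. (1) are not quoted in the table.

```python
# Minimal sketch (not the authors' code): Hinton-style KD loss with the reported
# tau = 4 and alpha = 0.9, plus the reported LR schedule. The optimizer choice,
# momentum, and weight decay are assumptions, not values quoted in the table.
import torch
import torch.nn.functional as F

TAU, ALPHA = 4.0, 0.9

def kd_loss(student_logits, teacher_logits, labels, tau=TAU, alpha=ALPHA):
    # Soft-label term: KL divergence between temperature-softened distributions,
    # scaled by tau^2 so its gradient magnitude is comparable to the hard term.
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def make_optimizer(student):
    # Reported schedule: initial LR 0.05, multiplied by 0.1 at epochs 150, 180,
    # and 210 over 240 epochs in total.
    opt = torch.optim.SGD(student.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[150, 180, 210], gamma=0.1)
    return opt, sched
```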