Knowledge Distillation from A Stronger Teacher
Authors: Tao Huang, Shan You, Fei Wang, Chen Qian, Chang Xu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on benchmark datasets to verify our effectiveness on various tasks, including image classification, object detection, and semantic segmentation. ... Extensive experiments demonstrate that it adapts well to various architectures, model sizes, and training strategies, and can achieve state-of-the-art performance consistently on image classification, object detection, and semantic segmentation tasks. |
| Researcher Affiliation | Collaboration | 1 SenseTime Research; 2 School of Computer Science, Faculty of Engineering, The University of Sydney; 3 University of Science and Technology of China |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the main paper. While the paper mentions that the method "can be implemented with only several lines of code (see Appendix A.1)", the appendix content is not provided, and no such block exists in the main text. |
| Open Source Code | Yes | Code is available at: https://github.com/hunto/DIST_KD. |
| Open Datasets | Yes | Extensive experiments are conducted on benchmark datasets to verify our effectiveness on various tasks, including image classification, object detection, and semantic segmentation. We use the standard training strategy (B1) on ImageNet and conduct experiments on the MS COCO object detection dataset [25] and the Cityscapes dataset. CIFAR-100 is also used. |
| Dataset Splits | Yes | We train ResNet-18 and ResNet-50 standalone with strategy B1 and strategy B2... then compare their discrepancy using KL divergence... on the predicted probabilities Y. (Figure 2: Discrepancy between the predictions of models trained standalone with different strategies on the ImageNet validation set.) See also Table 1: Training strategies on image classification tasks. This implies standard, well-defined splits for these benchmark datasets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running experiments were found in the paper's text. |
| Software Dependencies | No | The paper mentions software like Torchvision and PyTorch, but does not provide specific version numbers for these or any other key software components used in the experiments. |
| Experiment Setup | Yes | Table 1: Training strategies on image classification tasks. BS: batch size; LR: learning rate; WD: weight decay; LS: label smoothing; EMA: model exponential moving average; RA: RandAugment [9]; RE: random erasing; CJ: color jitter. This table specifies Epochs, Total BS, Initial LR, Optimizer, WD, LS, EMA, LR scheduler, and Data augmentation. |
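
The Dataset Splits row quotes the paper as comparing the predictions of two standalone models using KL divergence on the predicted probabilities. A minimal sketch of such a measurement follows, assuming each model's output is available as a list of per-sample probability vectors. The function names (`softmax`, `kl_divergence`, `mean_kl`), the `eps` smoothing constant, and the direction KL(a || b) are illustrative assumptions; they are not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Convert one sample's logits into a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same classes.

    eps is an assumed smoothing constant that guards against log(0).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of classes")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i = 0 contribute nothing
    )


def mean_kl(probs_a, probs_b):
    """Average per-sample KL(a || b) over a set of predictions."""
    if not probs_a or len(probs_a) != len(probs_b):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = zip(probs_a, probs_b)
    return sum(kl_divergence(p, q) for p, q in pairs) / len(probs_a)


if __name__ == "__main__":
    # Toy logits standing in for two models' outputs on two samples.
    model_a = [softmax([2.0, 1.0, 0.1]), softmax([0.5, 0.5, 0.5])]
    model_b = [softmax([1.5, 1.2, 0.3]), softmax([0.1, 0.9, 0.2])]
    print(f"mean KL(a || b) = {mean_kl(model_a, model_b):.4f}")
```

KL divergence is asymmetric, so swapping the two arguments generally changes the value; which model plays the reference role is a choice this sketch makes arbitrarily.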