Knowledge Distillation from A Stronger Teacher

Authors: Tao Huang, Shan You, Fei Wang, Chen Qian, Chang Xu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states: "Extensive experiments are conducted on benchmark datasets to verify our effectiveness on various tasks, including image classification, object detection, and semantic segmentation," and "extensive experiments demonstrate that it adapts well to various architectures, model sizes and training strategies, and can achieve state-of-the-art performance consistently on image classification, object detection, and semantic segmentation tasks."
Researcher Affiliation | Collaboration | 1 SenseTime Research; 2 School of Computer Science, Faculty of Engineering, The University of Sydney; 3 University of Science and Technology of China
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the main paper. The paper mentions that the method "can be implemented with only several lines of code (see Appendix A.1)", but the appendix content is not provided and no such block exists in the main text.
Open Source Code | Yes | Code is available at: https://github.com/hunto/DIST_KD
Open Datasets | Yes | "Extensive experiments are conducted on benchmark datasets to verify our effectiveness on various tasks, including image classification, object detection, and semantic segmentation." The standard training strategy (B1) is used on ImageNet, with further experiments on the MS COCO object detection dataset [25] and the Cityscapes dataset. CIFAR-100 is also used.
Dataset Splits | Yes | "We train ResNet-18 and ResNet-50 standalone with strategy B1 and strategy B2... then compare their discrepancy using KL divergence... on the predicted probabilities Y" (Figure 2: discrepancy between the predictions of models trained standalone with different strategies on the ImageNet validation set). Table 1 (training strategies on image classification tasks) likewise implies standard, well-defined splits for these benchmark datasets.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or machine specifications) used for running the experiments were found in the paper.
Software Dependencies | No | The paper mentions software such as PyTorch and Torchvision, but does not provide version numbers for these or any other key software components used in the experiments.
Experiment Setup | Yes | Table 1 (training strategies on image classification tasks) specifies, for each strategy, the number of epochs, total batch size (BS), initial learning rate (LR), optimizer, weight decay (WD), label smoothing (LS), model exponential moving average (EMA), LR scheduler, and data augmentation: RandAugment (RA) [9], random erasing (RE), and color jitter (CJ).
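The Pseudocode row above notes that the method "can be implemented with only several lines of code", but the appendix is not reproduced here. The following is a hedged sketch of a Pearson-correlation-based distillation loss in the spirit of DIST (inter-class relations across classes, intra-class relations across the batch), written in plain NumPy; the function names and the `beta`/`gamma` weights are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pearson_residual(a, b, axis):
    # Mean of (1 - Pearson correlation) computed along `axis`.
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    num = (a * b).sum(axis=axis)
    den = np.sqrt((a * a).sum(axis=axis) * (b * b).sum(axis=axis)) + 1e-8
    return (1.0 - num / den).mean()

def dist_style_loss(student_logits, teacher_logits, beta=1.0, gamma=1.0):
    # Illustrative sketch: replace exact KL matching with correlation matching.
    ps, pt = softmax(student_logits), softmax(teacher_logits)
    inter = pearson_residual(ps, pt, axis=1)  # relations across classes, per sample
    intra = pearson_residual(ps, pt, axis=0)  # relations across the batch, per class
    return beta * inter + gamma * intra
```

When student and teacher predictions coincide the loss is near zero, and because Pearson correlation is invariant to affine rescaling of the probabilities, this relaxes exact probability matching, which is the motivation the paper gives for preferring relational matching over KL divergence with a much stronger teacher.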
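The Dataset Splits row describes measuring the discrepancy between two standalone-trained models via KL divergence on their predicted probabilities. A minimal sketch of that measurement, with illustrative variable names, could look like this:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over a batch of categorical predictions.

    p, q: arrays of shape (num_samples, num_classes), rows summing to 1.
    """
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))
```

For example, `kl_divergence(preds_b1, preds_b2)` over the validation set would quantify how much a stronger training strategy (B2) shifts a model's predictions away from the baseline (B1), which is the comparison Figure 2 of the paper visualizes.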
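The Experiment Setup row lists the fields Table 1 records for each training strategy; such a strategy could be captured as a small configuration object. The field values below are illustrative placeholders only, not the paper's actual B1/B2 settings (those are in Table 1 of the paper).

```python
from dataclasses import dataclass

@dataclass
class TrainStrategy:
    # Fields mirror the columns of the paper's Table 1.
    epochs: int
    total_batch_size: int
    initial_lr: float
    optimizer: str
    weight_decay: float
    label_smoothing: float
    ema: bool
    lr_scheduler: str
    augmentations: tuple

# Placeholder values for illustration only.
b1 = TrainStrategy(epochs=100, total_batch_size=256, initial_lr=0.1,
                   optimizer="SGD", weight_decay=1e-4, label_smoothing=0.0,
                   ema=False, lr_scheduler="cosine",
                   augmentations=("random_crop", "horizontal_flip"))
```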