Knowledge Diffusion for Distillation

Authors: Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that DiffKD is effective across various types of features and consistently achieves state-of-the-art performance on image classification, object detection, and semantic segmentation tasks.
Researcher Affiliation | Collaboration | Tao Huang (1,2), Yuan Zhang (3), Mingkai Zheng (1), Shan You (2), Fei Wang (4), Chen Qian (2), Chang Xu (1). Affiliations: (1) School of Computer Science, Faculty of Engineering, The University of Sydney; (2) SenseTime Research; (3) Peking University; (4) University of Science and Technology of China.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/hunto/DiffKD.
Open Datasets | Yes | The paper uses well-known public datasets, including ImageNet, CIFAR-100, COCO, and Cityscapes, citing their original sources or related works (see the dataset-loading sketch after the table).
Dataset Splits | Yes | The paper summarizes training strategies (epochs, batch size, learning rate, optimizer, and data augmentation) in Table 1 and reports validation results in tables, such as the 'Val' mIoU column in Table 6 for the Cityscapes dataset.
Hardware Specification | Yes | "We run all the models on 8 V100 GPUs."
Software Dependencies | No | The paper mentions using MMDetection [4] and Torchvision [28] (a PyTorch package) but does not provide version numbers for these or other software dependencies, which are necessary for full reproducibility (see the version-logging sketch after the table).
Experiment Setup | Yes | The paper provides specific experimental setup details, including training strategies (Table 1: epochs, batch size, learning rate, optimizer, data augmentation), loss weights (e.g., λ1 = λ2 = λ3 = 1 and a DiffKD loss weight of 5), autoencoder latent channel sizes (1024 or 768), and semantic segmentation training details (random flipping, scaling, crop size, SGD optimizer with momentum 0.9, polynomial LR scheduler); an illustrative optimizer/scheduler sketch follows the table.
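
The datasets the paper relies on are all obtainable through standard tooling. The following is a minimal sketch, not the authors' code, showing how one of them (CIFAR-100) can be fetched via torchvision; the root paths and the transform are placeholder assumptions.

```python
# Minimal sketch: obtaining one of the public datasets named in the paper
# via torchvision. Paths and transforms are placeholders, not the authors'
# training pipeline. ImageNet, COCO, and Cityscapes require manual download
# into the standard directory layouts expected by their torchvision classes.
import torchvision.datasets as datasets
import torchvision.transforms as T

transform = T.Compose([T.ToTensor()])  # placeholder transform

cifar_train = datasets.CIFAR100(root="data/cifar100", train=True,
                                download=True, transform=transform)
cifar_val = datasets.CIFAR100(root="data/cifar100", train=False,
                              download=True, transform=transform)
```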
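Since the paper does not pin dependency versions, anyone rerunning the experiments should record their own environment. A small sketch of how that could be done for the libraries the paper cites (the `mmdet` import assumes MMDetection is installed locally):

```python
# Hedged sketch: log the versions of the libraries the paper uses but does
# not pin, so a reproduction run documents its own environment.
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
try:
    import mmdet
    print("mmdetection:", mmdet.__version__)
except ImportError:
    print("mmdetection: not installed")
```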
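To make the stated segmentation setup concrete, here is an illustrative sketch of an SGD optimizer with momentum 0.9 under a polynomial LR scheduler, combined with the reported loss weights. The model stand-in, learning rate, total iteration count, and decay power are assumptions for illustration, not values confirmed by the paper.

```python
# Illustrative sketch of the segmentation training setup the paper describes:
# SGD with momentum 0.9 and a polynomial LR schedule, plus the stated loss
# weights (λ1 = λ2 = λ3 = 1, DiffKD loss weight 5). Everything else here
# (model, lr, total_iters, power) is a placeholder assumption.
import torch
from torch.optim.lr_scheduler import PolynomialLR

model = torch.nn.Conv2d(3, 19, 1)  # stand-in for the student network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = PolynomialLR(optimizer, total_iters=40000, power=0.9)  # assumed

lambda1 = lambda2 = lambda3 = 1.0  # stated in the paper
diffkd_weight = 5.0                # stated DiffKD loss weight

def total_loss(task_loss, kd_losses, diffkd_loss):
    # Weighted sum reflecting the reported weighting scheme; the exact
    # composition of the loss terms follows the paper, not this sketch.
    return (task_loss
            + lambda1 * kd_losses[0]
            + lambda2 * kd_losses[1]
            + lambda3 * kd_losses[2]
            + diffkd_weight * diffkd_loss)
```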