Knowledge Diffusion for Distillation
Authors: Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that DiffKD is effective across various types of features and achieves state-of-the-art performance consistently on image classification, object detection, and semantic segmentation tasks. |
| Researcher Affiliation | Collaboration | Tao Huang (1,2), Yuan Zhang (3), Mingkai Zheng (1), Shan You (2), Fei Wang (4), Chen Qian (2), Chang Xu (1). Affiliations: (1) School of Computer Science, Faculty of Engineering, The University of Sydney; (2) SenseTime Research; (3) Peking University; (4) University of Science and Technology of China. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/hunto/DiffKD. |
| Open Datasets | Yes | The paper uses well-known public datasets such as ImageNet, CIFAR-100, the COCO dataset, and the Cityscapes dataset, citing their original sources or related works. |
| Dataset Splits | Yes | The paper summarizes training strategies, including epochs, batch size, learning rate, optimizer, and data augmentation, in Table 1. It also reports validation results, such as the 'Val' mIoU column in Table 6 for the Cityscapes dataset. |
| Hardware Specification | Yes | We run all the models on 8 V100 GPUs. |
| Software Dependencies | No | The paper mentions using MMDetection [4] and Torchvision [28] (a PyTorch package) but does not provide specific version numbers for these or other software dependencies, which would be necessary for full reproducibility. A version-logging sketch follows the table. |
| Experiment Setup | Yes | The paper provides specific experimental setup details, including training strategies (Table 1: epochs, batch size, LR, optimizer, data augmentation), loss weights (e.g., λ1 = λ2 = λ3 = 1 and a DiffKD loss weight of 5), autoencoder latent channel sizes (1024 or 768), and semantic segmentation training details (random flipping, scaling, crop size, SGD optimizer with momentum 0.9, polynomial LR scheduler). A minimal configuration sketch follows the table. |
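The missing dependency versions are the main gap flagged in the Software Dependencies row, so a reproduction attempt should record the versions it actually runs with. A minimal sketch, assuming the standard package names `torch`, `torchvision`, and `mmdet` (the paper itself does not name import paths):

```python
# Minimal sketch: log the versions of the libraries the paper names
# (Torchvision via PyTorch, MMDetection), since no versions are reported.
import importlib

for name in ("torch", "torchvision", "mmdet"):  # "mmdet" is MMDetection's package name
    try:
        module = importlib.import_module(name)
        print(f"{name}=={module.__version__}")
    except ImportError:
        print(f"{name}: not installed")
```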
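The segmentation settings quoted in the Experiment Setup row (SGD with momentum 0.9, a polynomial LR scheduler, loss weights λ1 = λ2 = λ3 = 1, DiffKD loss weight 5) map onto a short PyTorch sketch. The model, base learning rate, iteration budget, polynomial power, and the individual loss terms are placeholders, not values taken from the paper:

```python
# Sketch of the reported training configuration; anything marked "assumed"
# is a placeholder rather than a value stated in the paper.
import torch

model = torch.nn.Conv2d(3, 19, kernel_size=1)  # stand-in for the real student network

# SGD with momentum 0.9 as reported; the base LR of 0.01 is assumed.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Polynomial LR decay as reported; total_iters and power 0.9 are assumed.
scheduler = torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=40_000, power=0.9)

# Loss weighting as reported: lambda1 = lambda2 = lambda3 = 1, DiffKD weight 5.
# The loss terms below are dummy tensors standing in for the paper's actual losses.
task_loss, loss1, loss2, loss3, diffkd_loss = (
    torch.tensor(1.0, requires_grad=True) for _ in range(5)
)
total_loss = task_loss + 1.0 * loss1 + 1.0 * loss2 + 1.0 * loss3 + 5.0 * diffkd_loss

total_loss.backward()
optimizer.step()
scheduler.step()
```

In an actual reproduction, the dummy losses would be replaced by the task and distillation terms trained under the recipes summarized in Table 1.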