DFD: Distilling the Feature Disparity Differently for Detectors

Authors: Kang Liu, Yingyi Zhang, Jingyun Zhang, Jinmin Li, Jun Wang, Shaoming Wang, Chun Yuan, Rizen Guo

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through extensive experiments, we demonstrate the effectiveness of our proposed DFD in achieving significant improvements. For instance, when applied to detectors based on ResNet50 such as RetinaNet, Faster RCNN, and RepPoints, our method enhances their mAP from 37.4%, 38.4%, 38.6% to 41.7%, 42.4%, 42.7%, respectively. Our approach also demonstrates substantial improvements on YOLO and ViT-based models. |
| Researcher Affiliation | Collaboration | (1) Tsinghua University; (2) Tencent WeChat Pay Lab33; (3) Tencent Youtu Lab. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/luckin99/DFD. |
| Open Datasets | Yes | To evaluate the effectiveness of our method, we conduct comprehensive experiments on the COCO2017 dataset (Lin et al., 2014) using 8 Tesla V100 GPUs. This dataset comprises 80 object categories, and we use the default split of 120k images for training and 5k images for testing. |
| Dataset Splits | No | This dataset comprises 80 object categories, and we use the default split of 120k images for training and 5k images for testing. |
| Hardware Specification | Yes | To evaluate the effectiveness of our method, we conduct comprehensive experiments on the COCO2017 dataset (Lin et al., 2014) using 8 Tesla V100 GPUs. |
| Software Dependencies | No | Our implementation is based on MMDetection (Chen et al., 2019) with PyTorch (Paszke et al., 2019) framework, and we follow the default training settings of MMDetection. For YOLO experiments, we use MMYOLO (Contributors, 2022) framework. |
| Experiment Setup | Yes | We train all detectors for 24 epochs (2x schedule) or 12 epochs (1x schedule) with the stochastic gradient descent (SGD) optimizer. The optimizer is configured with a momentum of 0.9 and a weight decay of 0.0001. ...For all one-stage detectors, we use α = 0.000028 and β = 0.00001. For two-stage detectors, we use α = 0.00000035 and β = 0.0000001. |
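
The settings quoted in the Experiment Setup row can be gathered into one small lookup, which may help when reproducing a run. The following is a minimal sketch in plain Python, not the authors' configuration file: the function name `build_setup`, the dictionary keys, and the labels `"one_stage"` / `"two_stage"` are hypothetical. The learning rate is not stated in the quoted text, so the caller must supply it.

```python
# Values quoted in the Experiment Setup row above.
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0001
SCHEDULE_EPOCHS = {"1x": 12, "2x": 24}

# Loss weights alpha and beta, keyed by detector family (labels are hypothetical).
DISTILL_WEIGHTS = {
    "one_stage": {"alpha": 0.000028, "beta": 0.00001},
    "two_stage": {"alpha": 0.00000035, "beta": 0.0000001},
}


def build_setup(detector_kind: str, schedule: str, lr: float) -> dict:
    """Collect the quoted hyperparameters into one dictionary.

    `lr` is an assumption supplied by the caller; the quoted text
    does not give a learning rate.
    """
    if detector_kind not in DISTILL_WEIGHTS:
        raise ValueError(f"unknown detector kind: {detector_kind!r}")
    if schedule not in SCHEDULE_EPOCHS:
        raise ValueError(f"unknown schedule: {schedule!r}")
    return {
        "optimizer": {
            "type": "SGD",
            "lr": lr,
            "momentum": MOMENTUM,
            "weight_decay": WEIGHT_DECAY,
        },
        "max_epochs": SCHEDULE_EPOCHS[schedule],
        # Copy so callers cannot mutate the shared table.
        "distill": dict(DISTILL_WEIGHTS[detector_kind]),
    }
```

For example, `build_setup("one_stage", "2x", lr=0.01)` returns a dictionary with `max_epochs` equal to 24 and `alpha` equal to 0.000028; the `lr=0.01` here is only a placeholder. Keeping the two detector families in one table makes the roughly 80-fold difference in α between one-stage and two-stage detectors easy to see.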