DFD: Distilling the Feature Disparity Differently for Detectors
Authors: Kang Liu, Yingyi Zhang, Jingyun Zhang, Jinmin Li, Jun Wang, Shaoming Wang, Chun Yuan, Rizen Guo
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate the effectiveness of our proposed DFD in achieving significant improvements. For instance, when applied to detectors based on ResNet50 such as RetinaNet, Faster RCNN, and RepPoints, our method enhances their mAP from 37.4%, 38.4%, 38.6% to 41.7%, 42.4%, 42.7%, respectively. Our approach also demonstrates substantial improvements on YOLO and ViT-based models. |
| Researcher Affiliation | Collaboration | Tsinghua University, Tencent WeChat Pay Lab33, Tencent Youtu Lab. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/luckin99/DFD. |
| Open Datasets | Yes | To evaluate the effectiveness of our method, we conduct comprehensive experiments on the COCO2017 dataset (Lin et al., 2014) using 8 Tesla V100 GPUs. This dataset comprises 80 object categories, and we use the default split of 120k images for training and 5k images for testing. |
| Dataset Splits | No | This dataset comprises 80 object categories, and we use the default split of 120k images for training and 5k images for testing. |
| Hardware Specification | Yes | To evaluate the effectiveness of our method, we conduct comprehensive experiments on the COCO2017 dataset (Lin et al., 2014) using 8 Tesla V100 GPUs. |
| Software Dependencies | No | Our implementation is based on MMDetection (Chen et al., 2019) with the PyTorch (Paszke et al., 2019) framework, and we follow the default training settings of MMDetection. For YOLO experiments, we use the MMYOLO (Contributors, 2022) framework. |
| Experiment Setup | Yes | We train all detectors for 24 epochs (2x schedule) or 12 epochs (1x schedule) with the stochastic gradient descent (SGD) optimizer. The optimizer is configured with a momentum of 0.9 and a weight decay of 0.0001. ...For all one-stage detectors, we use α = 0.000028 and β = 0.00001. For two-stage detectors, we use α = 0.00000035 and β = 0.0000001. |
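
For reference, the quoted optimizer, schedule, and loss-weight settings map onto a standard MMDetection 2.x config roughly as sketched below. This is a hedged illustration, not the authors' released configuration: the learning rate and the `distill_loss_weights` dict (with its `alpha`/`beta` keys) are assumptions introduced here for illustration, and the actual DFD-specific keys live in the linked repository.

```python
# Minimal sketch of the reported training setup in MMDetection 2.x config style.
# NOT the authors' released config: distill_loss_weights and its alpha/beta keys
# are hypothetical names showing where the paper's loss weights would plug in;
# the real keys are defined in https://github.com/luckin99/DFD.

# SGD with momentum 0.9 and weight decay 1e-4, as quoted above. The learning rate
# is assumed to be MMDetection's 8-GPU default, since the paper states it follows
# the framework's default training settings.
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)

# 2x schedule: 24 epochs with step decay at epochs 16 and 22.
# For the 1x schedule, use max_epochs=12 and step=[8, 11].
lr_config = dict(
    policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001,
    step=[16, 22])
runner = dict(type='EpochBasedRunner', max_epochs=24)

# Hypothetical placement of the DFD loss weights quoted in the table:
# one-stage detectors (e.g. RetinaNet):   alpha=2.8e-5, beta=1e-5
# two-stage detectors (e.g. Faster RCNN): alpha=3.5e-7, beta=1e-7
distill_loss_weights = dict(alpha=2.8e-5, beta=1e-5)
```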