DetKDS: Knowledge Distillation Search for Object Detectors

Authors: Lujun Li, Yufan Bao, Peijie Dong, Chuanguang Yang, Anggeng Li, Wenhan Luo, Qifeng Liu, Wei Xue, Yike Guo

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on different detectors demonstrate that DetKDS outperforms state-of-the-art methods in detection and instance segmentation tasks.
Researcher Affiliation | Academia | 1) The Hong Kong University of Science and Technology; 2) The Hong Kong University of Science and Technology (Guangzhou); 3) Institute of Computing Technology, Chinese Academy of Sciences.
Pseudocode | Yes | Algorithm 1: Divide-and-conquer Evolution in DetKDS.
Open Source Code | Yes | Code at: https://github.com/lliai/DetKDS.
Open Datasets | Yes | We evaluate our method on the COCO dataset (Lin et al., 2014), which contains 80 object classes. After searching, we train student detectors with the best distiller on the full COCO dataset, including 120K train images.
Dataset Splits | Yes | For search settings, we run all searches on a subset of the COCO training set (i.e., mini-COCO), which consists of 25K training images and 5K validation images.
Hardware Specification | Yes | Following the same training settings as FGD, we run our experiments on 8 NVIDIA V100 GPUs with a mini-batch of two images per GPU.
Software Dependencies | No | The paper mentions using an SGD optimizer but does not name any software libraries, frameworks, or version numbers that would be needed for replication.
Experiment Setup | Yes | We configure 20 iterations of parallel searching for individual losses and 40 iterations for the combined weights of multiple losses, training for one epoch per search iteration. For EA settings, we set (P, T, r, k) in Alg. 1 to (20, 40, 0.9, 5). After searching, we train student detectors with the best distiller on the full COCO dataset (120K train images). Following the same training settings as FGD, we use 8 NVIDIA V100 GPUs with a mini-batch of two images per GPU, and train all detectors for 24 epochs with an SGD optimizer (momentum 0.9, weight decay 0.0001).
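To make the reported EA settings concrete, the loop below is a toy sketch of an evolutionary search with (P, T, r, k) = (20, 40, 0.9, 5): population size P, T iterations, mutation probability r, and tournament sample size k. The candidate encoding, fitness proxy, and tournament-style selection here are hypothetical stand-ins; the paper's actual Algorithm 1 searches over distiller designs and scores each candidate by training a student for one epoch.

```python
import random

# Reported settings from the paper: population P, iterations T,
# mutation probability r, tournament sample size k.
P, T, R, K = 20, 40, 0.9, 5

# Placeholder search space: indices of hypothetical candidate loss terms.
SEARCH_SPACE = list(range(10))


def fitness(candidate):
    """Toy proxy score; the paper instead trains a student detector
    for one epoch and evaluates it on mini-COCO."""
    return -abs(sum(candidate) - 15)


def mutate(candidate):
    """With probability R, resample one randomly chosen position."""
    child = list(candidate)
    if random.random() < R:
        child[random.randrange(len(child))] = random.choice(SEARCH_SPACE)
    return child


def evolve(seed=0):
    random.seed(seed)
    # Random initial population of length-4 candidates.
    population = [[random.choice(SEARCH_SPACE) for _ in range(4)]
                  for _ in range(P)]
    for _ in range(T):
        # Tournament selection: sample K candidates, keep the fittest.
        parent = max(random.sample(population, K), key=fitness)
        child = mutate(parent)
        # Replace the worst member only if the child improves on it,
        # so the population size stays fixed at P.
        worst = min(range(P), key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

With the real search, each of the T iterations costs one training epoch, which is why the paper restricts the search to the 25K-image mini-COCO subset before retraining the best distiller on the full dataset.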