Structural Knowledge Distillation for Object Detection

Authors: Philip de Rijk, Lukas Schneider, Marius Cordts, Dariu Gavrila

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on MSCOCO [16] demonstrate the effectiveness of our method across different training schemes and architectures."
Researcher Affiliation | Collaboration | Philip de Rijk (1,2), Lukas Schneider (2), Marius Cordts (2), Dariu M. Gavrila (1); 1: TU Delft, 2: Mercedes-Benz AG
Pseudocode | Yes | "An example implementation of KD loss between teacher and student features is shown below. Omitting the import and using a library such as Kornia [22], a change from ℓ2 to ℓSSIM only requires a change in one line of code." The paper then lists the ℓ2 and ℓSSIM implementations (omitted here; a hedged sketch follows the table).
Open Source Code | No | The paper states "All our experiments are based on publicly available frameworks [4, 29] and datasets [16]." and provides code snippets, but it neither states that the source code for the full method is released nor links to a repository for the authors' implementation.
Open Datasets | Yes | "We assess the performance of ℓSSIM on the MSCOCO [16] validation dataset." Dataset reference [16]: Microsoft COCO: Common objects in context. ECCV, 2014.
Dataset Splits | Yes | "We assess the performance of ℓSSIM on the MSCOCO [16] validation dataset."
Hardware Specification | Yes | "We conduct our experiments in Pytorch [20] using the MMDetection2 [4] framework on a Nvidia RTX8000 GPU with 48GB of memory."
Software Dependencies | No | The paper mentions software frameworks such as PyTorch [20], MMDetection2 [4], Kornia [22], and Detectron2 [29] but does not provide version numbers for any of these components.
Experiment Setup | Yes | "Each model is trained using SGD optimization with momentum 0.9, weight decay 1e-4 and batch size 8. The learning rate is set at 0.01 (RN) / 0.02 (FRCNN) and decreased tenfold at steps 8 and 11, for a total of 12 epochs. We additionally implement batch normalization layers after each convolutional layer, and use focal loss [18] with γ_fl = 2.0 and α_fl = 0.25. The input images are resized to minimum spatial dimensions of 800 while retaining the original ratios, and we add padding to both fulfill the stride requirements and retain equal dimensionality across each batch. Finally the images are randomly flipped with p = 0.5 and normalized." (A hedged PyTorch sketch of this schedule follows the table.)
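The "one line of code" claim quoted in the Pseudocode row can be illustrated with a minimal sketch. This is not the paper's listing: the function names, the (N, C, H, W) shape assumption, and the window size are our own, and we assume `kornia.losses.ssim_loss` is the intended Kornia [22] drop-in.

```python
# Minimal sketch of the l2 -> SSIM swap described in the paper's
# pseudocode row. Names and window size are assumptions, not the
# authors' listing. Both inputs are (N, C, H, W) feature tensors
# of identical shape.
import torch
import torch.nn.functional as F
from kornia.losses import ssim_loss


def kd_loss_l2(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """l2 (MSE) distillation loss between student and teacher feature maps."""
    return F.mse_loss(student_feat, teacher_feat)


def kd_loss_ssim(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """Same interface; only the loss call changes (the 'one line' swap)."""
    # ssim_loss returns a dissimilarity derived from SSIM; note that raw
    # feature maps may need scaling to a bounded range for the default
    # max_val of 1.0 to be meaningful.
    return ssim_loss(student_feat, teacher_feat, window_size=11)
```

Both functions consume already-aligned feature maps, so swapping one for the other leaves the rest of the distillation pipeline untouched.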
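The quoted experiment setup maps onto standard PyTorch components as sketched below. This is a hedged reconstruction from the quoted sentence only, not the authors' MMDetection2 [4] configuration: the model is a dummy stand-in, the data pipeline is summarized in comments, and only the stated hyperparameters come from the paper.

```python
# Hedged sketch of the quoted optimizer and schedule. Only the stated
# hyperparameters (SGD, momentum 0.9, weight decay 1e-4, lr 0.01/0.02,
# tenfold decay at epochs 8 and 11, 12 epochs, batch size 8) come from
# the paper; the model and loop body are placeholders.
import torch

model = torch.nn.Conv2d(3, 8, 3)  # stand-in for the student detector

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # 0.01 for the RN model, 0.02 for FRCNN (per the quote)
    momentum=0.9,
    weight_decay=1e-4,
)
# "Decreased tenfold at steps 8 and 11" = gamma 0.1 at epoch milestones 8, 11.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[8, 11], gamma=0.1
)

for epoch in range(12):
    # One pass over MSCOCO with batch size 8 would go here: resize to a
    # minimum side of 800 keeping aspect ratio, pad to stride-compatible
    # equal shapes, random horizontal flip with p = 0.5, normalize, then
    # forward/backward/step. Classification would use focal loss with
    # gamma 2.0 and alpha 0.25 (e.g. torchvision.ops.sigmoid_focal_loss).
    scheduler.step()
```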