Structural Knowledge Distillation for Object Detection
Authors: Philip de Rijk, Lukas Schneider, Marius Cordts, Dariu Gavrila
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on MSCOCO [16] demonstrate the effectiveness of our method across different training schemes and architectures. |
| Researcher Affiliation | Collaboration | Philip de Rijk (1,2), Lukas Schneider (2), Marius Cordts (2), Dariu M. Gavrila (1); 1: TU Delft, 2: Mercedes-Benz AG |
| Pseudocode | Yes | An example implementation of the KD loss between teacher and student features is shown in the paper. Omitting the import and using a library such as Kornia [22], a change from ℓ2 to ℓSSIM requires changing only one line of code. The paper lists the ℓ2 implementation and the ℓSSIM implementation side by side (a hedged sketch of this one-line swap follows the table). |
| Open Source Code | No | The paper states 'All our experiments are based on publicly available frameworks [4, 29] and datasets [16].' and provides code snippets, but it does not explicitly state that the source code for their *entire methodology* is released, nor does it provide a direct link to a repository for their specific implementation. |
| Open Datasets | Yes | We assess the performance of ℓSSIM on the MSCOCO [16] validation dataset. ... Microsoft COCO: Common objects in context. ECCV, 2014. |
| Dataset Splits | Yes | We assess the performance of ℓSSIM on the MSCOCO [16] validation dataset. |
| Hardware Specification | Yes | We conduct our experiments in Pytorch [20] using the MMDetection2 [4] framework on a Nvidia RTX8000 GPU with 48GB of memory. |
| Software Dependencies | No | The paper mentions software frameworks like Pytorch [20], MMDetection2 [4], Kornia [22], and Detectron2 [29] but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Each model is trained using SGD optimization with momentum 0.9, weight decay 1e-4 and batch size 8. The learning rate is set at 0.01 (RN) / 0.02 (FRCNN) and decreased tenfold at step 8 and 11, for a total of 12 epochs. We additionally implement batch normalization layers after each convolutional layer, and use focal loss [18] with γfl = 2.0 and αfl = 0.25. The input images are resized to minimum spatial dimensions of 800 while retaining the original ratios, and we add padding to both fulfill the stride requirements and retain equal dimensionality across each batch. Finally the images are randomly flipped with p = 0.5 and normalized. (A hedged PyTorch sketch of this training configuration follows the table.) |
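
The one-line ℓ2 to ℓSSIM swap referenced in the Pseudocode row can be illustrated with the minimal sketch below. This is not the authors' released code; the feature-map shapes, the `window_size`, and the assumption that student and teacher features are already aligned in channels and resolution are illustrative choices, and Kornia's SSIM expects non-negative inputs, so real feature maps may need rescaling first.

```python
import torch
import torch.nn.functional as F
from kornia.losses import ssim_loss  # SSIM loss from Kornia [22]


def kd_loss_l2(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """Feature-imitation loss with plain l2 (MSE) between aligned feature maps."""
    return F.mse_loss(student_feat, teacher_feat)


def kd_loss_ssim(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """Same loss with SSIM; only the return line differs from the l2 version."""
    return ssim_loss(student_feat, teacher_feat, window_size=11)


# Toy usage with illustrative B x C x H x W shapes; real FPN features differ.
s = torch.rand(2, 256, 32, 32)
t = torch.rand(2, 256, 32, 32)
print(kd_loss_l2(s, t).item(), kd_loss_ssim(s, t).item())
```

The point of the sketch is structural: once the SSIM implementation comes from a library, swapping the distillation objective touches a single line, which is the claim the Pseudocode row quotes.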
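
The Experiment Setup row can likewise be mirrored in plain PyTorch. The sketch below is not the authors' MMDetection2 [4] or Detectron2 [29] configuration; the placeholder model, the ImageNet normalization statistics, and the use of torchvision's `sigmoid_focal_loss` are assumptions, while the optimizer, learning-rate schedule, and focal-loss hyper-parameters follow the quoted setup.

```python
import torch
import torchvision.transforms as T
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torchvision.ops import sigmoid_focal_loss

# Placeholder standing in for the student detector (RetinaNet "RN" or Faster R-CNN "FRCNN").
model = torch.nn.Conv2d(3, 8, kernel_size=3)

# SGD with momentum 0.9 and weight decay 1e-4; lr 0.01 for RN, 0.02 for FRCNN.
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

# Learning rate decreased tenfold after epochs 8 and 11, for 12 epochs total.
scheduler = MultiStepLR(optimizer, milestones=[8, 11], gamma=0.1)


# Focal loss with gamma = 2.0 and alpha = 0.25 for the classification head.
def classification_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    return sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0, reduction="mean")


# Input pipeline: shortest side resized to 800 (aspect ratio kept), random horizontal flip
# with p = 0.5, then normalization; the ImageNet statistics are an assumption. Padding to
# the stride requirement and equal per-batch dimensionality would be handled by the
# detection framework's collate step and is omitted here.
transform = T.Compose([
    T.Resize(800),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

for epoch in range(12):
    # ... one pass over the MSCOCO training set with batch size 8 would go here ...
    scheduler.step()
```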