DamoFD: Digging into Backbone Design on Face Detection

Authors: Yang Liu, Jiankang Deng, Fei Wang, Lei Shang, Xuansong Xie, Baigui Sun

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments on the challenging Wider Face benchmark dataset and achieve dominant performance across a wide range of compute regimes. In particular, compared to the tiniest face detector SCRFD-0.5GF, our method is +2.5% better in Average Precision (AP) score when using the same amount of FLOPs.
Researcher Affiliation | Collaboration | Yang Liu 1, Jiankang Deng 2, Fei Wang 1, Lei Shang 1, Xuansong Xie 1, Baigui Sun 1* (1 Alibaba Group, 2 Imperial College London)
Pseudocode | Yes | Algorithm 1 Evolutionary Architecture Search (a generic search-loop sketch follows this table)
Open Source Code | Yes | The code is available at https://github.com/ly19965/EasyFace/tree/master/face_project/face_detection/DamoFD.
Open Datasets | Yes | In this paper, all experiments are conducted on the authoritative and challenging Wider Face Yang et al. (2016) dataset.
Dataset Splits | Yes | In each event, images are randomly separated into training (50%), validation (10%), and test (40%) sets.
Hardware Specification | Yes | We adopt the SGD optimizer (momentum 0.9, weight decay 5e-4) with a batch size of 8 × 4 and train on four Tesla V100s.
Software Dependencies | No | The paper mentions components and techniques such as the “SGD optimizer”, “Generalised Focal Loss and DIoU Loss”, and “Group Normalization”, but does not specify software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | The population size and iteration count in Algorithm 1 are set to 256 and 96000, respectively. The convolution kernel size is searched from the set {3, 5, 7}. For the anchor setting, we tile anchors of {16, 32}, {64, 128}, {256, 512} on the feature maps with strides 8, 16, and 32, respectively. The optimization objectives of the classification and localization branches are Generalised Focal Loss and DIoU Loss, respectively. For the optimization details, we adopt the SGD optimizer (momentum 0.9, weight decay 5e-4) with a batch size of 8 × 4 and train on four Tesla V100s. The initial learning rate is set to 1e-5, linearly warming up to 1e-2 within the first 3 epochs. (A training-configuration sketch follows this table.)
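The Pseudocode row refers to the paper's Algorithm 1 (Evolutionary Architecture Search). Below is a minimal, generic sketch of such an evolutionary search loop, not the authors' actual algorithm: `random_arch`, `mutate`, and `score` are hypothetical placeholders for the paper's search-space sampling, mutation operator, and architecture-scoring criterion, while the kernel-size choices {3, 5, 7} and the default population size (256) and iteration budget (96000) mirror the numbers quoted in the Experiment Setup row.

```python
import random

# Hypothetical stand-ins for the paper's search-space sampling,
# mutation operator, and architecture-scoring criterion.
def random_arch():
    """Sample a random backbone configuration from the search space."""
    return {"kernel_sizes": [random.choice([3, 5, 7]) for _ in range(8)]}

def mutate(arch):
    """Perturb one block's kernel size."""
    child = {"kernel_sizes": list(arch["kernel_sizes"])}
    i = random.randrange(len(child["kernel_sizes"]))
    child["kernel_sizes"][i] = random.choice([3, 5, 7])
    return child

def score(arch):
    """Placeholder fitness; the paper ranks architectures with its own criterion."""
    return random.random()

def evolutionary_search(population_size=256, iterations=96000):
    """Generic mutate-and-replace evolutionary loop."""
    population = [random_arch() for _ in range(population_size)]
    fitness = [score(a) for a in population]
    for _ in range(iterations):
        parent = random.choice(population)       # pick a parent
        child = mutate(parent)                   # mutate it
        child_fitness = score(child)             # evaluate the child
        worst = min(range(population_size), key=fitness.__getitem__)
        if child_fitness > fitness[worst]:       # replace the worst member
            population[worst], fitness[worst] = child, child_fitness
    best = max(range(population_size), key=fitness.__getitem__)
    return population[best]

if __name__ == "__main__":
    # Tiny demo run; the quoted settings are population_size=256, iterations=96000.
    print(evolutionary_search(population_size=16, iterations=200))
```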
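The Experiment Setup row quotes concrete optimizer, warm-up, and anchor hyper-parameters. The sketch below shows one way those numbers could be wired up in PyTorch; it is an illustration under stated assumptions, not the authors' released training code, and the single convolution layer is a hypothetical stand-in for the DamoFD detector.

```python
import torch

# Hypothetical stand-in for the detector; the real model is the DamoFD network.
model = torch.nn.Conv2d(3, 16, kernel_size=3)

# SGD with the quoted momentum and weight decay; the initial lr is 1e-5.
# The quoted batch size is 8 x 4 (8 images per GPU on four Tesla V100s).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5,
                            momentum=0.9, weight_decay=5e-4)

# Linear warm-up from 1e-5 to 1e-2 within the first 3 epochs, expressed as a
# per-epoch multiplier on the base lr (call scheduler.step() once per epoch).
base_lr, target_lr, warmup_epochs = 1e-5, 1e-2, 3

def lr_lambda(epoch):
    if epoch < warmup_epochs:
        frac = epoch / warmup_epochs          # 0 at epoch 0, 1 at epoch 3
        return (base_lr + frac * (target_lr - base_lr)) / base_lr
    return target_lr / base_lr

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Anchor sizes tiled on feature maps with strides 8, 16 and 32, as quoted above.
anchor_sizes_per_stride = {8: (16, 32), 16: (64, 128), 32: (256, 512)}
```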