KGDet: Keypoint-Guided Fashion Detection

Authors: Shenhan Qian, Dongze Lian, Binqiang Zhao, Tong Liu, Bohui Zhu, Hai Li, Shenghua Gao (pp. 2449-2457)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that keypoints are important cues to help improve the performance of clothing detection and further design a simple yet effective KGDet model that incorporates keypoint cues into clothing detection; extensive experiments validate the effectiveness of our method as well as the positive correlation between clothing detection and keypoint estimation. The proposed KGDet achieves superior performance on the DeepFashion2 dataset and FLD dataset with high efficiency."
Researcher Affiliation | Collaboration | 1 ShanghaiTech University; 2 Alibaba Group; 3 Ant Group; 4 Shanghai Engineering Research Center of Intelligent Vision and Imaging
Pseudocode | No | The paper includes architectural diagrams (Figure 2, Figure 3) but no explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a direct link to a code repository for the methodology described.
Open Datasets | Yes | "We evaluate the proposed method on the DeepFashion2 (Ge et al. 2019) and Fashion Landmark Detection (FLD) (Liu et al. 2016b) dataset."
Dataset Splits | Yes | "Since only a subset of the dataset is released (192K images for training, 32K for validation, and 63K for test), our experiments are conducted on this publicly available portion." FLD (Liu et al. 2016b) defines 8 keypoints for 3 main types of clothes; there are 83K images for training, 19K for validation, and 19K for test.
Hardware Specification | Yes | "batch size 8 with 4 NVIDIA P40 GPUs"
Software Dependencies | No | The paper mentions that "the SGD optimizer is employed to train the whole network" but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python).
Experiment Setup | Yes | "We input images with resolution no larger than 1333×800. We train our network with learning rate 5e-3, momentum 0.9, weight decay 1e-4, and batch size 8 on 4 NVIDIA P40 GPUs; the SGD optimizer is employed to train the whole network. We only use random horizontal flip as data augmentation. We empirically set λ1 = 0.1 and λ2 = 1 to balance different loss terms."
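The reported optimizer settings (SGD, learning rate 5e-3, momentum 0.9, weight decay 1e-4) can be sketched as a plain-Python update rule. This is a minimal illustration of the configuration, not the authors' code; the toy quadratic loss and the `sgd_step` helper are assumptions introduced for demonstration.

```python
# Minimal sketch of the optimizer settings reported in the paper:
# SGD with learning rate 5e-3, momentum 0.9, and weight decay 1e-4,
# applied here to a single scalar weight on a toy quadratic loss.
# This illustrates the update rule only; it is not the authors' code.

LR = 5e-3            # learning rate (reported)
MOMENTUM = 0.9       # momentum (reported)
WEIGHT_DECAY = 1e-4  # L2 weight decay (reported)

# Loss-balancing weights reported in the paper; which loss terms they
# scale is defined by the paper's (unreleased) loss formulation.
LAMBDA1, LAMBDA2 = 0.1, 1.0

def sgd_step(w, grad, velocity):
    """One SGD update with momentum and L2 weight decay."""
    grad = grad + WEIGHT_DECAY * w         # add L2 penalty gradient
    velocity = MOMENTUM * velocity + grad  # accumulate momentum
    return w - LR * velocity, velocity

# Toy loss L(w) = w**2, so dL/dw = 2*w; the weight shrinks toward 0.
w, v = 1.0, 0.0
for _ in range(3):
    w, v = sgd_step(w, grad=2.0 * w, velocity=v)
```

With these values the update is identical to a standard deep-learning SGD-with-momentum step; only the batch size (8) and the 4-GPU data parallelism from the paper are omitted from the sketch.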