Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

Authors: Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS
Researcher Affiliation | Collaboration | Jie Yang¹,², Ailing Zeng¹, Shilong Liu¹, Feng Li¹, Ruimao Zhang², Lei Zhang¹. ¹International Digital Economy Academy (IDEA); ²Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen. {zengailing,liushilong,lifeng,leizhang}@idea.edu.cn; {jieyang5@link, zhangruimao@}cuhk.edu.cn
Pseudocode | No | The paper describes methods in text and figures, but does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/IDEA-Research/ED-Pose.
Open Datasets | Yes | Our experiments are mainly conducted on the popular COCO2017 Keypoint Detection benchmark (Lin et al., 2014), which contains about 250K person instances with 17 keypoints. [...] we also evaluate our approach on the CrowdPose dataset (Li et al., 2019), which is more challenging and includes many crowded and occlusion scenes.
Dataset Splits | Yes | Our experiments are mainly conducted on the popular COCO2017 Keypoint Detection benchmark (Lin et al., 2014), which contains about 250K person instances with 17 keypoints. We compare with other state-of-the-art methods on both the val2017 set and test-dev set. [...] For ablation studies, we report results on the COCO val2017 set.
Hardware Specification | Yes | We use the AdamW (Kingma & Ba, 2014; Loshchilov & Hutter, 2017) optimizer with weight decay of 1×10⁻⁴ and train our model on Nvidia A100 GPUs with batch size 16 for 60 epochs and 80 epochs on COCO and CrowdPose, respectively.
Software Dependencies | No | The paper mentions optimizers (AdamW) and frameworks (MMDetection) but does not provide specific version numbers for key software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | In the training stage, we augment input images by random crop, random flip, and random resize with the shorter sides in [480, 800] and the longer sides less than or equal to 1333, following DETR (Carion et al., 2020) and PETR (Shi et al., 2022). [...] We use the AdamW [...] optimizer with weight decay of 1×10⁻⁴ and train our model on Nvidia A100 GPUs with batch size 16 for 60 epochs and 80 epochs on COCO and CrowdPose, respectively. The initial learning rate is 1×10⁻⁴ and is decayed at the 55th and 75th epochs by a factor of 0.1 on COCO and CrowdPose, respectively. The channel dimension D is set to 256. The numbers of layers in the Human Detection Decoder and the Human-to-Keypoint Detection Decoder are 2 and 4, respectively. [...] The loss coefficients µ, β, λ, ω, θ are 5, 2, 2, 10, and 4.
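
To make the augmentation recipe in the Experiment Setup row concrete, here is a minimal Python sketch of the DETR-style random-resize rule (shorter side drawn from [480, 800], longer side capped at 1333). It uses torchvision and handles only the image; keypoint and box annotations would need the same geometric transform. The helper name and the exact grid of candidate sizes are assumptions, not code from the ED-Pose repository.

```python
import random
import torchvision.transforms.functional as F

SHORT_SIDES = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]  # assumed grid within [480, 800]
MAX_SIDE = 1333

def random_resize(image):
    """Resize a PIL image so its shorter side equals a random value from SHORT_SIDES,
    shrinking the scale if the longer side would exceed MAX_SIDE."""
    w, h = image.size
    scale = random.choice(SHORT_SIDES) / min(w, h)
    if max(w, h) * scale > MAX_SIDE:
        scale = MAX_SIDE / max(w, h)
    new_h, new_w = round(h * scale), round(w * scale)
    return F.resize(image, [new_h, new_w])
```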
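The optimizer and schedule quoted above map onto a standard PyTorch training loop; the sketch below shows the COCO settings (AdamW, learning rate 1×10⁻⁴, weight decay 1×10⁻⁴, batch size 16, 60 epochs, decay by 0.1 at epoch 55). `build_ed_pose_model` and `build_coco_loader` are hypothetical placeholders, and the loss handling is a generic DETR-style set-prediction loop, not the repository's actual training script.

```python
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

model = build_ed_pose_model(hidden_dim=256)      # channel dimension D = 256; placeholder builder
train_loader = build_coco_loader(batch_size=16)  # placeholder COCO keypoint loader

optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[55], gamma=0.1)  # decay at the 55th epoch (COCO)

for epoch in range(60):                          # 60 epochs on COCO (80 on CrowdPose)
    for images, targets in train_loader:
        loss_dict = model(images, targets)       # assumed to return a dict of weighted loss terms
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```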
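For the val2017 split cited in the Open Datasets and Dataset Splits rows, keypoint AP is computed with the standard COCO protocol; below is a short pycocotools sketch. The annotation path and `predictions.json` (a COCO-format keypoint results file) are placeholders for illustration.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/person_keypoints_val2017.json")  # ground-truth keypoint annotations
coco_dt = coco_gt.loadRes("predictions.json")                # model predictions in COCO results format

evaluator = COCOeval(coco_gt, coco_dt, iouType="keypoints")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # reports AP, AP50, AP75, AP(M), AP(L) over the 17 COCO keypoints
```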