Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation
Authors: Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, "EXPERIMENTS" |
| Researcher Affiliation | Collaboration | Jie Yang (1,2), Ailing Zeng (1), Shilong Liu (1), Feng Li (1), Ruimao Zhang (2), Lei Zhang (1). (1) International Digital Economy Academy (IDEA); (2) Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen. Emails: {zengailing, liushilong, lifeng, leizhang}@idea.edu.cn; {jieyang5@link, zhangruimao@}cuhk.edu.cn |
| Pseudocode | No | The paper describes methods in text and figures, but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/IDEA-Research/ED-Pose. |
| Open Datasets | Yes | Our experiments are mainly conducted on the popular COCO2017 Keypoint Detection benchmark (Lin et al., 2014), which contains about 250K person instances with 17 keypoints. [...] we also evaluate our approach on the CrowdPose dataset (Li et al., 2019), which is more challenging and includes many crowded and occluded scenes. |
| Dataset Splits | Yes (see the evaluation sketch after the table) | Our experiments are mainly conducted on the popular COCO2017 Keypoint Detection benchmark (Lin et al., 2014), which contains about 250K person instances with 17 keypoints. We compare with other state-of-the-art methods on both the val2017 set and the test-dev set. [...] For ablation studies, we report results on the COCO val2017 set. |
| Hardware Specification | Yes | We use the AdamW (Kingma & Ba, 2014; Loshchilov & Hutter, 2017) optimizer with a weight decay of 1e-4 and train our model on Nvidia A100 GPUs with batch size 16 for 60 epochs and 80 epochs on COCO and CrowdPose, respectively. |
| Software Dependencies | No | The paper mentions the optimizer (AdamW) and frameworks (MMDetection) but does not provide version numbers for key software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes (see the training-schedule sketch after the table) | In the training stage, we augment input images by random crop, random flip, and random resize with the shorter sides in [480, 800] and the longer sides less than or equal to 1333, following DETR (Carion et al., 2020) and PETR (Shi et al., 2022). [...] We use the AdamW [...] optimizer with a weight decay of 1e-4 and train our model on Nvidia A100 GPUs with batch size 16 for 60 epochs and 80 epochs on COCO and CrowdPose, respectively. The initial learning rate is 1e-4 and is decayed by a factor of 0.1 at the 55th epoch and the 75th epoch on COCO and CrowdPose, respectively. The channel dimension D is set to 256. The numbers of layers in the Human Detection Decoder and the Human-to-Keypoint Detection Decoder are 2 and 4, respectively. [...] The loss coefficients µ, β, λ, ω, θ are 5, 2, 2, 10, 4. |
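The dataset and split rows refer to the standard COCO keypoint protocol (OKS-based AP on val2017/test-dev). Below is a minimal evaluation sketch using pycocotools; the annotation and result-file paths are assumptions for illustration, not taken from the paper or the ED-Pose repository.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Hypothetical paths: COCO val2017 keypoint annotations and a dumped
# prediction file in COCO results format.
ann_file = "annotations/person_keypoints_val2017.json"
res_file = "edpose_val2017_keypoint_results.json"

coco_gt = COCO(ann_file)
coco_dt = coco_gt.loadRes(res_file)

# OKS-based keypoint AP/AR, the metric reported in the paper's comparisons.
evaluator = COCOeval(coco_gt, coco_dt, iouType="keypoints")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()
```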
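The hardware and experiment-setup rows specify the optimizer, learning-rate schedule, and batch size for COCO. A minimal PyTorch sketch of that schedule follows, assuming the reported values (AdamW, lr 1e-4, weight decay 1e-4, decay by 0.1 at epoch 55, 60 epochs, batch size 16); the model, data, and loss are placeholders, since the real architecture and set-based losses come from the authors' repository.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder model and data; the actual ED-Pose model and COCO loader are
# built from https://github.com/IDEA-Research/ED-Pose.
model = torch.nn.Linear(256, 17 * 3)
dummy_batches = [torch.randn(16, 256) for _ in range(4)]  # batch size 16, as reported

# Reported COCO schedule: lr 1e-4, weight decay 1e-4, 60 epochs,
# learning rate decayed by a factor of 0.1 at epoch 55.
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[55], gamma=0.1)

for epoch in range(60):
    for x in dummy_batches:
        loss = model(x).pow(2).mean()  # placeholder loss, not the ED-Pose losses
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()  # step the learning-rate schedule once per epoch
```

For CrowdPose, the same setup is reported with 80 epochs and the learning-rate decay at epoch 75.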