AdaptivePose: Human Parts as Adaptive Points

Authors: Yabo Xiao, Xiao Juan Wang, Dongdong Yu, Guoli Wang, Qian Zhang, Mingshu He

AAAI 2022, pp. 2813-2821

Reproducibility assessment, one entry per variable: the assessed result, followed by the LLM response (supporting evidence from the paper).
Research Type: Experimental
LLM Response: "We achieve the best speed-accuracy trade-offs of 67.4% AP / 29.4 fps with DLA-34 and 71.3% AP / 9.1 fps with HRNet-W48 on COCO test-dev dataset." The "Experiments and Analysis" section adds: "In this section, we first briefly introduce the dataset, evaluation metric, data augmentation and implementation details. Next, we compare our proposed method with the previous state-of-the-art methods. Finally, we conduct comprehensive ablation study to reveal the effectiveness of each component."
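The AP figures above follow the standard COCO keypoint evaluation protocol. As a hedged illustration (not code from the paper), numbers of this kind are typically produced with pycocotools; the file paths below are hypothetical placeholders:

```python
# Hedged sketch: standard COCO keypoint AP evaluation via pycocotools.
# Both file paths are hypothetical placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/person_keypoints_val2017.json")  # ground truth
coco_dt = coco_gt.loadRes("predictions_keypoints.json")      # model results

coco_eval = COCOeval(coco_gt, coco_dt, iouType="keypoints")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP, AP50, AP75, AP (medium/large), and AR
```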
Researcher Affiliation: Collaboration
LLM Response: Yabo Xiao (1), Xiao Juan Wang (1), Dongdong Yu (2), Guoli Wang (3), Qian Zhang (4), Mingshu He (1); (1) Beijing University of Posts and Telecommunications, (2) ByteDance Inc., (3) Tsinghua University, (4) Horizon Robotics. Contact: {xiaoyabo, wj2718, hemingshu}@bupt.edu.cn, yudongdong@bytedance.com, wangguoli1990@mail.tsinghua.edu.cn, qian01.zhang@horizon.ai
Pseudocode: No
LLM Response: No explicitly labeled pseudocode or algorithm blocks were found; the method is described textually and with network diagrams.
Open Source Code: No
LLM Response: The paper does not provide a link to a code repository or an explicit statement about releasing the source code.
Open Datasets: Yes
LLM Response: "The COCO dataset (Lin et al. 2014) consists of over 200,000 images and 250,000 human instances labeled with 17 keypoints for pose estimation task."
Dataset Splits: Yes
LLM Response: "It is divided into train, mini-val, test-dev sets respectively. We train our model on COCO train2017 dataset. The comprehensive experimental results are reported on the COCO mini-val set with 5000 images and test-dev2017 set with 20K images."
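As a hedged sketch of how these splits are typically wired up with pycocotools (the annotation paths are assumptions; test-dev2017 ships without public annotations and is instead scored by the COCO evaluation server):

```python
# Hedged sketch: loading the COCO splits named above with pycocotools.
# Annotation paths are assumptions, not taken from the paper.
from pycocotools.coco import COCO

train2017 = COCO("annotations/person_keypoints_train2017.json")
minival = COCO("annotations/person_keypoints_val2017.json")  # 5,000 images

# Restrict the mini-val split to images that contain annotated people.
person_ids = minival.getCatIds(catNms=["person"])
img_ids = minival.getImgIds(catIds=person_ids)
print(f"mini-val images containing people: {len(img_ids)}")
```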
Hardware Specification: Yes
LLM Response: "We train our proposed model via Adam optimizer with a mini-batch size of 64 (8 per GPU) on a workstation with eight 12GB Titan Xp GPUs. The inference time is calculated on a 2080Ti GPU with mini-batch 1."
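The reported fps numbers imply single-image GPU timing. A minimal, hypothetical sketch of such a measurement in PyTorch follows; the model and input shape are stand-ins, not the paper's network:

```python
# Hypothetical sketch: single-image (mini-batch 1) GPU inference timing.
# The model and input shape are placeholders, not the paper's network.
import time
import torch

model = torch.nn.Identity().cuda().eval()       # stand-in for the pose net
x = torch.randn(1, 3, 512, 512, device="cuda")  # DLA-34 input resolution

with torch.no_grad():
    for _ in range(10):          # warm-up so timings exclude startup costs
        model(x)
    torch.cuda.synchronize()     # flush queued kernels before timing
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()
    seconds_per_image = (time.perf_counter() - start) / 100
print(f"{1.0 / seconds_per_image:.1f} fps")
```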
Software Dependencies: No
LLM Response: "All codes are implemented with Pytorch." No PyTorch version is specified.
Experiment Setup: Yes
LLM Response: "During training, we use random flip, random rotation, random scaling and color jitter to augment training samples. The flip probability is 0.5, the rotation range is (-30, 30) and the scale range is (0.6, 1.3). Each input image is cropped according to the random center and random scale then resized to 512/640 pixels for DLA-34 (Yu et al. 2018) and 800 pixels for HRNet-W48 (Sun et al. 2019). The output size is 1/4 of the input resolution." From the implementation details: "We train our proposed model via Adam optimizer with a mini-batch size of 64 (8 per GPU) on a workstation with eight 12GB Titan Xp GPUs. We use initial learning rate of 2.5e-4. All codes are implemented with Pytorch. All ablation studies adopt DLA-34 as backbone and use the 1x training epoch (140 epochs) with single-scale testing on the COCO mini-val set."
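To make the stated configuration concrete, below is a minimal, hypothetical PyTorch sketch of the quoted optimizer and augmentation parameters; the model is a placeholder, since the paper's implementation is not released:

```python
# Hypothetical sketch of the stated training configuration: Adam with
# initial lr 2.5e-4, overall mini-batch 64 (8 per GPU), flip p=0.5,
# rotation in (-30, 30) degrees, scale in (0.6, 1.3), output stride 4.
import random
import torch

INPUT_SIZE = 512     # 512/640 for DLA-34; 800 for HRNet-W48
OUTPUT_STRIDE = 4    # heatmap resolution is 1/4 of the input

def sample_augmentation() -> dict:
    """Draw the random augmentation parameters quoted above."""
    return {
        "flip": random.random() < 0.5,
        "rotation_deg": random.uniform(-30.0, 30.0),
        "scale": random.uniform(0.6, 1.3),
    }

# Placeholder head with trainable parameters, standing in for AdaptivePose.
model = torch.nn.Conv2d(3, 17, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)
```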