AdaptivePose: Human Parts as Adaptive Points
Authors: Yabo Xiao, Xiao Juan Wang, Dongdong Yu, Guoli Wang, Qian Zhang, Mingshu He
AAAI 2022, pp. 2813-2821 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We achieve the best speed-accuracy trade-offs of 67.4% AP / 29.4 fps with DLA-34 and 71.3% AP / 9.1 fps with HRNet-W48 on COCO test-dev dataset." From the Experiments and Analysis section: "In this section, we first briefly introduce the dataset, evaluation metric, data augmentation and implementation details. Next, we compare our proposed method with the previous state-of-the-art methods. Finally, we conduct comprehensive ablation study to reveal the effectiveness of each component." |
| Researcher Affiliation | Collaboration | Yabo Xiao (1), Xiao Juan Wang (1,*), Dongdong Yu (2), Guoli Wang (3), Qian Zhang (4), Mingshu He (1). Affiliations: 1 Beijing University of Posts and Telecommunications; 2 ByteDance Inc.; 3 Tsinghua University; 4 Horizon Robotics. Emails: {xiaoyabo, wj2718, hemingshu}@bupt.edu.cn, yudongdong@bytedance.com, wangguoli1990@mail.tsinghua.edu.cn, qian01.zhang@horizon.ai |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found. The method is described textually and with network diagrams. |
| Open Source Code | No | The paper does not provide a direct link to a code repository or an explicit statement about releasing the source code. |
| Open Datasets | Yes | Dataset. The COCO dataset (Lin et al. 2014) consists of over 200,000 images and 250,000 human instances labeled with 17 keypoints for pose estimation task. |
| Dataset Splits | Yes | The COCO dataset (Lin et al. 2014) consists of over 200,000 images and 250,000 human instances labeled with 17 keypoints for pose estimation task. It is divided into train, mini-val, test-dev sets respectively. We train our model on COCO train2017 dataset. The comprehensive experimental results are reported on the COCO mini-val set with 5000 images and test-dev2017 set with 20K images. (A pycocotools evaluation sketch for these splits follows the table.) |
| Hardware Specification | Yes | We train our proposed model via Adam optimizer with a mini-batch size of 64 (8 per GPU) on a workstation with eight 12GB Titan Xp GPUs. The inference time is calculated on a 2080Ti GPU with minibatch 1. |
| Software Dependencies | No | The paper states only that "All codes are implemented with Pytorch"; no PyTorch version or other library versions are specified. |
| Experiment Setup | Yes | During training, we use random flip, random rotation, random scaling and color jitter to augment training samples. The flip probability is 0.5, the rotation range is (-30, 30) and the scale range is (0.6, 1.3). Each input image is cropped according to the random center and random scale then resized to 512 / 640 pixels for DLA-34 (Yu et al. 2018) and 800 pixels for HRNet-W48 (Sun et al. 2019). The output size is 1/4 of the input resolution. Implementation Details. We train our proposed model via Adam optimizer with a mini-batch size of 64 (8 per GPU) on a workstation with eight 12GB Titan Xp GPUs. We use initial learning rate of 2.5e-4. All codes are implemented with Pytorch. All ablation studies adopt DLA-34 as backbone and use the 1x training epoch (140 epochs) with single-scale testing on the COCO mini-val set. (A minimal PyTorch sketch of this configuration follows the table.) |
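
The Experiment Setup row pins down concrete hyperparameters. Below is a minimal PyTorch sketch that wires those reported values together. The one-layer placeholder network and the `aug_cfg` dictionary are assumptions (the authors' code is not released, per the Open Source Code row); the numeric values are taken verbatim from the quoted text.

```python
# Minimal sketch of the reported AdaptivePose training configuration.
# `model` is a one-layer stand-in, NOT the paper's DLA-34/HRNet-W48 network.
import torch
import torch.nn as nn

aug_cfg = {
    "flip_prob": 0.5,             # random horizontal flip
    "rotation_range": (-30, 30),  # degrees
    "scale_range": (0.6, 1.3),
    "input_size": 512,            # DLA-34 (640 also used; 800 for HRNet-W48)
    "output_stride": 4,           # output is 1/4 of the input resolution
}

# Placeholder network: one output channel per COCO keypoint (17 total).
model = nn.Conv2d(3, 17, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)

batch_size = 64  # 8 per GPU on eight 12 GB Titan Xp GPUs
epochs = 140     # the "1x" schedule used for the ablations
```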
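Likewise, since results are reported as COCO keypoint AP on the mini-val and test-dev splits, the following sketch shows the standard pycocotools evaluation flow that produces that metric. The annotation and prediction file paths are hypothetical.

```python
# COCO keypoint evaluation on the 5,000-image mini-val (val2017) split
# using the standard pycocotools API; file paths are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/person_keypoints_val2017.json")  # ground truth
coco_dt = coco_gt.loadRes("predictions.json")  # detections in COCO result format

evaluator = COCOeval(coco_gt, coco_dt, iouType="keypoints")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP / AP50 / AP75 / APm / APl, the paper's metric
```

For the test-dev2017 numbers, the same result-file format is instead uploaded to the COCO evaluation server, since test-dev annotations are not public.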