Dite-HRNet: Dynamic Lightweight High-Resolution Network for Human Pose Estimation

Authors: Qun Li, Ziyi Zhang, Fu Xiao, Feng Zhang, Bir Bhanu

IJCAI 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed network achieves superior performance on both the COCO and MPII human pose estimation datasets, surpassing the state-of-the-art lightweight networks. |
| Researcher Affiliation | Academia | School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China; Department of Electrical and Computer Engineering, University of California at Riverside, CA, USA |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/ZiyiZhang27/Dite-HRNet |
| Open Datasets | Yes | The COCO dataset [Lin et al., 2014] contains over 200K images and 250K person instances, each labeled with 17 keypoints. We train our networks on the train2017 set (57K images and 150K person instances)... To further validate our networks, we also perform experiments on the MPII Human Pose dataset [Andriluka et al., 2014], which contains about 25K images with 40K person instances... |
| Dataset Splits | Yes | We train our networks on the train2017 set (57K images and 150K person instances), and evaluate them on the val2017 set (5K images) and test-dev2017 set (20K images) by the Average Precision (AP) and Average Recall (AR) scores based on Object Keypoint Similarity (OKS). |
| Hardware Specification | Yes | The presented Dite-HRNet is trained on 8 GeForce RTX 3090 GPUs, with 32 samples per GPU. |
| Software Dependencies | No | The paper mentions software components such as the Adam optimizer, but does not specify version numbers or other software dependencies needed for replication. |
| Experiment Setup | Yes | All parameters are updated by the Adam optimizer with a base learning rate of 2e-3. For data processing, all human detection boxes are expanded to a fixed 4:3 aspect ratio, and the images are then cropped with the detection boxes and resized to 256×192 or 384×288 for the COCO dataset, and 256×256 for the MPII dataset. All images undergo data augmentation, including random rotation (factor 30), random scaling (factor 0.25), and random flipping, for both the COCO and MPII datasets. |
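The box-expansion step quoted under Experiment Setup (growing each person detection box to a fixed 4:3 height-to-width ratio before cropping, so it matches the 256×192 input) can be sketched as below. This is an illustrative reconstruction, not the authors' code; the function name `expand_box_to_ratio` and the center-preserving expansion strategy are assumptions.

```python
def expand_box_to_ratio(x, y, w, h, target_ratio=4 / 3):
    """Expand a detection box (top-left x, y, width w, height h) so that
    its height/width ratio equals target_ratio (4:3, matching a 256x192
    crop), keeping the box center fixed. Only one side is enlarged, so
    the original box is always fully contained in the result."""
    cx, cy = x + w / 2, y + h / 2          # box center stays fixed
    if h / w < target_ratio:
        h = w * target_ratio               # box too wide: grow height
    else:
        w = h / target_ratio               # box too tall: grow width
    return cx - w / 2, cy - h / 2, w, h
```

For example, a square 100×100 box becomes 100 wide and 133.3 tall, extending equally above and below the original box.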
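The OKS-based scores mentioned under Dataset Splits follow the standard COCO keypoint metric: OKS averages exp(-d_i² / (2 s² k_i²)) over the labeled keypoints, where d_i is the distance between predicted and ground-truth keypoint i, s² is the object area, and k_i is a per-keypoint falloff constant. A minimal sketch of that formula (the function name and the example k values are illustrative, not the official COCO constants):

```python
import numpy as np

def object_keypoint_similarity(pred, gt, visible, area, k):
    """COCO-style OKS over one person instance.

    pred, gt: (N, 2) arrays of predicted / ground-truth keypoints.
    visible:  (N,) array; keypoint i counts only if visible[i] > 0.
    area:     object segment area (s^2 in the formula).
    k:        (N,) per-keypoint falloff constants.
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)             # squared distances
    e = d2 / (2 * area * k ** 2 + np.finfo(float).eps)
    labeled = visible > 0
    return float(np.sum(np.exp(-e)[labeled]) / max(np.sum(labeled), 1))
```

A perfect prediction yields OKS = 1, and the score decays toward 0 as keypoints drift from the ground truth; COCO AP/AR are then computed by thresholding OKS the way box AP thresholds IoU.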