Dite-HRNet: Dynamic Lightweight High-Resolution Network for Human Pose Estimation
Authors: Qun Li, Ziyi Zhang, Fu Xiao, Feng Zhang, Bir Bhanu
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed network achieves superior performance on both COCO and MPII human pose estimation datasets, surpassing the state-of-the-art lightweight networks. |
| Researcher Affiliation | Academia | (1) School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China; (2) Department of Electrical and Computer Engineering, University of California at Riverside, CA, USA |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/ZiyiZhang27/Dite-HRNet. |
| Open Datasets | Yes | The COCO dataset [Lin et al., 2014] contains over 200K images and 250K person instances, each labeled with 17 keypoints. We train our networks on the train2017 set (contains 57K images and 150K person instances)... To further validate our networks, we also perform experiments on the MPII Human Pose dataset [Andriluka et al., 2014], which contains about 25K images with 40K person instances... |
| Dataset Splits | Yes | We train our networks on the train2017 set (contains 57K images and 150K person instances), and evaluate them on the val2017 set (contains 5K images) and test-dev2017 set (contains 20K images) by the Average Precision (AP) and Average Recall (AR) scores based on Object Keypoint Similarity (OKS). |
| Hardware Specification | Yes | The presented Dite-HRNet is trained on 8 GeForce RTX 3090 GPUs, with 32 samples per GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' but does not specify their version numbers or other crucial software dependencies for replication. |
| Experiment Setup | Yes | All parameters are updated by the Adam optimizer with a base learning rate of 2e-3. As for the data processing, we expand all human detection boxes to a fixed aspect ratio of 4:3, and then crop the images with the detection boxes, which are resized to 256×192 or 384×288 for the COCO dataset, and 256×256 for the MPII dataset. All images are used with data augmentations, including random rotation with factor 30, random scaling with factor 0.25, and random flipping for both the COCO and MPII datasets. |
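The box-expansion step quoted in the experiment setup can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a box given as (top-left x, y, width, height) and grows it about its center until height/width matches the target 4:3 ratio, after which the crop would be resized to 256×192.

```python
def expand_to_aspect(x, y, w, h, aspect=4 / 3):
    """Grow a detection box so that height / width == aspect,
    keeping the box center fixed (hypothetical helper)."""
    cx, cy = x + w / 2, y + h / 2
    if h / w > aspect:       # box too tall -> widen it
        w = h / aspect
    else:                    # box too wide -> make it taller
        h = w * aspect
    return cx - w / 2, cy - h / 2, w, h

# Example: a 100x50 box is wider than 4:3, so its height is expanded
# to 100 * 4/3 while the width and center stay fixed.
nx, ny, nw, nh = expand_to_aspect(0, 0, 100, 50)
print(nw, nh)
```

The resulting crop keeps the person centered regardless of the detector's box shape, so resizing to the fixed 256×192 input never distorts the aspect ratio.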