SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation

Authors: Xiaoqi An, Lin Zhao, Chen Gong, Nannan Wang, Di Wang, Jian Yang

AAAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate the outstanding performance of the proposed method. Specifically, compared to the state-of-the-art method Vi TPose, our model SHa RPose-Base achieves 77.4 AP (+0.5 AP) on the COCO validation set and 76.7 AP (+0.5 AP) on the COCO test-dev set, and infers at a speed of 1.4 faster than Vi TPose-Base. Experiments Experiment Setup Datasets We conduct experiments on COCO (Lin et al. 2014) and MPII (Andriluka et al. 2014) datasets.
Researcher Affiliation Academia Xiaoqi An1,2, Lin Zhao1,2*, Chen Gong1, Nannan Wang2, Di Wang2, Jian Yang1* 1PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education Jiangsu Key Lab of Image and Video Understanding for Social Security School of Computer Science and Engineering, Nanjing University of Science and Technology 2 State Key Laboratory of Integrated Services Networks, Xidian University {xiaoqi.an, linzhao, chen.gong, csjyang}@njust.edu.cn, {nnwang, wangdi}@xidian.edu.cn
Pseudocode No The paper describes the method using text and diagrams, but it does not include a dedicated pseudocode block or algorithm section.
Open Source Code Yes Code is available at https://github.com/AnxQ/sharpose.
Open Datasets Yes We conduct experiments on COCO (Lin et al. 2014) and MPII (Andriluka et al. 2014) datasets.
Dataset Splits Yes We utilize the COCO 2017 dataset, which comprises 200k images and 250k person instances. The dataset is segregated into three subsets: train, valid, and test-dev, containing 150k, 5k, and 20k samples, respectively. We train our model on the train subset and test it on the valid and test-dev subsets.
Hardware Specification Yes To ensure a fair comparison, all experiments presented in this paper are conducted using the MMPose framework (SenseTime 2020) on four NVIDIA RTX 3090 GPUs.
Software Dependencies No The paper mentions using the 'MMPose framework (SenseTime 2020)' but does not provide specific version numbers for MMPose itself or for any other software libraries or dependencies, which are required for reproducibility.
Experiment Setup Yes The model is trained for 210 epochs with a learning rate of 5e-4, which is decreased to 5e-5 and 5e-6 at the 170th and 200th epochs, respectively. In this paper, we instantiate SHaRPose with two different sizes by scaling the embedding size. Other configurations like the depth (the number of Transformer blocks) are set the same. The detailed configurations of the instantiated SHaRPose models are presented in Table 2. We set λ = 0 in the first 180 epochs and λ = 0.03 in the subsequent epochs based on empirical analysis.
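The reported training schedule (step learning-rate drops at epochs 170 and 200, and the delayed λ weight) can be sketched as plain Python. This is a minimal sketch for clarity, not the authors' code: the function names are ours, and only the numeric values come from the paper.

```python
def learning_rate(epoch: int) -> float:
    """Step LR schedule reported in the paper: 5e-4 initially,
    decreased to 5e-5 at epoch 170 and 5e-6 at epoch 200
    (210 training epochs in total)."""
    if epoch < 170:
        return 5e-4
    elif epoch < 200:
        return 5e-5
    return 5e-6


def lambda_weight(epoch: int) -> float:
    """Schedule for the loss weight λ: 0 for the first 180 epochs,
    then 0.03 for the remaining epochs."""
    return 0.0 if epoch < 180 else 0.03
```

In a framework such as MMPose, the same step schedule would typically be expressed via a multi-step LR policy with milestones [170, 200].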