Continuous Heatmap Regression for Pose Estimation via Implicit Neural Representation
Authors: Shengxiang Hu, Huaijiang Sun, Dong Wei, Xiaoning Sun, Jin Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on three pose estimation benchmarks: COCO [25], MPII [1], and CrowdPose [21]. The results show that NerPE significantly enhances existing heatmap-based methods and obtains superior performance on low-resolution input images. |
| Researcher Affiliation | Academia | 1Nanjing University of Science and Technology, Nanjing, China 2Nantong University, Nantong, China |
| Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/hushengxiang/NerPE. |
| Open Datasets | Yes | We conduct extensive experiments on three pose estimation benchmarks: COCO [25], MPII [1], and CrowdPose [21]. |
| Dataset Splits | Yes | Evaluation on COCO. To evaluate the value of continuous heatmap representation for human pose estimation (HPE), we perform NerPE with three backbones [16, 42, 24] at three input resolutions on the COCO validation set, as shown in Table 1. |
| Hardware Specification | No | The paper mentions "All our experiments are conducted on an open-source machine learning [framework], PyTorch [35]" and reports GFLOPS (Figure 3), but it does not specify concrete hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper states "All our experiments are conducted on an open-source machine learning [framework], PyTorch [35]" but does not specify the version number of PyTorch or any other software dependencies. |
| Experiment Setup | Yes | In the main experimental results, the training settings of NerPE are consistent with the comparison methods [48, 42, 24] based on discrete heatmap regression. We use the Adam optimizer [18] for training, in which the learning rate is initialized to 1e-3 and decreased to 1e-4 and 1e-5. The data augmentation used includes random rotation, random scale, image flipping, and half-body cropping. |