Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference
Authors: Dongkai Wang, Shiliang Zhang, Gang Hua
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several crowded-scene pose estimation benchmarks demonstrate the superiority of PINet. For instance, it achieves 59.8% AP on the OCHuman dataset, outperforming recent works by a large margin. |
| Researcher Affiliation | Collaboration | Dongkai Wang (Peking University, dongkai.wang@pku.edu.cn); Shiliang Zhang (Peking University, slzhang.jdl@pku.edu.cn); Gang Hua (Wormpex AI Research, ganghua@gmail.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/kennethwdk/PINet |
| Open Datasets | Yes | OCHuman [31] is a recently proposed benchmark... CrowdPose [10] is another dataset... COCO [12] is the popular MPPE benchmark... |
| Dataset Splits | Yes | OCHuman... including 2500 images for validation and 2231 images for testing. Following [22], we train models on val set and report the performance on test set. CrowdPose... It contains 10K, 2K and 8K images for train, val and test set. COCO... The train set includes 57K images and 150K person instances... the val set contains 5K images, and the test-dev set consists of 20K images. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch [19]' but does not provide a specific version number for it or any other key software dependencies. |
| Experiment Setup | Yes | The input image is resized to 512×512. We use Adam [8] to optimize the model; the learning rate is set to 0.001 for all layers. We train the model for 140 epochs on OCHuman and COCO, dividing the learning rate by 10 at the 90th and 120th epochs. For CrowdPose, we train the model for 300 epochs and divide the learning rate by 10 at the 200th and 260th epochs. The batch size is set to 20 for OCHuman and 40 for CrowdPose and COCO. We adopt data augmentation strategies including random rotation (-30, 30), scale ([0.75, 1.5]), translation ([-40, 40]) and flipping (0.5). |
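
The step-decay schedule quoted in the Experiment Setup row can be sketched as follows. This is an illustrative reconstruction from the reported hyperparameters, not the authors' released code (see the PINet repository for the actual implementation); the function and parameter names are our own.

```python
# Step-decay learning-rate schedule as described in the paper:
# start at 0.001 and divide by 10 at each milestone epoch.
def learning_rate(epoch, base_lr=0.001, milestones=(90, 120), gamma=0.1):
    """Return the learning rate in effect at a given (0-indexed) epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma  # one decay step per milestone passed
    return lr

# OCHuman / COCO: 140 epochs, decay at epochs 90 and 120.
# CrowdPose: 300 epochs, decay at epochs 200 and 260.
def crowdpose_lr(epoch):
    return learning_rate(epoch, milestones=(200, 260))

# Reported augmentation ranges, collected for reference (names assumed).
AUGMENTATION = {
    "rotation_deg": (-30, 30),
    "scale": (0.75, 1.5),
    "translation_px": (-40, 40),
    "flip_prob": 0.5,
}
```

In a PyTorch training loop this corresponds to `torch.optim.Adam` combined with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[90, 120], gamma=0.1)`.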