Direct Multi-view Multi-person 3D Pose Estimation
Authors: Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient. Notably, it achieves 92.3% AP25 on the challenging Panoptic dataset, improving upon the previous best approach [40] by 9.8%. Comprehensive experiments on 3D pose benchmarks Panoptic [19], as well as Shelf and Campus [1] demonstrate our MvP works very well. |
| Researcher Affiliation | Collaboration | Tao Wang¹·², Jianfeng Zhang², Yujun Cai¹, Shuicheng Yan¹, Jiashi Feng¹ (¹Sea AI Lab, ²National University of Singapore), twangnh@gmail.com, zhangjianfeng@u.nus.edu, {caiyj,yansc,fengjs}@sea.com |
| Pseudocode | No | The paper describes the model architecture and training process in text and diagrams, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and models are available at https://github.com/sail-sg/mvp. |
| Open Datasets | Yes | Datasets Panoptic [20] is a large-scale benchmark with 3D skeleton joint annotations. Shelf and Campus [1] are two multi-person datasets capturing indoor and outdoor environments, respectively. |
| Dataset Splits | No | The paper mentions splitting data into training and testing sets ('Following Voxel Pose [40], we use the same data sequences except 160906_band3 in the training set due to broken images.' and 'We split them into training and testing sets following [1, 6, 40].') but does not explicitly define a separate validation dataset split with specific percentages or counts. |
| Hardware Specification | Yes | GPU: GeForce RTX 2080 Ti; CPU: Intel i7-6900K @ 3.20GHz. "For all methods, the time is counted on GPU GeForce RTX 2080 Ti and CPU Intel i7-6900K @ 3.20GHz." |
| Software Dependencies | No | The paper mentions using Adam optimizer and building upon ResNet-50 for feature extraction, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The model is trained for 40 epochs with the Adam optimizer at a learning rate of 10⁻⁴. During inference, a confidence threshold of 0.1 is used to filter out redundant predictions. Please refer to supplementary for more implementation details. ... Unless otherwise stated, we use a stack of six transformer decoder layers. |
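The reported setup (Adam, learning rate 10⁻⁴, 40 epochs, six decoder layers, confidence threshold 0.1 at inference) can be sketched as a minimal config plus a post-filtering step. The dictionary keys and the `filter_predictions` helper below are illustrative assumptions, not names from the authors' released code at https://github.com/sail-sg/mvp.

```python
# Sketch of the paper's stated hyperparameters; names are illustrative.
TRAIN_CONFIG = {
    "epochs": 40,
    "optimizer": "Adam",
    "learning_rate": 1e-4,        # 10^-4, as reported
    "decoder_layers": 6,          # stack of six transformer decoder layers
}

CONF_THRESHOLD = 0.1              # inference-time confidence cutoff


def filter_predictions(predictions, threshold=CONF_THRESHOLD):
    """Keep only pose candidates whose confidence meets the threshold,
    discarding redundant low-score predictions as described in the paper."""
    return [p for p in predictions if p["score"] >= threshold]


# Usage: one confident pose and one borderline pose survive; the
# low-confidence candidate is filtered out.
preds = [{"score": 0.92}, {"score": 0.05}, {"score": 0.31}]
kept = filter_predictions(preds)
```

This only mirrors the reported values; the actual training loop and prediction format live in the released repository.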