Direct Multi-view Multi-person 3D Pose Estimation

Authors: Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient. Notably, it achieves 92.3% AP25 on the challenging Panoptic dataset, improving upon the previous best approach [40] by 9.8%. Comprehensive experiments on 3D pose benchmarks Panoptic [19], as well as Shelf and Campus [1], demonstrate that our MvP works very well.
Researcher Affiliation | Collaboration | Tao Wang (1,2), Jianfeng Zhang (2), Yujun Cai (1), Shuicheng Yan (1), Jiashi Feng (1); 1: Sea AI Lab, 2: National University of Singapore. Contact: twangnh@gmail.com, zhangjianfeng@u.nus.edu, {caiyj,yansc,fengjs}@sea.com
Pseudocode | No | The paper describes the model architecture and training process in text and diagrams, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and models are available at https://github.com/sail-sg/mvp.
Open Datasets | Yes | Panoptic [20] is a large-scale benchmark with 3D skeleton joint annotations. Shelf and Campus [1] are two multi-person datasets capturing indoor and outdoor environments, respectively.
Dataset Splits | No | The paper describes training/testing splits ('Following VoxelPose [40], we use the same data sequences except 160906_band3 in the training set due to broken images.' and 'We split them into training and testing sets following [1, 6, 40].'), but does not define a separate validation split with specific percentages or counts.
Hardware Specification | Yes | GPU: GeForce RTX 2080 Ti; CPU: Intel i7-6900K @ 3.20GHz. For all methods, runtime is measured on this GPU/CPU configuration.
Software Dependencies | No | The paper mentions using the Adam optimizer and building upon ResNet-50 for feature extraction, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | The model is trained for 40 epochs with the Adam optimizer at a learning rate of 1e-4. During inference, a confidence threshold of 0.1 is used to filter out redundant predictions. Please refer to the supplementary material for more implementation details. Unless otherwise stated, a stack of six transformer decoder layers is used.
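The reported setup (40 epochs, Adam at learning rate 1e-4, six decoder layers, 0.1 confidence threshold) can be sketched as a small configuration plus inference-time filter. This is a hypothetical illustration, not the authors' code: the config keys and the `filter_predictions` helper are assumptions made for clarity.

```python
# Hypothetical configuration mirroring the settings reported in the paper.
TRAIN_CONFIG = {
    "epochs": 40,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "num_decoder_layers": 6,
    "confidence_threshold": 0.1,
}


def filter_predictions(poses, scores, threshold=TRAIN_CONFIG["confidence_threshold"]):
    """Drop redundant low-confidence predictions at inference time.

    `poses` is a list of predicted 3D poses and `scores` the matching
    per-prediction confidences; only pairs above `threshold` are kept.
    """
    return [(pose, score) for pose, score in zip(poses, scores) if score > threshold]
```

For example, with two candidate poses scored 0.9 and 0.05, only the first survives the 0.1 threshold.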