Deep Semantic Graph Transformer for Multi-View 3D Human Pose Estimation
Authors: Lijun Zhang, Kangkang Zhou, Feng Lu, Xiang-Dong Zhou, Yu Shi
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three 3D HPE benchmarks show that our method achieves state-of-the-art results. |
| Researcher Affiliation | Academia | 1 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China 2 Chongqing School, University of Chinese Academy of Sciences, Chongqing, China 3 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China 4 Peng Cheng Laboratory, Shenzhen, China {zhanglijun, zhouxiangdong, shiyu}@cigit.ac.cn, zhoukangkang21@mails.ucas.ac.cn, lf22@mails.tsinghua.edu.cn |
| Pseudocode | No | The paper provides architectural diagrams (e.g., Figure 1, Figure 2) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes and models are available at https://github.com/z0911k/SGraFormer. |
| Open Datasets | Yes | Human3.6M (Ionescu et al. 2013) is the largest and most popular 3D HPE benchmark. ... MPI-INF-3DHP (Mehta et al. 2017) is a large-scale 3D human pose dataset... ... Ski-Pose PTZ-Camera (Fasel et al. 2016) is a smaller dataset... |
| Dataset Splits | Yes | Following previous works (Zheng et al. 2021; Li et al. 2022b, 2023), we use subjects S1, S5, S6, S7, and S8 for model training, S9 and S11 for model testing. ... Four chest views of S1-S6 are used for training, S7 and S8 are for testing. ... Following the official implementations, we use the subject 1-5 for model training, and subject 6 for model testing. |
| Hardware Specification | Yes | Our experiments are conducted on the PyTorch platform with 4 GeForce RTX 1080Ti GPUs. |
| Software Dependencies | No | The paper mentions the 'PyTorch platform' but does not specify its version or the versions of any other software dependencies, so the software environment cannot be reproduced from the paper alone. |
| Experiment Setup | Yes | The Amsgrad optimizer is used with a weight decay of 0.1. For model training, the initial learning rate is 0.0002. The learning shrink factor after each epoch is α = 0.98. When training the model, we set the maximum epoch and batch size to 50 and 1024, respectively. Four-order global-to-local spatial embedding graph features are considered. Four cascaded spatial and temporal transformer encoder layers are used in our framework, respectively. When using the detected 2D pose to obtain the 3D pose, we adopt the Cascaded Pyramid Network (CPN) (Chen et al. 2018) as the 2D pose detector. |
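The optimizer and learning-rate schedule quoted above can be sketched in PyTorch. This is a minimal illustration, not the authors' code (their implementation is at https://github.com/z0911k/SGraFormer); the `nn.Linear` model is a hypothetical stand-in for the SGraFormer architecture, and only the hyperparameters stated in the paper (AMSGrad, weight decay 0.1, initial LR 0.0002, per-epoch shrink factor 0.98, 50 epochs, batch size 1024) are taken from the source.

```python
import torch

# Hypothetical stand-in for the SGraFormer model (not the real architecture).
model = torch.nn.Linear(32, 48)

# AMSGrad is the Adam variant enabled via amsgrad=True; weight decay and
# initial learning rate follow the paper's stated setup.
optimizer = torch.optim.Adam(
    model.parameters(), lr=2e-4, weight_decay=0.1, amsgrad=True
)

# Learning-rate shrink factor alpha = 0.98 applied after each epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

MAX_EPOCHS = 50
BATCH_SIZE = 1024  # per the paper; the actual data loader is omitted here

for epoch in range(MAX_EPOCHS):
    # ... training loop over batches of size BATCH_SIZE would go here ...
    scheduler.step()

final_lr = optimizer.param_groups[0]["lr"]
print(f"LR after {MAX_EPOCHS} epochs: {final_lr:.3e}")
```

After 50 epochs the learning rate has decayed to 2e-4 × 0.98^50, roughly 7.3e-5, which matches the multiplicative per-epoch shrink the paper describes.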