$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
Authors: Weiquan Wang, Jun Xiao, Chunping Wang, Wei Liu, Zhao Wang, Long Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations conducted on various benchmarks (e.g., Human3.6M, 3DPW, and 3DPW-Occ) have demonstrated its effectiveness. |
| Researcher Affiliation | Collaboration | Weiquan Wang¹, Jun Xiao¹, Chunping Wang², Wei Liu³, Zhao Wang¹, Long Chen⁴; ¹Zhejiang University, ²Finvolution Group, ³Tencent, ⁴Hong Kong University of Science and Technology |
| Pseudocode | Yes | In this section, we provide complete training and inference algorithms for discrete diffusion process. Algorithm 1 Training Algorithm for the discrete diffusion process. Algorithm 2 Inference Algorithm for the discrete diffusion process. |
| Open Source Code | No | We will release code upon paper acceptance. |
| Open Datasets | Yes | Human3.6M [34] is the most extensive benchmark for 3D HPE... 3DPW [72] is the first dataset... Additionally, to further verify the occlusion-robustness, we evaluate $\text{Di}^2\text{Pose}$ on the 3DPW-Occ [83], which is a subset of the 3DPW. |
| Dataset Splits | Yes | We follow [22] with same protocol, which involves training on subjects S1, S5, S6, S7, and S8, and testing on subjects S9 and S11. |
| Hardware Specification | Yes | All experiments are carried out on one NVIDIA A100 PCIe GPU. |
| Software Dependencies | No | The proposed $\text{Di}^2\text{Pose}$ is completely implemented in PyTorch [53]. However, no specific version number for PyTorch or other software dependencies is provided. |
| Experiment Setup | Yes | Pose Quantization Step. The pose encoder is constructed with four Local-MLP blocks, while the pose decoder incorporates a single block. Within these Local-MLP blocks, the embedding dimensions $D$ for the pose encoder and decoder are configured to 2048 and 512, respectively. For the quantization process, the projected vector $q_i$ features the channel $d = 5$. The levels per channel, denoted as $[L_1, \dots, L_d]$, are specified as [7, 5, 5, 5, 5]. The number of quantized tokens $N$ is set to 100. Discrete Diffusion Process. For the occlude and replace transition matrix, we linearly increase $\beta_s$ and $\gamma_s$ from 0 to 0.1 and 0.9, respectively, and decrease $\alpha_s$ from 1 to 0. For the discrete diffusion model, we use off-the-shelf image encoder [79] to extract feature sequence of conditional 2D image. As for the pose denoiser, we build a 21-layer 16-head transformer with the dimension of 1024. We set steps $S$ as 100 and loss weight $\lambda$ is set to 5e-4. |
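
Since the code has not been released, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch for anyone attempting a reproduction. The block below is only an illustration: the class name `Di2PoseSetup`, its field names, and the helpers `implied_codebook_size` and `linear_schedules` are hypothetical and do not come from the paper. It also makes two assumptions that the excerpt does not confirm: that each of the five channels is quantized independently (so the number of distinct codes is the product of the levels), and that the three schedules are interpolated linearly on one shared, evenly spaced grid over the 100 steps. Whether the quoted schedule values are per-step or cumulative is not stated in the excerpt.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Di2PoseSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    encoder_blocks: int = 4          # Local-MLP blocks in the pose encoder
    decoder_blocks: int = 1          # Local-MLP blocks in the pose decoder
    encoder_dim: int = 2048          # embedding dimension D (encoder)
    decoder_dim: int = 512           # embedding dimension D (decoder)
    levels: tuple = (7, 5, 5, 5, 5)  # [L_1, ..., L_d] with d = 5
    num_tokens: int = 100            # N quantized tokens
    steps: int = 100                 # S diffusion steps
    beta_end: float = 0.1            # final value of the beta schedule
    gamma_end: float = 0.9           # final value of the gamma schedule
    denoiser_layers: int = 21
    denoiser_heads: int = 16
    denoiser_dim: int = 1024
    loss_weight: float = 5e-4        # lambda


def implied_codebook_size(levels):
    # Assumption: channels are quantized independently, so the number of
    # distinct codes is the product of the per-channel level counts.
    return prod(levels)


def linear_schedules(cfg):
    # Assumption: all three schedules share one evenly spaced grid over the
    # S steps; the excerpt only gives the start and end values.
    grid = [s / (cfg.steps - 1) for s in range(cfg.steps)]
    alpha = [1.0 - t for t in grid]           # decreases from 1 to 0
    beta = [cfg.beta_end * t for t in grid]   # increases from 0 to 0.1
    gamma = [cfg.gamma_end * t for t in grid] # increases from 0 to 0.9
    return alpha, beta, gamma


cfg = Di2PoseSetup()
codebook_size = implied_codebook_size(cfg.levels)  # 7 * 5 * 5 * 5 * 5 = 4375
alpha, beta, gamma = linear_schedules(cfg)
head_dim = cfg.denoiser_dim // cfg.denoiser_heads  # 1024 / 16 = 64 per head
print(codebook_size, head_dim, len(alpha))
```

Under these assumptions the three schedule values sum to 1 at every step, because the quoted end values of $\beta_s$ and $\gamma_s$ (0.1 and 0.9) together offset the drop of $\alpha_s$ from 1 to 0. That is consistent with reading them as the probabilities of keeping, replacing, and occluding a token, though the excerpt itself does not spell out that interpretation.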