Keypoint Fusion for RGB-D Based 3D Hand Pose Estimation

Authors: Xingyu Liu, Pengfei Ren, Yuanyuan Gao, Jingyu Wang, Haifeng Sun, Qi Qi, Zirui Zhuang, Jianxin Liao

AAAI 2024

Reproducibility assessment. Each entry below gives the reproducibility variable, the result, and the supporting LLM response quoted from the paper.
Research Type: Experimental. LLM response: "Experiments show that our method achieves leading performance on two challenging datasets, DexYCB (Chao et al. 2021) and HO-3D (Hampali et al. 2020), significantly outperforming previous state-of-the-art (SOTA) methods. We evaluate our method using the metric of Mean Per Joint Position Error (MPJPE). [Comparisons with State-of-the-arts] We compare the performance with SOTA 3D hand pose estimation methods on DexYCB."
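MPJPE averages the Euclidean distance between predicted and ground-truth 3D joint positions over all joints and frames. A minimal NumPy sketch of the metric (the function name and the (N, J, 3) array layout are illustrative assumptions, not the paper's code):

    import numpy as np

    def mpjpe(pred, gt):
        """Mean Per Joint Position Error.

        pred, gt: arrays of shape (N, J, 3) holding N frames of J 3D joints,
        in the same metric units (e.g., millimeters).
        """
        assert pred.shape == gt.shape
        # Euclidean distance per joint, averaged over joints and frames.
        return np.linalg.norm(pred - gt, axis=-1).mean()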
Researcher Affiliation: Academia. LLM response: "State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications. {liuxingyu, rpf, gaoyuanyuan, wangjingyu, hfsun, qiqi8266, zhuangzirui, liaojx}@bupt.edu.cn"
Pseudocode: No. LLM response: "The paper describes the proposed method in prose and through diagrams (Figure 2, Figure 3), but it does not include any explicit pseudocode blocks or sections labeled 'Algorithm'."
Open Source Code: Yes. LLM response: "Code is available at https://github.com/ru1ven/KeypointFusion."
Open Datasets: Yes. LLM response: "[DexYCB dataset] DexYCB is a hand-object dataset captured by multiple RGB-D cameras, containing 582K RGB-D frames over 1,000 sequences of 10 subjects grasping 20 different objects from 8 views. DexYCB has four official dataset splits of train/val/test, namely S0, S1, S2, and S3, split by the sequences, subjects, views, and objects, respectively. We conduct performance comparisons on all four splits and use the default S0 split in ablation studies. [HO-3D dataset] HO-3D is an RGB-D hand-object interaction dataset, containing 66,034 training images and 11,524 test images from a total of 68 sequences."
Dataset Splits: Yes. LLM response: "DexYCB has four official dataset splits of train/val/test, namely S0, S1, S2, and S3, split by the sequences, subjects, views, and objects, respectively. We conduct performance comparisons on all four splits and use the default S0 split in ablation studies."
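For replication, the official splits can be addressed by name through NVIDIA's dex-ycb-toolkit. A hedged sketch, assuming the toolkit's factory interface and that the dataset root is exposed via the DEX_YCB_DIR environment variable (verify the exact API against the toolkit's README; the path below is a placeholder):

    import os

    # The toolkit locates the data via DEX_YCB_DIR; path is a placeholder.
    os.environ.setdefault('DEX_YCB_DIR', '/path/to/dex-ycb')

    from dex_ycb_toolkit.factory import get_dataset

    # Splits are named '<split>_<phase>'; S0 is the default split used in
    # the paper's ablation studies.
    train_set = get_dataset('s0_train')
    test_set = get_dataset('s0_test')
    print(len(train_set), len(test_set))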
Hardware Specification: Yes. LLM response: "Our experiments are conducted with an NVIDIA RTX 4090 GPU."
Software Dependencies: No. LLM response: "The paper states 'The network is implemented based on PyTorch (Paszke et al. 2019).', but it does not specify the version number of PyTorch or any other software dependencies crucial for replication (e.g., Python version, CUDA version, or other libraries with their specific versions)."
Experiment Setup: Yes. LLM response: "We use an AdamW optimizer (Kulon et al. 2019) with an initial learning rate of 8e-4. The whole training process takes 15 and 25 epochs on DexYCB and HO-3D, respectively. For data augmentation, we crop the input RGB-D images to the size of 128×128, and perform random rotation [-180, 180], random scaling [0.9, 1.1], and random translation [-10, 10]."
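A minimal PyTorch sketch of this training configuration, assuming the standard torch.optim.AdamW and a placeholder network (only the learning rate, epoch counts, and augmentation ranges come from the paper; everything else is illustrative):

    import random
    import torch
    from torch.optim import AdamW

    model = torch.nn.Conv2d(4, 64, 3)        # stand-in for the actual network
    optimizer = AdamW(model.parameters(), lr=8e-4)

    EPOCHS = {'DexYCB': 15, 'HO-3D': 25}     # reported training lengths

    def sample_augmentation():
        # Ranges quoted from the paper; how they are applied to the 128x128
        # RGB-D crops (rotation center, translation units) is an assumption.
        return {
            'rotation_deg': random.uniform(-180.0, 180.0),
            'scale': random.uniform(0.9, 1.1),
            'translation': (random.uniform(-10, 10), random.uniform(-10, 10)),
        }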