Keypoint Fusion for RGB-D Based 3D Hand Pose Estimation

Authors: Xingyu Liu, Pengfei Ren, Yuanyuan Gao, Jingyu Wang, Haifeng Sun, Qi Qi, Zirui Zhuang, Jianxin Liao

AAAI 2024

Reproducibility assessment. Each entry below gives the reproducibility variable, the result, and the supporting LLM response quoted from the paper.
Research Type: Experimental. LLM response: "Experiments show that our method achieves leading performance on two challenging datasets, DexYCB (Chao et al. 2021) and HO-3D (Hampali et al. 2020), significantly outperforming previous state-of-the-art (SOTA) methods. We evaluate our method using the metric of Mean Per Joint Position Error (MPJPE). [Comparisons with State-of-the-arts] We compare the performance with SOTA 3D hand pose estimation methods on DexYCB."
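MPJPE averages the Euclidean distance between predicted and ground-truth 3D joint positions over all joints and frames. A minimal NumPy sketch of the metric (the function name and the (N, J, 3) array layout are illustrative assumptions, not the paper's code):

    import numpy as np

    def mpjpe(pred, gt):
        """Mean Per Joint Position Error.

        pred, gt: arrays of shape (N, J, 3) holding N frames of J 3D joints,
        in the same metric units (e.g., millimeters).
        """
        assert pred.shape == gt.shape
        # Euclidean distance per joint, averaged over joints and frames.
        return np.linalg.norm(pred - gt, axis=-1).mean()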
Researcher Affiliation: Academia. LLM response: "State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications. {liuxingyu, rpf, gaoyuanyuan, wangjingyu, hfsun, qiqi8266, zhuangzirui, liaojx}@bupt.edu.cn"
Pseudocode: No. LLM response: "The paper describes the proposed method in prose and through diagrams (Figure 2, Figure 3), but it does not include any explicit pseudocode blocks or sections labeled 'Algorithm'."
Open Source Code: Yes. LLM response: "Code is available at https://github.com/ru1ven/KeypointFusion."
Open Datasets: Yes. LLM response: "[DexYCB dataset] DexYCB is a hand-object dataset captured by multiple RGB-D cameras, containing 582K RGB-D frames over 1,000 sequences of 10 subjects grasping 20 different objects from 8 views. DexYCB has four official dataset splits of train/val/test, namely S0, S1, S2, and S3, split by the sequences, subjects, views, and objects, respectively. We conduct performance comparisons on all four splits and use the default S0 split in ablation studies. [HO-3D dataset] HO-3D is an RGB-D hand-object interaction dataset, containing 66,034 training images and 11,524 test images from a total of 68 sequences."
Dataset Splits: Yes. LLM response: "DexYCB has four official dataset splits of train/val/test, namely S0, S1, S2, and S3, split by the sequences, subjects, views, and objects, respectively. We conduct performance comparisons on all four splits and use the default S0 split in ablation studies."
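For replication, the official splits can be addressed by name through NVIDIA's dex-ycb-toolkit. A hedged sketch, assuming the toolkit's factory interface and that the dataset root is exposed via the DEX_YCB_DIR environment variable (verify the exact API against the toolkit's README; the path below is a placeholder):

    import os

    # The toolkit locates the data via DEX_YCB_DIR; path is a placeholder.
    os.environ.setdefault('DEX_YCB_DIR', '/path/to/dex-ycb')

    from dex_ycb_toolkit.factory import get_dataset

    # Splits are named '<split>_<phase>'; S0 is the default split used in
    # the paper's ablation studies.
    train_set = get_dataset('s0_train')
    test_set = get_dataset('s0_test')
    print(len(train_set), len(test_set))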
Hardware Specification: Yes. LLM response: "Our experiments are conducted with an NVIDIA RTX 4090 GPU."
Software Dependencies: No. LLM response: "The paper states 'The network is implemented based on PyTorch (Paszke et al. 2019).', but it does not specify the version number of PyTorch or any other software dependencies crucial for replication (e.g., Python version, CUDA version, or other libraries with their specific versions)."
Experiment Setup: Yes. LLM response: "We use an AdamW optimizer (Kulon et al. 2019) with an initial learning rate of 8e-4. The whole training process takes 15 and 25 epochs on DexYCB and HO-3D, respectively. For data augmentation, we crop the input RGB-D images to the size of 128×128, and perform random rotation [-180, 180], random scaling [0.9, 1.1], and random translation [-10, 10]."
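A minimal PyTorch sketch of this training configuration, assuming the standard torch.optim.AdamW and a placeholder network (only the learning rate, epoch counts, and augmentation ranges come from the paper; everything else is illustrative):

    import random
    import torch
    from torch.optim import AdamW

    model = torch.nn.Conv2d(4, 64, 3)        # stand-in for the actual network
    optimizer = AdamW(model.parameters(), lr=8e-4)

    EPOCHS = {'DexYCB': 15, 'HO-3D': 25}     # reported training lengths

    def sample_augmentation():
        # Ranges quoted from the paper; how they are applied to the 128x128
        # RGB-D crops (rotation center, translation units) is an assumption.
        return {
            'rotation_deg': random.uniform(-180.0, 180.0),
            'scale': random.uniform(0.9, 1.1),
            'translation': (random.uniform(-10, 10), random.uniform(-10, 10)),
        }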